Calculating the entropy of an attribute in the ID3 algorithm when a split is perfectly classified

【Question title】: Calculating the entropy of an attribute in the ID3 algorithm when a split is perfectly classified
【Posted】: 2017-02-08 07:51:42
【Question】:

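I have been reading about the ID3 algorithm recently. It says that the best attribute to split on is the one that yields the maximum information gain, which can be computed with the help of entropy.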

I wrote a simple Python program to compute the entropy. It is shown below:

import math

def _E(p, n):
    x = (p/(p+n))   # fraction of positive examples
    y = (n/(p+n))   # fraction of negative examples
    return(-1* (x*math.log2(x)) -1* (y*math.log2(y)))

But suppose we have a table of 10 elements, as follows:

x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

y = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]

where x is the attribute and y is the class. Here P(x = 0) = 0.8 and P(x = 1) = 0.2. The entropy is as follows:

Entropy(x) = 0.8*_E(4, 4) + 0.2*_E(2, 0)

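However, the second split, x = 1, is perfectly classified, which causes a math error because log2(0) is negative infinity. How should the entropy be calculated in this case?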

【Question comments】:

You should ask this question on stats.stackexchange.com

Tags: python machine-learning decision-tree

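【Solution 1】:

The entropy of a split measures the uncertainty associated with the class labels in that split. In a binary classification problem (classes = {0, 1}), the probability of class 1 (x in your text) can range from 0 to 1. The entropy is at its maximum (a value of 1) when x = 0.5, where both classes are equally likely. It is at its minimum when one of the classes is absent, that is, when x = 0 or x = 1. There is then no uncertainty about the class, so the entropy is 0.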

[Figure: entropy (y-axis) plotted against x (x-axis)]

The following calculation shows how to handle the entropy computation mathematically when x = 0 (the case x = 1 is similar):
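
One standard way to write that calculation out is sketched here, taking H(x) as the binary entropy with x the probability of class 1. The only assumption is that the term 0·log2(0) is defined by its limit, which L'Hôpital's rule shows to be 0:

```latex
H(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)

\lim_{x \to 0^+} x \log_2 x
  = \lim_{x \to 0^+} \frac{\log_2 x}{1/x}
  = \lim_{x \to 0^+} \frac{1/(x \ln 2)}{-1/x^2}
  = \lim_{x \to 0^+} \frac{-x}{\ln 2}
  = 0

H(0) = -\lim_{x \to 0^+} x \log_2 x - 1 \cdot \log_2 1 = -0 - 0 = 0
```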

In your program, you can treat x = 0 and x = 1 as special cases and return 0. For all other values of x, the entropy formula can be used directly.
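
A minimal sketch of that special-casing, applied to the table from the question. The name `binary_entropy` and the choice to return 0 for an empty node are assumptions made for illustration, not part of the original program:

```python
import math

def binary_entropy(p, n):
    """Entropy in bits of a node with p positive and n negative examples."""
    total = p + n
    if total == 0:
        return 0.0  # assumption: an empty node contributes no entropy
    result = 0.0
    for count in (p, n):
        if count > 0:  # skip an absent class: 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # attribute
y = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]  # class

# Weighted entropy of the class labels after splitting on each value of x.
weighted = 0.0
for value in sorted(set(x)):
    labels = [yi for xi, yi in zip(x, y) if xi == value]
    p = sum(labels)
    n = len(labels) - p
    weighted += len(labels) / len(x) * binary_entropy(p, n)

print(weighted)
```

On this data the x = 0 branch holds 4 positive and 4 negative labels (entropy 1), and the x = 1 branch holds 2 positive and 0 negative labels (entropy 0), so the weighted entropy comes out at 0.8 with no math error.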

【Discussion】:

Thanks! This is what I needed to be sure that B should indeed return 0 when x = {0, 1}.

【Solution 2】:

Entropy is a measure of impurity, so if a node is pure, its entropy is zero.

Have a look at this:

def information_gain(data, column, cut_point):
    """
    Measure the goodness of a split: the entropy of the parent minus
    the weighted entropy of the children.
    :params: `data` (the examples at the node), `column` (the attribute to split on)
             and `cut_point` (the split threshold)
    :returns: a tuple (information gain, subset1, subset2)
    """
    subset1, subset2 = divide_data(data, column, cut_point)
    lensub1, lensub2 = len(subset1), len(subset2)
    # If one side is empty, the cut does not divide the node, so the gain is 0
    if lensub1 == 0 or lensub2 == 0:
        return (0, subset1, subset2)
    weighted_ent = (lensub1 * entropy(subset1) + lensub2 * entropy(subset2)) / len(data)
    return ((entropy(data) - weighted_ent), subset1, subset2)
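
The function above relies on two helpers, `divide_data` and `entropy`, that are not shown in the answer. The versions below are hypothetical stand-ins, sketched under the assumption that each row is a list whose last element is the class label and that the split is a simple `<=` threshold on one column:

```python
import math
from collections import Counter

def divide_data(data, column, cut_point):
    """Hypothetical helper: split rows on whether row[column] <= cut_point."""
    subset1 = [row for row in data if row[column] <= cut_point]
    subset2 = [row for row in data if row[column] > cut_point]
    return subset1, subset2

def entropy(data):
    """Hypothetical helper: entropy in bits of the class label in the last column."""
    if not data:
        return 0.0
    counts = Counter(row[-1] for row in data)
    total = len(data)
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Rows are [x, y] pairs built from the data in the question.
x = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
data = [[xi, yi] for xi, yi in zip(x, y)]

left, right = divide_data(data, 0, 0)  # x <= 0 versus x > 0
print(len(left), entropy(left))    # the mixed branch
print(len(right), entropy(right))  # the pure branch
```

Because `entropy` only iterates over classes that are present, a pure subset yields 0 without any special case. With helpers along these lines, `information_gain(data, 0, 0)` on the same rows would compare a parent entropy of about 0.971 with a weighted child entropy of 0.8, for a gain of roughly 0.171.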

【Discussion】: