
【Question title】: What is the difference between class_weight = None and auto in SVM scikit-learn?
【Posted】: 2015-04-21 16:35:50
【Question】:

In the scikit-learn SVM classifier, what is the difference between class_weight=None and class_weight='auto'?

The documentation describes it as follows:

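Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The 'auto' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.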
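To make the quoted rule concrete, here is a minimal sketch in plain Python. It is not scikit-learn's code; the helper name per_class_C is hypothetical. It only shows how a per-class weight scales the penalty C, with class_weight=None (or a class missing from the dict) treated as weight 1, as the documentation states.

```python
def per_class_C(C, classes, class_weight=None):
    """Return the effective penalty C_i = class_weight[i] * C for each class.

    Hypothetical helper for illustration only. class_weight is either None
    (every class gets weight 1) or a dict mapping class label -> weight;
    classes missing from the dict also default to weight 1.
    """
    weights = class_weight or {}
    return {c: weights.get(c, 1.0) * C for c in classes}


# With None, every class keeps the same penalty C.
print(per_class_C(1.0, [0, 1]))            # {0: 1.0, 1: 1.0}
# Up-weighting class 1 makes each unit of slack on its samples 5x as costly.
print(per_class_C(1.0, [0, 1], {1: 5.0}))  # {0: 1.0, 1: 5.0}
```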

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

But what is the advantage of using the 'auto' mode? I can't understand how it is implemented.

【Question discussion】:

Tags: python machine-learning scikit-learn

【Solution 1】:

This happens in the class_weight.py file (sklearn/utils/class_weight.py):

elif class_weight == 'auto':
    # Find the weight of each class as present in y.
    le = LabelEncoder()
    y_ind = le.fit_transform(y)
    if not all(np.in1d(classes, le.classes_)):
        raise ValueError("classes should have valid labels that are in y")

    # inversely proportional to the number of samples in the class
    recip_freq = 1. / bincount(y_ind)
    weight = recip_freq[le.transform(classes)] / np.mean(recip_freq)

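This means that each class you have (in classes) gets a weight equal to 1 divided by the number of times that class appears in your data (y), so classes that appear more often get lower weights. Each of these values is then divided by the mean of all the inverse class frequencies, so the resulting weights average to 1.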
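The computation described above can be reproduced in a few lines of plain Python. This is a sketch, not the library code: it uses collections.Counter and sorted labels in place of LabelEncoder and np.bincount, and the function name auto_class_weight is hypothetical.

```python
from collections import Counter


def auto_class_weight(y):
    """Sketch of the old class_weight='auto' rule (hypothetical helper).

    weight[c] = (1 / count[c]) / mean over all classes k of (1 / count[k])
    """
    counts = Counter(y)
    classes = sorted(counts)  # LabelEncoder also orders labels this way
    recip_freq = [1.0 / counts[c] for c in classes]
    mean_recip = sum(recip_freq) / len(recip_freq)
    return {c: r / mean_recip for c, r in zip(classes, recip_freq)}


# Class 0 appears 3 times, class 1 once: the rare class gets 3x the weight.
print(auto_class_weight([0, 0, 0, 1]))  # {0: 0.5, 1: 1.5}
```

Because of the division by the mean, the weights always average to 1 across classes, so a perfectly balanced y gives every class weight 1, the same as None.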

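The advantage is that you no longer have to choose the class weights yourself: rarer classes are automatically up-weighted, which should already work well enough for most applications.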

If you look at the rest of that source file, for None, weight is filled with ones, so every class gets the same weight.

【Discussion】:

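So the concrete difference is that None does not weight the classes at all, while auto computes the weights from the class distribution.
@JAB - Indeed, I've clarified that in the answer. Thanks!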

【Solution 2】:

This is a fairly old post, but for anyone just coming across this question, note that class_weight='auto' has been deprecated since version 0.17. Use class_weight='balanced' instead.

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

It is implemented as:

n_samples / (n_classes * np.bincount(y))
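For a quick sanity check, the 'balanced' formula can be written out in plain Python. This is only a sketch of the expression above (Counter stands in for np.bincount, and balanced_class_weight is a hypothetical name), not scikit-learn's own implementation.

```python
from collections import Counter


def balanced_class_weight(y):
    """Sketch of class_weight='balanced': n_samples / (n_classes * count[c])."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}


y = [0, 0, 0, 1]
print(balanced_class_weight(y))  # {0: 0.6666666666666666, 1: 2.0}
```

Applying the 'auto' formula from Solution 1 to the same y gives 0.5 and 1.5. The 1:3 ratio between the classes is the same; only the overall scale differs ('auto' makes the per-class weights average to 1, 'balanced' makes the per-sample weights average to 1). Since the weights multiply C, switching from 'auto' to 'balanced' with the same C can therefore change the effective regularization somewhat.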

Cheers!

【Discussion】: