【Question Title】: How are class_weights being applied in sklearn logistic regression?
【Posted】: 2018-10-30 04:22:51
【Question】:

I am interested in how sklearn applies the class weights we supply. The documentation does not state explicitly where and how the class weights are applied. Reading the source code did not help either (it seems sklearn.svm.liblinear is used for the optimization, and I could not read that source because it is a .pyd file...)

But my guess is that they are applied in the cost function: when class weights are specified, each class's cost term is multiplied by its class weight. For example, if I have two observations, one from class 0 (weight = 0.5) and one from class 1 (weight = 1), then the cost function would be:

Cost = 0.5 * log(...X_0, y_0...) + 1 * log(...X_1, y_1...) + penalty

Does anyone know whether this is correct?
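The hypothesized cost can be written out as a small numeric sketch. This is a hypothetical helper for illustration only, not sklearn's actual code; the function name, toy data, and weight vector are all made up:

```python
import numpy as np

def weighted_log_loss(w, X, y, sample_weight, alpha=1.0):
    """Hypothetical weighted logistic cost: each observation's
    log-loss term is scaled by its class weight, plus an L2 penalty."""
    z = y * (X @ w)                # y encoded as -1/+1
    losses = np.log1p(np.exp(-z))  # -log(sigmoid(y * Xw))
    return np.sum(sample_weight * losses) + 0.5 * alpha * np.dot(w, w)

# two observations: one from class 0 (weight 0.5), one from class 1 (weight 1)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([-1.0, 1.0])          # class 0 -> -1, class 1 -> +1
sw = np.array([0.5, 1.0])
w = np.array([0.1, -0.2])
print(weighted_log_loss(w, X, y, sw))
```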

【Comments】:

    Tags: python scikit-learn logistic-regression


    【Solution 1】:

    Take a look at the following lines in the source code:

    le = LabelEncoder()
    if isinstance(class_weight, dict) or multi_class == 'multinomial':
        class_weight_ = compute_class_weight(class_weight, classes, y)
        sample_weight *= class_weight_[le.fit_transform(y)]
    

    Here is the source code for the compute_class_weight() function:

    ...
    else:
        # user-defined dictionary
        weight = np.ones(classes.shape[0], dtype=np.float64, order='C')
        if not isinstance(class_weight, dict):
            raise ValueError("class_weight must be dict, 'balanced', or None,"
                             " got: %r" % class_weight)
        for c in class_weight:
            i = np.searchsorted(classes, c)
            if i >= len(classes) or classes[i] != c:
                raise ValueError("Class label {} not present.".format(c))
            else:
                weight[i] = class_weight[c]
    ...
    
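The dict branch above can be exercised directly. A minimal sketch, assuming a scikit-learn version where `compute_class_weight` takes keyword arguments (`classes` and `y` are keyword-only in recent releases):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([0, 0, 0, 1])
classes = np.array([0, 1])

# user-defined dict: weights are copied straight from the dict
w = compute_class_weight(class_weight={0: 0.5, 1: 1.0}, classes=classes, y=y)
print(w)  # [0.5 1. ]

# 'balanced': n_samples / (n_classes * bincount(y)) = 4 / (2 * [3, 1])
w_bal = compute_class_weight(class_weight='balanced', classes=classes, y=y)
print(w_bal)
```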

    In the snippet above, class_weight is applied to sample_weight, which is then used in a few internal functions such as _logistic_loss_and_grad and _logistic_loss:

    # Logistic loss is the negative of the log of the logistic function.
    out = -np.sum(sample_weight * log_logistic(yz)) + .5 * alpha * np.dot(w, w)
    # NOTE: --->  ^^^^^^^^^^^^^^^
    

    【Discussion】:
