Sklearn GridSearchCV, class_weight not working for unknown reason :(

【Question title】: Sklearn GridSearchCV, class_weight not working for unknown reason :(
【Posted】: 2015-10-28 00:53:39
【Question】:

I am trying to get class_weight going. I know the rest of the code works; it is only class_weight that gives me this error:

    parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
                                             ^
SyntaxError: invalid syntax

Here is my code:

clf1 = tree.DecisionTreeClassifier()
 parameters_to_tune = ['min_samples_split':[2,4,6,10,15,25], 'min_samples_leaf':[1,2,4,10],'max_depth':[None,4,10,15],
 'splitter' : ('best','random'),'max_features':[None,2,4,6,8,10,12,14],'class_weight':{1:10}]
clf=grid_search.GridSearchCV(clf1,parameters_to_tune)
clf.fit(features,labels)
print clf.best_params_

Does anyone spot the mistake I am making?

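【Comments】:

Can you give an example of what your features and labels look like?
features is basically an array of numbers (floats), whereas labels is (I don't know whether you would call it an array or simply a vector) [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0.....
parameters_to_tune should be a dict or a list of dicts. Your initial syntax was correct. You only need to change the 'class_weight' key-value pair in the dict. (Sorry, I did not see your update just now, but you had better keep your original post and append your update to it, otherwise people won't know what the original problem was.)
And your class_weight should be a list of dicts; you have made another mistake there...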

Tags: python scikit-learn grid-search

【Solution 1】:

~~I assume you want to grid search over different class_weight values for the 'salary' class.~~

The value of class_weight should be a list:

'class_weight':[{'salary':1}, {'salary':2}, {'salary':4}, {'salary':6}, {'salary':10}]

You can simplify it with a list comprehension:

'class_weight':[{'salary': w} for w in [1, 2, 4, 6, 10]]

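The first problem is that every parameter value in the dict parameters_to_tune should be a list, whereas you passed a dict. You can fix this by passing a list of dicts as the value of class_weight, where each dict holds one set of class_weight values for DecisionTreeClassifier.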
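
To make the "list of candidates" idea concrete, here is a minimal standard-library sketch of how a parameter grid expands into individual candidates. `expand_grid` is a hypothetical helper written for illustration, not a scikit-learn function (scikit-learn has its own `ParameterGrid` for this), and the grid is a reduced version of the one in the question.

```python
from itertools import product

# Reduced version of the grid from the question; every value is a sequence.
param_grid = {
    "max_depth": [None, 4],
    "class_weight": [{0: w} for w in [1, 2, 4]],  # a list of dicts
}


def expand_grid(grid):
    """Yield one {name: value} dict per combination, iterating keys in sorted order."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(expand_grid(param_grid))
for candidate in candidates:
    print(candidate)
```

Each printed candidate carries exactly one class_weight dict, which is why the grid entry has to be a list of dicts: a bare dict such as {1: 10} is itself a single candidate value, not a sequence of candidates.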

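But the more serious problem is that class_weight holds weights associated with classes, whereas in your case 'salary' is the name of a feature. You cannot assign weights to features. I misunderstood your intent at first, but now I am not sure what you want.

class_weight takes the form {class_label: weight}. If you really want to set class_weight in your case, class_label should be a value such as 0.0 or 1.0, and the syntax looks like this: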

'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]

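If a class has a large weight, the classifier is more likely to predict that a sample belongs to that class. A typical use of class_weight is when the data is imbalanced.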
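
As an illustration of weights for imbalanced data, the sketch below computes inverse-frequency weights with the formula n_samples / (n_classes * class_count), which is the heuristic scikit-learn documents for class_weight='balanced' in recent versions. `balanced_class_weight` is a hypothetical helper name, and the ten labels are made up for the example.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up imbalanced labels: eight samples of class 0.0, two of class 1.0.
labels = [0.0] * 8 + [1.0] * 2
weights = balanced_class_weight(labels)
print(weights)
```

The rare class 1.0 receives the larger weight, so errors on it cost more during training. Note that the keys are class labels, not feature names.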

Here is an example, although the classifier there is an SVM.

Update:

The complete parameters_to_tune should look like this:

parameters_to_tune = {'min_samples_split': [2, 4, 6, 10, 15, 25],
                      'min_samples_leaf': [1, 2, 4, 10],
                      'max_depth': [None, 4, 10, 15],
                      'splitter' : ('best', 'random'),
                      'max_features':[None, 2, 4, 6, 8, 10, 12, 14],
                      'class_weight':[{0: w} for w in [1, 2, 4, 6, 10]]}
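
For readers on a current scikit-learn, here is a hedged end-to-end sketch of the same idea. It assumes a version in which GridSearchCV lives in sklearn.model_selection (the grid_search module used in the question is the older location), it substitutes synthetic data from make_classification for the asker's features and labels, and it uses a reduced grid to keep the search fast. Weighting class 1 as the rare class is an assumption made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the asker's features/labels (assumption).
features, labels = make_classification(
    n_samples=300, n_features=14, weights=[0.85, 0.15], random_state=0
)

parameters_to_tune = {
    "min_samples_split": [2, 10],
    "max_depth": [None, 4],
    # One dict per candidate; the keys are class labels, not feature names.
    "class_weight": [{0: 1, 1: w} for w in [1, 2, 4, 10]],
}

clf = GridSearchCV(DecisionTreeClassifier(random_state=0), parameters_to_tune, cv=3)
clf.fit(features, labels)
print(clf.best_params_)
```

Which class_weight wins depends on the data and on the scoring metric, so the printed result is not shown here.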

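【Comments】:

Thanks, this looks very good, but unfortunately whichever of the two I try, I always get the error: ValueError: Invalid parameter class_weight for estimator DecisionTreeClassifier
DecisionTreeClassifier did not have class_weight before scikit-learn 0.16. Given the new error, you probably did not upgrade from 0.15 to 0.16 correctly. (See stackoverflow.com/questions/29596237/…)
Thanks a lot. That must be it. I have now installed it from the shell with "conda install....", but unfortunately it is getting more and more mysterious: pastebin.com/SuQVbuBu :( I feel like giving up
I just did that and got the same error as before :(
Thanks for the update, now I am confused :( I thought class_weight would let me set how strongly the different features are weighted in the classifier; otherwise I don't understand its purpose at all :( Could you give me an example of what the weights are appropriate for, if not for features?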
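
Regarding the "Invalid parameter class_weight" error discussed in these comments, one way to see whether the installed estimator accepts the parameter at all is to inspect its constructor. This is a sketch only: `accepts_param` is a hypothetical helper, and the scikit-learn part is guarded because it assumes the library can be imported in the environment being checked.

```python
import inspect


def accepts_param(estimator_cls, name):
    """Return True if the class constructor takes a parameter called `name`."""
    return name in inspect.signature(estimator_cls.__init__).parameters


if __name__ == "__main__":
    try:
        import sklearn
        from sklearn.tree import DecisionTreeClassifier
    except ImportError:
        print("scikit-learn is not installed in this environment")
    else:
        # The version that Python actually imports, which may differ from
        # the one that was just installed into another environment.
        print(sklearn.__version__)
        print(accepts_param(DecisionTreeClassifier, "class_weight"))
```

If the printed version is older than expected, the upgrade probably went into a different Python environment than the one running the script.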

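【Solution 2】:

The link below covers the use of different class_weight values. Just Ctrl+F "class_weight" to find the relevant section. It uses GridSearchCV to find the best class_weight for different optimization objectives.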

Optimizing a classifier using different evaluation metrics
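
To see why the optimization objective matters when tuning class_weight, consider this small standard-library sketch with made-up labels. A predictor that always outputs the majority class scores well on accuracy while missing every positive sample. The helper functions are written for illustration and are not taken from the linked article.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive samples that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Made-up imbalanced labels and a predictor that always says 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(acc, rec)
```

A grid search scored on accuracy would have little reason to prefer a larger weight on the rare class, whereas one scored on recall or F1 would, which is why the best class_weight can change with the metric.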

【Comments】: