【发布时间】:2016-08-09 13:21:21
【问题描述】:
我正在尝试使用 scikit-learn 在管道中链接网格搜索和递归特征消除。
带有“裸”分类器的 GridSearchCV 和 RFE 工作正常:
from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
est = SVR(kernel="linear")
selector = feature_selection.RFE(est)
param_grid = dict(estimator__C=[0.1, 1, 10])
clf = GridSearchCV(selector, param_grid=param_grid, cv=10)
clf.fit(X, y)
将分类器放入管道会返回错误:RuntimeError: The classifier does not expose "coef_" or "feature_importances_" attributes
from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import preprocessing
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
est = SVR(kernel="linear")
std_scaler = preprocessing.StandardScaler()
pipe_params = [('std_scaler', std_scaler), ('clf', est)]
pipe = pipeline.Pipeline(pipe_params)
selector = feature_selection.RFE(pipe)
param_grid = dict(estimator__clf__C=[0.1, 1, 10])
clf = GridSearchCV(selector, param_grid=param_grid, cv=10)
clf.fit(X, y)
编辑:
我意识到我没有清楚地描述问题。这是更清晰的sn-p:
from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
# This will work
est = SVR(kernel="linear")
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__C': [1, 10]})
clf.fit(X, y)
# This will not work
est = pipeline.make_pipeline(SVR(kernel="linear"))
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__svr__C': [1, 10]})
clf.fit(X, y)
如您所见,唯一的区别是将估算器放入管道中。然而,管道隐藏了“coef_”或“feature_importances_”属性。问题是:
- 在 scikit-learn 中有没有很好的方法来处理这个问题?
- 如果不是,是否出于某种原因需要这种行为?
EDIT2:
根据@Chris 提供的答案更新,工作 sn-p
from sklearn.datasets import make_friedman1
from sklearn import feature_selection
from sklearn import pipeline
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVR
class MyPipe(pipeline.Pipeline):
def fit(self, X, y=None, **fit_params):
"""Calls last elements .coef_ method.
Based on the sourcecode for decision_function(X).
Link: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/pipeline.py
----------
"""
super(MyPipe, self).fit(X, y, **fit_params)
self.coef_ = self.steps[-1][-1].coef_
return self
X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
# Without Pipeline
est = SVR(kernel="linear")
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__C': [1, 10, 100]})
clf.fit(X, y)
print(clf.grid_scores_)
# With Pipeline
est = MyPipe([('svr', SVR(kernel="linear"))])
selector = feature_selection.RFE(est)
clf = GridSearchCV(selector, param_grid={'estimator__svr__C': [1, 10, 100]})
clf.fit(X, y)
print(clf.grid_scores_)
【问题讨论】:
-
我会反省源代码以检查导致 RuntimeError 的事件链。您很有可能能够覆盖相关返回对象的属性并简单地添加回变量 - 例如,如果从 SVR() 返回时它们相同。无论如何,make_pipeline() 可能不会返回与 SVR() 相同类型的对象。
标签: python scikit-learn