[Posted]: 2021-08-06 16:17:05
[Question]:
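I want to use sklearn.feature_selection.SelectFromModel to extract features for a multi-step regression problem. The regression combines MultiOutputRegressor and RandomForestRegressor to predict several values. When I try to get the selected features with SelectFromModel.get_support(), it raises an error saying that a feature_importances_ attribute must be accessible for the method to work.
The feature_importances_ of the estimators inside a MultiOutputRegressor can be accessed, as shown in this question. However, I am not sure how to pass these feature_importances_ to the SelectFromModel class correctly.
Here is what I have done so far: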
# make sample data
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=100, n_features=100, n_targets=5)
print(X.shape, y.shape)  # (100, 100) (100, 5)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, shuffle=True)
# get important features for prediction problem:
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor
regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100))
regr_multirf = regr_multirf.fit(X_train, y_train)
# keep at most one third of the features
sel = SelectFromModel(regr_multirf, max_features=int(np.floor(X_train.shape[1] / 3.)))
sel.fit(X_train, y_train)
sel.get_support()  # raises the ValueError in the output
# to get the feature_importances_ of one target's estimator inside the MultiOutputRegressor, use:
regr_multirf.estimators_[1].feature_importances_
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-a1d635ad4a34> in <module>()
5 sel = SelectFromModel(regr_multirf, max_features= int(np.floor(X_train.shape[1] / 3.)))
6 sel.fit(X_train, y_train)
----> 7 sel.get_support()
/usr/local/lib/python3.7/dist-packages/sklearn/feature_selection/_from_model.py in _get_feature_importances(estimator, norm_order)
30 "`feature_importances_` attribute. Either pass a fitted estimator"
31 " to SelectFromModel or call fit before calling transform."
---> 32 % estimator.__class__.__name__)
33
34 return importances
ValueError: The underlying estimator MultiOutputRegressor has no `coef_` or `feature_importances_` attribute. Either pass a fitted estimator to SelectFromModel or call fit before calling transform.
Any help and hints would be greatly appreciated.
[Discussion]:
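One possible way to keep the MultiOutputRegressor is to give SelectFromModel a callable `importance_getter` that aggregates the per-target `feature_importances_`. This is a minimal sketch, assuming scikit-learn 0.24 or later (the release that added `importance_getter`). The helper name `mean_target_importances` is hypothetical, and averaging across targets (rather than, say, taking the maximum) is an illustrative choice, not the only reasonable one.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.multioutput import MultiOutputRegressor


def mean_target_importances(estimator):
    """Hypothetical helper: average feature_importances_ over the per-target estimators.

    SelectFromModel passes in its fitted clone of the MultiOutputRegressor; each entry
    of estimator.estimators_ is one RandomForestRegressor fitted on one target column.
    """
    return np.mean([est.feature_importances_ for est in estimator.estimators_], axis=0)


X, y = make_regression(n_samples=100, n_features=100, n_targets=5, random_state=0)

regr_multirf = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, random_state=0))
sel = SelectFromModel(
    regr_multirf,
    max_features=X.shape[1] // 3,               # keep at most 33 features
    importance_getter=mean_target_importances,  # assumption: scikit-learn >= 0.24
)
sel.fit(X, y)             # fits a clone of regr_multirf on all 5 targets
mask = sel.get_support()  # boolean mask over the 100 input features
X_sel = sel.transform(X)  # only the selected columns
print(mask.sum(), X_sel.shape)
```

With the default `threshold` ("mean"), features below the mean importance are dropped, so fewer than `max_features` columns may survive; passing `threshold=-np.inf` makes the selection depend on `max_features` alone.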
-
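Your answer made me realize that MultiOutputRegressor does not model all targets jointly (it fits one independent regressor per target), which my task requires. So I now use a DecisionTreeRegressor inside SelectFromModel(), and this works as expected. Thanks for the very helpful answer.
Tags: python scikit-learn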
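The workaround described in the comment can be sketched as follows, assuming the same kind of synthetic data as in the question (the `random_state` values are added only for reproducibility). DecisionTreeRegressor accepts a 2-D `y` natively, so a single tree is fitted on all targets at once and exposes its own `feature_importances_`, which SelectFromModel can read without any wrapper.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=100, n_targets=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One tree is fitted on all 5 target columns jointly, so its feature_importances_
# account for every target; no MultiOutputRegressor is needed.
sel = SelectFromModel(DecisionTreeRegressor(random_state=0), max_features=X_train.shape[1] // 3)
sel.fit(X_train, y_train)
mask = sel.get_support()             # boolean mask over the 100 input features
X_train_sel = sel.transform(X_train) # reduced training matrix
X_test_sel = sel.transform(X_test)   # same columns selected from the test set
print(mask.sum(), X_train_sel.shape, X_test_sel.shape)
```

RandomForestRegressor also supports a 2-D `y` directly, so it could be dropped in place of the single tree in the same way if a more stable importance estimate is wanted.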