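【Posted】: 2021-08-29 09:03:08
【Question】:
After running hyperparameter tuning and obtaining the best parameters for my classifier, I am trying to get the feature importances from my data. I also fitted a model with the best parameters on the training set, and now I am trying to retrieve the important features, but I keep getting an error. I have tried every solution I could find online.
See my code below: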
# Imports needed by the code below
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
# Assumed: imblearn's Pipeline, since sklearn's Pipeline does not accept a sampler step
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # must come before HalvingGridSearchCV
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import HalvingGridSearchCV, StratifiedKFold

# Create the parameter grid based on the results of random search
# ('auto' meant 'sqrt' for classifiers; newer scikit-learn releases no longer accept it)
param_grid = {
    'bootstrap': [True],
    'max_features': ['auto', 'sqrt'],
    'n_estimators': [100, 1000]
}
# Create a base model
rf = RandomForestClassifier()
# Instantiate the halving grid search
grid_search = HalvingGridSearchCV(estimator=rf, param_grid=param_grid,
                                  cv=3, n_jobs=-1, verbose=2)
# Defined here but not passed to the search above
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
steps_3 = [('over', RandomOverSampler()),
           ('chi_square', SelectKBest(chi2, k=7000)),
           ('estimator', grid_search)]
pipeline_3 = Pipeline(steps=steps_3)
# Fit the pipeline (oversampling, feature selection, then the search)
rf_hyperparameter = pipeline_3.fit(X_train, y_train)
print(rf_hyperparameter)
print("Best Score: " + str(grid_search.best_score_))
print("Best Parameters: " + str(grid_search.best_params_))
best_parameters = grid_search.best_params_
# Fit a fresh model with the best parameters on the training data
rf_best = RandomForestClassifier(bootstrap=True, max_features='auto', n_estimators=1000)
rf_best.fit(X_train, y_train)
# This is the statement that raises the error shown below
feature_importances = pd.DataFrame(rf_best.feature_importances_,
                                   index=X_train.columns, columns=['importance']).sort_values('importance', ascending=False)
feature_importances
After running the code above, this is the error I get:
AttributeError Traceback (most recent call last)
<ipython-input-159-563c7c3e7fc5> in <module>
1 feature_importances = pd.DataFrame(rf_best.feature_importances_,
----> 2 index=X_train.columns,columns=['importance']).sort_values('importance',ascending = False)
3 feature_importances
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
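The last line of the traceback suggests that X_train is a plain NumPy array at this point. Only a pandas DataFrame has a `.columns` attribute, so the lookup fails before the DataFrame of importances is ever built. A minimal sketch of one way around it, assuming the column names can be kept in a separate list (the `feature_names` list and the synthetic data below are hypothetical stand-ins, not the data from the question):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 100 samples, 5 features.
rng = np.random.default_rng(42)
X_train = rng.random((100, 5))          # a NumPy array, as in the traceback
y_train = rng.integers(0, 2, size=100)

# Assumption: the feature names are known from wherever X was built.
feature_names = [f"feature_{i}" for i in range(X_train.shape[1])]

rf_best = RandomForestClassifier(n_estimators=50, random_state=42)
rf_best.fit(X_train, y_train)

# Use the separately kept names as the index instead of X_train.columns.
feature_importances = pd.DataFrame(
    rf_best.feature_importances_,
    index=feature_names,
    columns=["importance"],
).sort_values("importance", ascending=False)

print(feature_importances)
```

If no names exist at all, generic labels like the ones generated here at least keep each importance tied to its column position.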
I would greatly appreciate any input. Thanks!
【Comments】:
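One related point about the pipeline: the forest inside the grid search is trained only on the columns kept by SelectKBest, so its importances line up with the selected features rather than with all of the original ones. A rough sketch of mapping them back to names through `get_support()` follows. To stay small it uses GridSearchCV rather than HalvingGridSearchCV and leaves out the oversampling step, and the data and `feature_names` are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data; chi2 requires non-negative feature values.
rng = np.random.default_rng(0)
X_train = rng.random((120, 8))
y_train = rng.integers(0, 2, size=120)
feature_names = np.array([f"feature_{i}" for i in range(X_train.shape[1])])

grid_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20]},
    cv=3,
)
pipeline = Pipeline([
    ("chi_square", SelectKBest(chi2, k=4)),
    ("estimator", grid_search),
])
pipeline.fit(X_train, y_train)

# Boolean mask over the original columns: True where SelectKBest kept the column.
mask = pipeline.named_steps["chi_square"].get_support()
selected_names = feature_names[mask]

# The refitted best model lives on the fitted search object.
best_rf = pipeline.named_steps["estimator"].best_estimator_
importances = pd.Series(
    best_rf.feature_importances_, index=selected_names
).sort_values(ascending=False)

print(importances)
```

Reading the importances from `best_estimator_` also avoids training a second forest by hand with the best parameters.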
- The part of the code where train_test_split is called is missing. Could you add it?
- Yes, here is that part:
  # Split train and test data
  from sklearn.model_selection import train_test_split
  X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=42, stratify=Y)
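As far as I can tell, train_test_split hands back the same kind of container it receives, so whether X_train has `.columns` depends on what X was before the split. A small sketch with hypothetical data illustrating the difference:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: a small DataFrame with two named columns.
X_df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
Y = np.array([0, 1] * 5)

# DataFrame in, DataFrame out: the column names survive the split.
X_train_df, X_test_df, y_train, y_test = train_test_split(
    X_df, Y, test_size=0.3, random_state=42, stratify=Y
)
print(type(X_train_df).__name__, list(X_train_df.columns))

# ndarray in, ndarray out: there is no .columns attribute to read afterwards.
X_train_arr, X_test_arr, _, _ = train_test_split(
    X_df.to_numpy(), Y, test_size=0.3, random_state=42, stratify=Y
)
print(type(X_train_arr).__name__, hasattr(X_train_arr, "columns"))
```

So if X is converted to an array (or produced by a vectorizer or scaler) before the split, the names have to be saved beforehand.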
Tags: python scikit-learn data-science random-forest numpy-ndarray