As explained here, feature importances computed from the tree structure itself are not a reliable indicator. If you can afford the computation time, permutation feature importance is the better choice.
ELI5 has an implementation. For comparison, you can run the following code against your trained model clf.
import numpy as np
import matplotlib.pyplot as plt
from eli5.sklearn import PermutationImportance
iterations = 5
#http://scikit-learn.org/stable/modules/model_evaluation.html#common-cases-predefined-values
eval_metric = 'r2'
#eval_metric = 'neg_mean_absolute_error'
#eval_metric = 'neg_mean_squared_error'
#eval_metric = 'explained_variance'
perm_train = PermutationImportance(clf, scoring=eval_metric, n_iter=iterations).fit(X_train, y_train)
feature_importance_train = perm_train.feature_importances_
# standard error of the mean over the n_iter shuffles
feature_importance_train_error = perm_train.feature_importances_std_ / np.sqrt(iterations)

perm_test = PermutationImportance(clf, scoring=eval_metric, n_iter=iterations).fit(X_test, y_test)
feature_importance_test = perm_test.feature_importances_
feature_importance_test_error = perm_test.feature_importances_std_ / np.sqrt(iterations)
# rescale the model's built-in importances so their maximum matches
# the maximum of the train permutation importances
feature_importance_model = clf.feature_importances_
feature_importance_model = feature_importance_train.max() * (feature_importance_model / feature_importance_model.max())
sorted_idx = np.argsort(feature_importance_model)
pos = np.arange(sorted_idx.shape[0]) + .5
featfig = plt.figure(figsize=(6, 15))
featfig.suptitle('Feature Importance')
featax = featfig.add_subplot(1, 1, 1)
featax.errorbar(x=feature_importance_train[sorted_idx], y=pos, xerr=feature_importance_train_error[sorted_idx], linestyle='none', marker='.', label='Train')
featax.errorbar(x=feature_importance_test[sorted_idx], y=pos, xerr=feature_importance_test_error[sorted_idx], linestyle='none', marker='.', label='Test')
featax.errorbar(x=feature_importance_model[sorted_idx], y=pos, linestyle='none', marker='.', label='Model')
featax.set_yticks(pos)
featax.set_yticklabels(np.array(features)[sorted_idx], fontsize=8)
featax.set_xlabel(eval_metric + ' change')
featlgd = featax.legend(loc=0)
plt.show()
Since you are free to choose the evaluation metric, you can pick one that is more or less sensitive to outliers.
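If you would rather avoid the extra ELI5 dependency, scikit-learn (0.22+) ships the same idea as `sklearn.inspection.permutation_importance`. A minimal sketch on synthetic data (all dataset and model choices here are assumptions for illustration), comparing an outlier-sensitive metric (`r2`, squared errors) against a more robust one (`neg_mean_absolute_error`):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical setup: 5 features, 3 of them actually informative.
X, y = make_regression(n_samples=500, n_features=5, n_informative=3,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Same permutation scheme, two different scoring functions:
# r2 weights large residuals quadratically, MAE linearly.
imp_r2 = permutation_importance(model, X_test, y_test, scoring='r2',
                                n_repeats=5, random_state=0)
imp_mae = permutation_importance(model, X_test, y_test,
                                 scoring='neg_mean_absolute_error',
                                 n_repeats=5, random_state=0)

print(imp_r2.importances_mean.round(3))
print(imp_mae.importances_mean.round(3))
```

The two metrics generally agree on which features matter, but the relative magnitudes can shift when the target contains outliers, which is exactly the sensitivity knob mentioned above.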