Reduce the number of features based on feature_importances_

【Question title】: Reduce the number of features based on feature_importances_
【Posted】: 2016-05-17 06:55:31
【Question】:

I have built a scikit-learn random forest classifier and would like to reduce the number of features based on feature_importances_.

from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(5000)

model.fit(data[train], target[train])

model.feature_importances_

How can I subset the model so that it uses only the features ranked highest by feature_importances_ (say, the top 5 features)?

【Question comments】:

Tags: python pandas scikit-learn

【Solution 1】:

If you are using scikit-learn 0.17 or later, you can use SelectFromModel:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = datasets.load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier(n_estimators=5000)

# Wrap the forest: once fitted, the selector keeps only the features
# whose importance is at least `threshold`.
new_model = SelectFromModel(model, threshold=0.5)
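
The snippet above only constructs the selector; it still has to be fitted before it can drop columns. A minimal sketch of how that might look follows. The smaller forest (100 trees), the fixed random_state, and threshold="mean" are assumptions made here to keep the example fast and to guarantee that at least one feature survives. Since the importances sum to 1, a cut-off of 0.5 on the four iris features may keep very few columns, or none.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Assumption: 100 trees and a fixed seed, chosen only to keep the sketch quick and repeatable.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# threshold="mean" keeps the features whose importance is at least the mean importance.
selector = SelectFromModel(forest, threshold="mean")
selector.fit(X, y)

X_reduced = selector.transform(X)   # only the columns that passed the threshold
kept = selector.get_support()       # boolean mask over the original columns
importances = selector.estimator_.feature_importances_

print(X_reduced.shape, kept)
```

Besides a float, threshold also accepts "mean", "median", or a scaled form such as "1.25*mean".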

As this example shows, there are several ways to tune the threshold parameter.
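
For the question as asked (keep exactly the top 5 features), one possible sketch is shown here. It assumes scikit-learn 0.20 or later, where SelectFromModel accepts max_features. The synthetic data from make_classification is a hypothetical stand-in for the asker's data and target, and the tree count is reduced for speed. A manual alternative based on np.argsort is included for comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Hypothetical stand-in data: 200 samples, 20 features.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)

# threshold=-np.inf disables the importance cut-off, so max_features alone
# decides how many features survive (the 5 with the highest importances).
selector = SelectFromModel(forest, threshold=-np.inf, max_features=5)
X_top5 = selector.fit_transform(X, y)

# Manual alternative: rank the importances and slice the columns directly.
importances = selector.estimator_.feature_importances_
top5_idx = np.argsort(importances)[::-1][:5]
X_top5_manual = X[:, np.sort(top5_idx)]

# The reduced matrix can then be used to train a new, smaller model.
reduced_model = RandomForestClassifier(n_estimators=100, random_state=0)
reduced_model.fit(X_top5, y)
```

The already fitted forest cannot simply have features removed from it; the usual route is to reduce the data and fit a new model on the remaining columns, as in the last two lines.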

【Comments】: