Answers: Keeping track of feature names when doing Feature Selection

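【Question title】: Keeping track of feature names when doing Feature Selection
【Posted】: 2020-08-13 23:05:10
【Question】:

When doing feature selection with the functions in sklearn's feature_selection module, is there a way to keep track of the actual feature names instead of the default "f1", "f2", etc.? I have a large number of features, so I can't track them by hand. Obviously I could write code to do this, but I'd like to know whether there is some simple option I can set.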

【Comments】:

Does this answer your question? Is there a way to output selected column names from the SelectFromModel method?

Tags: machine-learning scikit-learn feature-selection

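【Solution 1】:

If your data is in a pandas DataFrame, you can recover the names of the columns the selector kept. You just need the get_support method, which returns a boolean mask over the input features that you can use to index the DataFrame's columns.

Here is a simple example, adapted with a few modifications from the official documentation.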

import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Toy data: three feature columns plus a binary label in the last column
X = [[ 0.87, -1.34,  0.31, 0],
     [-2.79, -0.02, -0.85, 1],
     [-1.34, -0.48, -2.55, 0],
     [ 1.92,  1.48,  0.65, 1]]

df = pd.DataFrame(X, columns=['col1', 'col2', 'col3', 'label'])
train_x = df.loc[:, ['col1',  'col2', 'col3']]
y = df.label

# Fit the selector on the DataFrame so the column order matches the mask
selector = SelectFromModel(estimator=LogisticRegression()).fit(train_x, y)

# Boolean mask: True for every feature the selector kept
col_index = selector.get_support()
print(train_x.columns[col_index])
# output print --> Index(['col2'], dtype='object')

【Comments】:

When printing, I think you should do it like this: print(train_x.columns[col_index])