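【Question title】: Feature Selection Scikit Learn
【Posted】: 2018-02-24 18:20:56
【Question】:
After running feature selection in scikit-learn, I want to see which variables the method selected. How can I do that? X.shape only shows the number of variables; I want the names of the variables that remain after feature selection.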
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
iris = load_iris()
X, y = iris.data, iris.target
X.shape
X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
X_new.shape
Tags:
python
machine-learning
scikit-learn
feature-selection
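【Solution 1】:
After feature selection, if you only want to keep the features that were flagged as important (True) in order to build a new model, you can do the following: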
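import numpy as np
selector = SelectKBest(chi2, k=2).fit(X, y)  # keep the fitted selector; X_new is a plain array with no mask
feats = X.T.tolist()
optimised_feats = []
# get_support() returns one boolean per original column
for keep, column in zip(selector.get_support(), feats):
    if keep:
        optimised_feats.append(column)
optimised_feats = np.array(optimised_feats).T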
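A shorter route to the same array, as a minimal sketch: index the columns of X directly with the boolean mask from get_support(). The names `selector`, `mask` and `X_selected` are illustrative and do not come from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

# `selector` is an illustrative name for the fitted selector
selector = SelectKBest(chi2, k=2).fit(X, y)

mask = selector.get_support()   # boolean array, one entry per column of X
X_selected = X[:, mask]         # keep only the columns flagged True

print(mask)
print(X_selected.shape)
```

This should match what selector.transform(X) returns, without the explicit loop.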
【Solution 2】:
You can get the names, but you need to use pandas and convert the NumPy array to a DataFrame.
Example:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
import pandas as pd
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.DataFrame(iris.target)
selector = SelectKBest(chi2, k=2)
selector.fit(X, y)
X_new = selector.transform(X)
X_new.shape
# As a pandas Index
X.columns[selector.get_support(indices=True)]
# As a plain Python list
vector_names = list(X.columns[selector.get_support(indices=True)])
print(vector_names)
# Equivalent, using Index.tolist()
X.columns[selector.get_support(indices=True)].tolist()
Result:
Index([u'petal length (cm)', u'petal width (cm)'], dtype='object')
['petal length (cm)', 'petal width (cm)']
['petal length (cm)', 'petal width (cm)']
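If you would rather keep the names attached to the reduced data, one option is to slice the DataFrame with the boolean mask instead of calling transform, which by default returns a bare NumPy array. This is only a sketch; `X_selected` is an illustrative name, and the target is passed as a 1-D Series here rather than a DataFrame.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)  # 1-D target (an assumption; the answer uses a DataFrame)

selector = SelectKBest(chi2, k=2)
selector.fit(X, y)

# Boolean mask over the original columns; .loc keeps the column labels
X_selected = X.loc[:, selector.get_support()]

print(X_selected.columns.tolist())
print(X_selected.head())
```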
【Solution 3】:
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
iris = load_iris()
X, y = iris.data, iris.target
X.shape
skb = SelectKBest(chi2, k=2)
skb.fit(X, y)
X_new = skb.transform(X)
X_new.shape
print(skb.get_support(indices=True))
This gives you the indices of the selected features.
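To turn those indices into names without pandas, you can look them up in the feature_names list that load_iris() returns. A minimal sketch; `selected_idx` and `selected_names` are illustrative names.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

iris = load_iris()
X, y = iris.data, iris.target

skb = SelectKBest(chi2, k=2).fit(X, y)

# Integer positions of the columns that were kept
selected_idx = skb.get_support(indices=True)

# iris.feature_names is a plain list of column names, in column order
selected_names = [iris.feature_names[i] for i in selected_idx]
print(selected_names)
```

For your own data, replace iris.feature_names with whatever list holds your column names, in the same order as the columns of X.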