【Question title】: Get feature names of ColumnTransformer using StandardScaler and One-Hot-Encoding
【Posted】: 2021-08-05 19:28:25
【Question】:

I am using a simple ColumnTransformer with a StandardScaler and a OneHotEncoder, for example:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder

num_features = ['num_feat_1',
                'num_feat_2',
                'num_feat_3']
cat_features = ['cat_feat_1',
                'cat_feat_2',
                'cat_feat_3']

ct = ColumnTransformer([
    ("scaler", StandardScaler(), num_features),
    ("onehot", OneHotEncoder(sparse=False,
                             handle_unknown='ignore'), cat_features)], 
    remainder='passthrough') 

ct.fit(X_train)
X_train_trans = ct.transform(X_train)
X_test_trans = ct.transform(X_test)

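To map the coefficients of a linear regression back to the features, I need ct.get_feature_names(), but I get the error: Transformer scaler (type StandardScaler) does not provide get_feature_names. Why does this happen, and how can I fix it?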

【Question comments】:

    Tags: python-3.x machine-learning scikit-learn


    【Solution 1】:

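    In your case, get_feature_names() is only available on the one-hot encoder. StandardScaler() does not change the names of the variables it transforms, so it does not provide that method. We can therefore loop over the transformers and, whenever get_feature_names() is not available, keep the original feature names.

    Using an example dataset: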

    import pandas as pd
    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import StandardScaler
    from sklearn.preprocessing import OneHotEncoder
    
    # Define the column lists first; they are needed to build X below.
    num_features = ['num_feat_1',
                    'num_feat_2',
                    'num_feat_3']
    cat_features = ['cat_feat_1',
                    'cat_feat_2',
                    'cat_feat_3']
    
    # Three uniform numeric columns and three categorical columns with values 'a'/'b'.
    X = pd.concat([
        pd.DataFrame(np.random.uniform(0,1,(100,3)),columns=num_features),
        pd.DataFrame(np.random.choice(['a','b'],(100,3)),columns=cat_features)
    ],axis=1)
    
    X_train = X.iloc[:50,:]
    X_test = X.iloc[50:,:]
    
    ct = ColumnTransformer([
        ("scaler", StandardScaler(), num_features),
        ("onehot", OneHotEncoder(sparse=False,
                                 handle_unknown='ignore'), cat_features)], 
        remainder='passthrough') 
    
    ct.fit(X_train)
    

    Let's try this:

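    # (name, transformer, columns) triples as passed to the ColumnTransformer
    tx = ct.get_params()['transformers']
    feature_names = []
    for name,transformer,features in tx:
        try:
            # Fitted transformers that generate new names (here: the one-hot encoder)
            Var = ct.named_transformers_[name].get_feature_names().tolist()
        except AttributeError:
            # No get_feature_names() (here: StandardScaler), so keep the input column names
            Var = list(features)
        feature_names = feature_names + Var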
    
    feature_names
    ['num_feat_1',
     'num_feat_2',
     'num_feat_3',
     'x0_a',
     'x0_b',
     'x1_a',
     'x1_b',
     'x2_a',
     'x2_b']
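
To connect such a list of names to the regression coefficients mentioned in the question, one option is to pair them in a pandas Series. The following is only a minimal sketch and not part of the original answer. It makes several assumptions: the small dataset and the target `y_train` are made up for illustration, the one-hot names are built from the encoder's `categories_` attribute (so they read `cat_feat_1_a` rather than `x0_a`), and `sparse_threshold=0` on the ColumnTransformer is used to get a dense array instead of the `sparse=False` encoder argument.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder, StandardScaler

num_features = ['num_feat_1', 'num_feat_2', 'num_feat_3']
cat_features = ['cat_feat_1', 'cat_feat_2', 'cat_feat_3']

# Small illustrative dataset; the target y_train is made up for this sketch.
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.uniform(0, 1, (8, 3)), columns=num_features)
X_train['cat_feat_1'] = ['a', 'b'] * 4
X_train['cat_feat_2'] = ['a', 'a', 'b', 'b'] * 2
X_train['cat_feat_3'] = ['b', 'a'] * 4
y_train = rng.normal(size=8)

# sparse_threshold=0 makes the ColumnTransformer return a dense array.
ct = ColumnTransformer([
    ("scaler", StandardScaler(), num_features),
    ("onehot", OneHotEncoder(handle_unknown='ignore'), cat_features)],
    remainder='passthrough', sparse_threshold=0)
X_train_trans = ct.fit_transform(X_train)

# Scaled columns keep their names; one-hot columns are named "<column>_<category>".
encoder = ct.named_transformers_['onehot']
onehot_names = [f"{col}_{cat}"
                for col, cats in zip(cat_features, encoder.categories_)
                for cat in cats]
feature_names = num_features + onehot_names

# One coefficient per transformed column, labelled with its feature name.
model = LinearRegression().fit(X_train_trans, y_train)
coef_by_feature = pd.Series(model.coef_, index=feature_names)
print(coef_by_feature)
```

The order of `feature_names` matters here: the ColumnTransformer concatenates its outputs in the order the transformers are listed, so the scaled columns come first and the one-hot columns follow.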
    

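    【Comments】:

    • Thanks, that works. What is the reason behind this? I would say get_feature_names should be as convenient as possible, shouldn't it?
    • StandardScaler() has no get_feature_names() method; I guess it is not meant to store feature names.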