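【Posted】: 2020-10-07 22:23:07
【Question】:
I am using sklearn to build a preprocessing pipeline for a DataFrame, chaining several kinds of preprocessing steps.
I want to chain a SimpleImputer transformer with a FunctionTransformer that applies pd.qcut (or pd.cut), but I keep getting the following error:
ValueError: Input array must be 1 dimensional
Here is my code: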
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer

class FeatureSelector(BaseEstimator, TransformerMixin):
    # Selects the given columns from a DataFrame
    def __init__(self, features):
        self._features = features

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X[self._features]

fare_transformer = Pipeline([
    ('fare_selector', FeatureSelector(['Fare'])),
    ('fare_imputer', SimpleImputer(strategy='median')),
    ('fare_bands', FunctionTransformer(func=pd.qcut, kw_args={'q': 5}))
])
The same thing happens if I leave out the SimpleImputer and simply chain the FeatureSelector transformer with the FunctionTransformer wrapping pd.qcut:
fare_transformer = Pipeline([
    ('fare_selector', FeatureSelector(['Fare'])),
    ('fare_bands', FunctionTransformer(func=pd.qcut, kw_args={'q': 5}))
])
I have searched Stack Overflow and Google extensively but could not find a solution to this problem. Any help would be greatly appreciated!
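For reference, the error appears to come from a shape mismatch: pd.qcut accepts only 1-D input, while selecting with a list of columns (X[['Fare']]) yields a 2-D DataFrame and SimpleImputer returns a 2-D array. Below is a minimal sketch of one possible workaround, not a definitive fix. The helper name qcut_2d and the toy fare values are hypothetical and stand in for the real data; the helper applies pd.qcut column by column and returns integer bin codes.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def qcut_2d(X, q=5):
    """Hypothetical helper: quantile-bin every column of a 2-D input.

    pd.qcut only accepts 1-D data, so it is applied one column at a time.
    labels=False makes pd.qcut return integer bin codes instead of intervals.
    """
    X = np.asarray(X, dtype=float)
    binned = [
        pd.qcut(X[:, j], q=q, labels=False, duplicates="drop")
        for j in range(X.shape[1])
    ]
    # Stack the 1-D results back into a 2-D array, one column per feature.
    return np.column_stack(binned)


# Toy data (assumption): made-up fares with one missing value.
df = pd.DataFrame(
    {"Fare": [7.25, 71.28, np.nan, 8.05, 53.1, 8.46, 51.86, 21.08, 11.13, 30.07]}
)

fare_transformer = Pipeline([
    ("fare_imputer", SimpleImputer(strategy="median")),
    ("fare_bands", FunctionTransformer(func=qcut_2d, kw_args={"q": 5})),
])

# Double brackets keep the input 2-D, which is what SimpleImputer expects.
fare_bands = fare_transformer.fit_transform(df[["Fare"]])
print(fare_bands.shape)
```

One caveat with this sketch: the bin edges are recomputed on every call, so data passed to transform later is binned by its own quantiles rather than those of the training set. If the edges need to be learned during fit, sklearn's KBinsDiscretizer(n_bins=5, encode='ordinal', strategy='quantile') may be a better match, since it works on 2-D input directly.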
【Comments】:
Tags: python pandas scikit-learn pipeline