Sklearn Pipeline with multiple estimators

【Question title】: Sklearn Pipeline with multiple estimators
【Posted】: 2020-12-26 21:32:52
【Question】:

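I get an error when I chain estimators and try to view the result. I am new to Python, and this is my first attempt at using the pipeline feature.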

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

estimator=[('dim_reduction',PCA()),('logres_model',LogisticRegression()),('linear_model',LinearRegression())]

pipeline_estimator=Pipeline(estimator)

Error message

TypeError                                 Traceback (most recent call last)
<ipython-input-196-44549764413a> in <module>
----> 1 pipeline_estimator=Pipeline(estimator)

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in __init__(self, steps, memory, verbose)
    112         self.memory = memory
    113         self.verbose = verbose
--> 114         self._validate_steps()
    115 
    116     def get_params(self, deep=True):

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in _validate_steps(self)
    157             if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
    158                     hasattr(t, "transform")):
--> 159                 raise TypeError("All intermediate steps should be "
    160                                 "transformers and implement fit and transform "
    161                                 "or be the string 'passthrough' "

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression()' (type <class 'sklearn.linear_model._logistic.LogisticRegression'>) doesn't

【Question comments】:

Tags: python machine-learning scikit-learn transform pipeline

【Solution 1】:

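As the error suggests, all intermediate steps in a Pipeline must be transformers (used for feature transformation) and must have fit/transform methods, but you have chained two models. You should have only one model, and it must be the last step of the pipeline.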
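As a minimal sketch of that rule, the snippet below checks that LogisticRegression has no transform method (which is why it cannot sit in the middle) and builds a pipeline with a single model as the final step. The step names are reused from the question, and the variable names has_transform, valid_pipeline and step_names are only illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# A classifier exposes fit/predict but no transform, so it is not a transformer
# and cannot be used as an intermediate step.
has_transform = hasattr(LogisticRegression(), "transform")
print(has_transform)

# Valid layout: transformers first, exactly one model as the last step.
valid_pipeline = Pipeline([
    ("dim_reduction", PCA()),
    ("logres_model", LogisticRegression()),
])
step_names = [name for name, _ in valid_pipeline.steps]
print(step_names)
```

Depending on the scikit-learn version, the step check may run when the Pipeline is constructed (as in the traceback above) or only later, when it is fitted.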

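It looks like you may want to run a grid search that compares the two estimators, each with its own pipeline and hyperparameter tuning. For that, use GridSearchCV with the defined Pipeline as the estimator: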

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import load_iris

pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression()),
])
parameters = [
    {
        'clf': (LogisticRegression(),),
        'clf__C': (0.001,0.01,0.1,1,10,100)
    }, {
        'clf': (RandomForestClassifier(),),
        'clf__n_estimators': (10, 30),
    }
]
grid_search = GridSearchCV(pipeline, parameters)

# some example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
grid_search.fit(X_train, y_train)
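Once the search has been fitted, the winning final step and its score on the held-out split can be read back from the search object. The following is a self-contained sketch of the same setup. The values max_iter=1000, random_state=0, cv=3 and the shortened C grid are assumptions added here to keep the run quick and repeatable, and are not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("dim_reduction", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder, swapped by the grid
])
parameters = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [10, 30]},
]

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

grid_search = GridSearchCV(pipeline, parameters, cv=3)
grid_search.fit(X_train, y_train)

# best_params_ holds the selected estimator object under the "clf" key,
# alongside the hyperparameters chosen for it.
best_clf = grid_search.best_params_["clf"]
# score() delegates to the refitted best pipeline (accuracy for classifiers).
test_accuracy = grid_search.score(X_test, y_test)
print(type(best_clf).__name__, grid_search.best_params_, test_accuracy)
```

Which classifier wins depends on the data split and the scikit-learn version, so no particular result should be expected.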

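Also note that you are mixing a classifier with a regressor. The code above shows how to do this by combining two example classifiers. That said, you may want to spend some time understanding what type of problem you are facing and which models are suitable for it.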
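If the target turns out to be continuous rather than categorical, the same pattern can be sketched with regressors only. The example below is an illustration under assumptions, not part of the original answer: the diabetes dataset, the Ridge alternative, the alpha grid, cv=3 and the step name "reg" are all chosen here purely for demonstration.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Example regression dataset with a continuous target.
X, y = load_diabetes(return_X_y=True)

reg_pipeline = Pipeline([
    ("dim_reduction", PCA()),
    ("reg", LinearRegression()),  # placeholder, swapped by the grid
])
reg_parameters = [
    {"reg": [LinearRegression()]},
    {"reg": [Ridge()], "reg__alpha": [0.1, 1.0, 10.0]},
]

# With regressors, the default score is R^2 instead of accuracy.
reg_search = GridSearchCV(reg_pipeline, reg_parameters, cv=3)
reg_search.fit(X, y)
best_reg = reg_search.best_params_["reg"]
print(type(best_reg).__name__, reg_search.best_score_)
```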

【Comments】: