Repeated FeatureUnion in scikit-learn

【Question title】: Repeated FeatureUnion in scikit-learn
【Posted】: 2019-01-12 21:09:50
【Question description】:

I am learning about Pipelines and FeatureUnions in scikit-learn, and I wonder whether it is possible to apply "make_union" repeatedly to a single class.

Consider the following code:

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.linear_model import LogisticRegression
import sklearn.datasets as d

class IrisDataManupulation(BaseEstimator, TransformerMixin):
    """
       Raise every entry of the feature matrix to the given power
    """
    def __init__(self, power=2):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.power(X, self.power)

iris_data = d.load_iris()

X, y = iris_data.data, iris_data.target


# feature union:
fu = FeatureUnion(transformer_list=[('squared', IrisDataManupulation(power=2)),
                               ('third', IrisDataManupulation(power=3))])

Question: is there a neat way to create the FeatureUnion without repeating the same transformer, by passing a list of parameters instead?

For example:

# Hypothetical syntax, not valid scikit-learn: FeatureUnion has no param_grid argument
fu_new = FeatureUnion(transformer_list=[('raise_power', IrisDataManupulation())],
                      param_grid={'raise_power__power': [2, 3]})

【Question discussion】:

Tags: python scikit-learn pipeline

【Solution 1】:

You can move all of the functionality into a single custom transformer. We can change your IrisDataManupulation to handle a list of powers:

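class IrisDataManupulation(BaseEstimator, TransformerMixin):
    """Raise the feature matrix to each power in `powers` and stack the results."""

    def __init__(self, powers=(2,)):
        # Tuple default avoids a shared mutable default; a list such as [2, 3] works too
        self.powers = powers

    def fit(self, X, y=None):
        # Nothing to learn; needed so that fit_transform and Pipeline can call it
        return self

    def transform(self, X):
        # One block of columns per power, concatenated side by side
        powered_arrays = []
        for power in self.powers:
            powered_arrays.append(np.power(X, power))

        return np.hstack(powered_arrays)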

Then you can simply use this new transformer instead of the FeatureUnion:

fu = IrisDataManupulation(powers=[2,3])
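
As a quick sanity check, the sketch below fits the list-based transformer on the iris data and compares its output with the squares and cubes stacked by hand. It is only an illustration: the class is redefined here so the example is self-contained, and it assumes a fit method that simply returns self, which fit_transform requires.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris


class IrisDataManupulation(BaseEstimator, TransformerMixin):
    """Raise the feature matrix to each power in `powers` and stack the results."""

    def __init__(self, powers=(2,)):
        self.powers = powers

    def fit(self, X, y=None):
        # Assumption: nothing is learned, so fit just returns self
        return self

    def transform(self, X):
        # One block of columns per requested power, side by side
        return np.hstack([np.power(X, power) for power in self.powers])


X, y = load_iris(return_X_y=True)

fu = IrisDataManupulation(powers=[2, 3])
X_new = fu.fit_transform(X)

# Four original features, two powers each
print(X_new.shape)

# Same result as stacking the squares and cubes manually
print(np.allclose(X_new, np.hstack([X ** 2, X ** 3])))
```

The output has the same column layout the original two-step FeatureUnion would produce: all squared features first, then all cubed features.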

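Note: if you want to generate polynomial features from your original features, I would recommend PolynomialFeatures, which can generate the powers you want as well as the interactions between features.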
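
For comparison, here is a minimal sketch of the PolynomialFeatures route on the same iris data. The choice of degree=3 and include_bias=False is an assumption made to mirror the squares and cubes in the question; note that the result also contains the original features and all cross terms, so it is wider than the pure-power output.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)

# degree=3 yields every monomial of total degree 1..3; the bias column is dropped
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)

print(X_poly.shape)

# powers_ has one row per output column, giving the exponent of each input feature.
# Pick out the columns that are a single feature raised to the power 2 or 3.
pure = [
    i for i, row in enumerate(poly.powers_)
    if np.count_nonzero(row) == 1 and row.max() in (2, 3)
]
print(len(pure))
```

If only the pure powers are wanted, the custom transformer stays the leaner option; PolynomialFeatures is more useful when interactions between features matter as well.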

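【Discussion】:

Thank you very much for your answer! I am practicing with the Pipeline and FeatureUnion options, so I was trying to challenge myself. I like your idea, but I wanted to create a simple function (IrisDataManupulation) and then reuse it as many times as needed, taking the power out of the class.
@ArnoldKlein Sorry, I don't follow. My current implementation doesn't hardcode the power either. It is just a list of values now. You can specify IrisDataManupulation(powers=[2]) and it will still work.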
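
One possible way to keep the simple single-power transformer from the question and still reuse it for several powers is to build the FeatureUnion steps from a list. The sketch below is an illustration only; make_power_union is a hypothetical helper, not part of scikit-learn, and the step names power_2, power_3 are arbitrary.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.pipeline import FeatureUnion


class IrisDataManupulation(BaseEstimator, TransformerMixin):
    """Single-power transformer, as in the question."""

    def __init__(self, power=2):
        self.power = power

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.power(X, self.power)


def make_power_union(powers):
    # Hypothetical helper: one named step per power, each a fresh transformer instance
    return FeatureUnion(
        transformer_list=[
            (f"power_{p}", IrisDataManupulation(power=p)) for p in powers
        ]
    )


X, y = load_iris(return_X_y=True)

fu = make_power_union([2, 3])
X_new = fu.fit_transform(X)

print(X_new.shape)
print([name for name, _ in fu.transformer_list])
```

Each step still appears under its own name, so individual powers remain addressable through the usual step__parameter syntax, for example fu.set_params(power_2__power=4).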