【Question title】: Using Pipeline with custom classes in sklearn
【Posted】: 2017-08-31 03:29:48
【Question】:

I am running into a problem with predict inside a pipeline flow in which every pipeline step is a custom class.

class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                if self.method == 'pca':
                    selector = PCA(n_components=self.features)
                elif self.method == 'rfe':
                    selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                   n_features_to_select=self.features,
                                   step=1)
                selector.fit(X, Y)
                return selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)


model = Pipeline([
    ("DATA_CLEANER", MyDataCleaner(demo='', mode='strict')),
    ("DATA_ENCODING", MyEncoder(encoder_name='code')),
    ("FEATURE_SELECTION", MyFeatureSelector(features=15, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor())
])

So, the code above works fine here:

 model.fit(X, _Y)

But I get an error here:

 prediction = model.predict(XT)

Error: shapes (672,107) and (15,) not aligned: 107 (dim 1) != 15 (dim 0)
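For context, the message has the form NumPy uses when a matrix product gets incompatible shapes. The sketch below reproduces it with plain arrays, assuming the 107-column array is the test matrix reaching the regressor unreduced and the 15-element vector stands in for the learned coefficients (both arrays are hypothetical placeholders, not the asker's data):

```python
import numpy as np

# Hypothetical stand-ins: XT keeps all 107 columns, while the regressor
# was trained on only 15 selected features.
XT = np.zeros((672, 107))
coef = np.zeros(15)

message = None
try:
    np.dot(XT, coef)  # inner dimensions 107 and 15 do not match
except ValueError as err:
    message = str(err)

print(message)
```

If this reading is right, the feature-selection step returned XT unchanged at predict time instead of reducing it to 15 columns.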

Debugging points to this line: selector.fit(X, Y). A new instance of MyFeatureSelector is created during the predict() step, and Y does not exist at that point.

Where am I going wrong?

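【Comments】:

  • Yes. The pipeline calls fit() during training and only transform() during prediction. You call selector.fit() inside transform(), which changes the data.
  • First check the shapes of X, XT and _Y. Show the full stack trace of the error. Then I may be able to suggest changes to your custom classes.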
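The calling pattern described in the first comment can be observed directly. This is a minimal sketch, assuming a hypothetical pass-through transformer named CallLogger and synthetic data; it only records which methods the pipeline invokes on an intermediate step:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

CALLS = []  # module-level log of the methods the pipeline invokes


class CallLogger(TransformerMixin, BaseEstimator):
    """Hypothetical pass-through step that records each call it receives."""

    def fit(self, X, y=None):
        CALLS.append('fit')
        return self

    def transform(self, X):
        # transform() is never handed y, neither in training nor in prediction
        CALLS.append('transform')
        return X


rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = X @ np.array([1.0, 2.0, 3.0])

model = Pipeline([('log', CallLogger()), ('reg', LinearRegression())])
model.fit(X, y)       # the intermediate step is fitted, then transforms
model.predict(X)      # the intermediate step only transforms
print(CALLS)
```

Anything a step needs to learn from Y therefore has to be learned during the fit phase and stored on the instance, because predict() gives the step no second chance.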

Tags: python python-2.7 machine-learning scikit-learn pipeline


【Solution 1】:

A working version is posted below:

from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method
        self.selector = None
        self.init_selector()

    def init_selector(self):
        # Build the selector once so its fitted state survives until predict()
        if self.method == 'pca':
            self.selector = PCA(n_components=self.features)
        elif self.method == 'rfe':
            self.selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                n_features_to_select=self.features,
                                step=1)

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                # Y is only passed during training (via fit_transform), so the
                # selector is fitted there and merely reused at predict time
                if Y is not None:
                    self.selector.fit(X, Y)
                return self.selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)
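A variant worth considering moves all learning into fit() and leaves transform() as a pure application of the fitted selector, which is the usual scikit-learn convention. The sketch below is not the answerer's code: the class name FittedFeatureSelector, the step names, and the synthetic data are assumptions made for illustration, and the try/except is dropped so that failures surface instead of silently returning X.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.pipeline import Pipeline


class FittedFeatureSelector(TransformerMixin, BaseEstimator):
    """Hypothetical variant of MyFeatureSelector that learns only in fit()."""

    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, y=None):
        if self.features >= np.asarray(X).shape[1]:
            self.selector_ = None  # nothing to reduce; pass data through
        elif self.method == 'pca':
            self.selector_ = PCA(n_components=self.features).fit(X)
        elif self.method == 'rfe':
            self.selector_ = RFE(estimator=LinearRegression(),
                                 n_features_to_select=self.features,
                                 step=1).fit(X, y)
        else:
            raise ValueError('unknown method: {}'.format(self.method))
        return self

    def transform(self, X):
        # No fitting here: only apply what fit() stored on the instance
        if self.selector_ is None:
            return X
        return self.selector_.transform(X)


# Synthetic data standing in for X, _Y and XT from the question
rng = np.random.RandomState(0)
X = rng.rand(100, 20)
_Y = X @ rng.rand(20) + 0.01 * rng.randn(100)
XT = rng.rand(30, 20)

model = Pipeline([
    ('select', FittedFeatureSelector(features=5, method='rfe')),
    ('huber', HuberRegressor()),
])
model.fit(X, _Y)
prediction = model.predict(XT)
reduced = model.named_steps['select'].transform(XT)
print(prediction.shape, reduced.shape)
```

Because TransformerMixin supplies fit_transform(), the class does not need its own, and transform() no longer needs a Y argument at all.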
