【Question title】: name 'DataFrameSelector' is not defined
【Posted】: 2018-07-07 13:49:54
【Question】:

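I am currently reading "Hands-On Machine Learning with Scikit-Learn and TensorFlow". I get an error when I try to recreate the transformation pipeline code. How can I fix this?

Code: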

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([('imputer', Imputer(strategy = "median")),
                        ('attribs_adder', CombinedAttributesAdder()),
                        ('std_scaler', StandardScaler()),
                        ])

housing_num_tr = num_pipeline.fit_transform(housing_num)

from sklearn.pipeline import FeatureUnion

num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
                         ('selector', DataFrameSelector(num_attribs)),
                         ('imputer', Imputer(strategy = "median")),
                         ('attribs_adder', CombinedAttributesAdder()),
                         ('std_scaler', StandardScaler()),
                        ])

cat_pipeline = Pipeline([('selector', DataFrameSelector(cat_attribs)), 
                         ('label_binarizer', LabelBinarizer()),
                        ])

full_pipeline = FeatureUnion(transformer_list = [("num_pipeline", num_pipeline), 
                                                 ("cat_pipeline", cat_pipeline),
                                                ])

# And we can now run the whole pipeline simply:

housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared

Error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
     43 
     44 num_pipeline = Pipeline([
---> 45                          ('selector', DataFrameSelector(num_attribs)),
     46                          ('imputer', Imputer(strategy = "median")),
     47                          ('attribs_adder', CombinedAttributesAdder()),

NameError: name 'DataFrameSelector' is not defined

【Question comments】:

    Tags: python scikit-learn pipeline


    【Solution 1】:

    Insert a cell before the current code cell and enter the following code:

    from sklearn.base import BaseEstimator, TransformerMixin

    class DataFrameSelector(BaseEstimator, TransformerMixin):
        """Select the given DataFrame columns and return them as a NumPy array."""

        def __init__(self, attribute_names):
            self.attribute_names = attribute_names

        def fit(self, X, y=None):
            return self  # nothing to learn from the data

        def transform(self, X, y=None):
            return X[self.attribute_names].values

    That way, your DataFrameSelector class is defined before the pipeline code uses it.
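
As a minimal sketch of that ordering, the block below defines the class first and only then builds a pipeline that refers to it. The `toy` DataFrame and its column names are hypothetical stand-ins for the book's `housing` data, and the pipeline is cut down to two steps for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


# The class must exist before any Pipeline that references it is built.
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn from the data

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy data standing in for the book's `housing` DataFrame.
toy = pd.DataFrame({
    "rooms": [2.0, 4.0, 6.0],
    "income": [1.0, 2.0, 3.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(["rooms", "income"])),
    ("std_scaler", StandardScaler()),
])

# The selector picks the two numeric columns, then the scaler standardizes them.
toy_prepared = num_pipeline.fit_transform(toy)
print(toy_prepared.shape)
```

In a notebook, the class definition only has to be executed once before the pipeline cell; re-running the pipeline cell afterwards no longer raises the NameError.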

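    【Comments】:

      【Solution 2】:

      It looks like you are working on the California Housing Price Predictions project from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow.

      The error

      NameError: name 'DataFrameSelector' is not defined

      occurs because sklearn has no DataFrameSelector transformer. To get past this error, you need to write your own custom transformer.

      In the book, the DataFrameSelector transformer code is on the next page, but I'll also copy it below.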

      from sklearn.base import BaseEstimator, TransformerMixin
      
      class DataFrameSelector(BaseEstimator, TransformerMixin):
          def __init__(self, attribute_names):
              self.attribute_names = attribute_names
          def fit(self, X, y=None):
              return self
          def transform(self, X):
              return X[self.attribute_names].values
      

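      The class inherits from BaseEstimator and TransformerMixin. TransformerMixin supplies fit_transform(), built from the fit() and transform() methods you define, and BaseEstimator supplies get_params() and set_params().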
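
To make the inheritance concrete, here is a small sketch that calls two methods the class body never defines. The DataFrame `df` and its columns are made up for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": ["x", "y"]})
selector = DataFrameSelector(["a", "b"])

# fit_transform() is not written in the class body; TransformerMixin provides it.
selected = selector.fit_transform(df)

# get_params() comes from BaseEstimator and reports the __init__ arguments.
params = selector.get_params()

print(selected)
print(params)
```

Because get_params() reads the constructor arguments by name, storing each argument on `self` under the same name (as `attribute_names` is here) is what keeps the transformer compatible with the rest of scikit-learn.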

      The sklearn-pandas package also provides another class, DataFrameMapper, with a similar goal. Details about the class are in the sklearn-pandas documentation for DataFrameMapper.
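
Not mentioned in the answer, but relevant to the same goal: scikit-learn 0.20 and later include ColumnTransformer, which selects DataFrame columns by name without a custom selector class. The sketch below is an alternative under that version assumption, not the book's code; the `toy` data is hypothetical, and OneHotEncoder is swapped in for the question's LabelBinarizer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the book's `housing` DataFrame.
toy = pd.DataFrame({
    "rooms": [2.0, 4.0, 6.0],
    "income": [1.0, 2.0, 3.0],
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND"],
})

# Each entry is (name, transformer, columns); columns are selected by name.
full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), ["rooms", "income"]),
    ("cat", OneHotEncoder(), ["ocean_proximity"]),
])

# Two scaled numeric columns plus one column per category value.
toy_prepared = full_pipeline.fit_transform(toy)
print(toy_prepared.shape)
```

This replaces both the selector steps and the FeatureUnion from the question's code, at the cost of requiring a newer scikit-learn than the one the book was written against.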

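      【Comments】:

        【Solution 3】:

        If you are following Hands-On Machine Learning with Scikit-Learn and TensorFlow, the custom DataFrameSelector transformer is defined on the very next page: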

        from sklearn.base import BaseEstimator, TransformerMixin
        from sklearn.pipeline import FeatureUnion
        class DataFrameSelector(BaseEstimator, TransformerMixin):
            def __init__(self, attribute_names):
                self.attribute_names = attribute_names
            def fit(self, X, y=None):
                return self
            def transform(self, X):
                return X[self.attribute_names].values
        

        【Comments】:

          【Solution 4】:
          from sklearn.pipeline import FeatureUnion
          class DataFrameSelector(BaseEstimator, TransformerMixin):
              def __init__(self, attribute_names):
                  self.attribute_names = attribute_names
              def fit(self, X, y=None):
                  return self
              def transform(self, X):
                  return X[self.attribute_names].values
          

          This may work.

          【Comments】:

          • You also need to add the appropriate BaseEstimator and TransformerMixin imports here.
          【Solution 5】:
          from sklearn.base import BaseEstimator, TransformerMixin
          
          class DataFrameSelector(BaseEstimator, TransformerMixin):
              def __init__(self, attribute_names):
                  self.attribute_names=attribute_names
              def fit(self, X, y=None):
                  return self
              def transform(self, X):
                  return X[self.attribute_names].values
          

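          This should work.

          【Comments】:

          • Citing the source can sometimes help: Hands-On Machine Learning with Scikit-Learn & TensorFlow, page 97
          【Solution 6】:

          DataFrameSelector is not found, so it needs to be imported. It is not part of sklearn, but sklearn-features has something with the same name: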

          from sklearn_features.transformers import DataFrameSelector
          

          (DOCS)

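          【Comments】:

          • Hi, thanks for your reply. As I read further, I found that the author defines a DataFrameSelector class. When I put it above the code I wrote, it seems to work. However, I now get a new error: stackoverflow.com/questions/46162855/…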