【Question title】: How to build a parameter grid with FeatureUnion?
【Posted】: 2021-08-19 23:27:09
【Question description】:

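I'm trying to run this combined model with both text and numeric features, but I get the error ValueError: Invalid parameter tfidf for estimator. Is something wrong with the syntax of my parameters grid? Possibly useful links: FeatureUnion usage, FeatureUnion documentation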

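import numpy as np
from nltk import tokenize
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# The selector transformers and X_train are defined elsewhere (not shown)
tknzr = tokenize.word_tokenize
# stop_words takes the string 'english'; the set {'english'} would be a one-word stop list
vect = CountVectorizer(tokenizer=tknzr, stop_words='english', max_df=0.9, min_df=2)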
scl = StandardScaler(with_mean=False)
tfidf = TfidfTransformer(norm=None)
parameters = {
    'vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)],
    'tfidf__use_idf': (True, False),
    'clf__alpha': tuple(10 ** (np.arange(-4, 4, dtype='float'))),
    # 'log' was renamed 'log_loss' in scikit-learn 1.1 and removed in 1.3
    'clf__loss': ('hinge', 'squared_hinge', 'log', 'modified_huber', 'perceptron'),
    'clf__penalty': ('l1', 'l2'),
    'clf__tol': (1e-7, 1e-6, 1e-5, 1e-4, 1e-3)
}

combined_clf = Pipeline([
    ('features', FeatureUnion([
        ('numeric_features', Pipeline([
            ('selector', transformer_numeric)
        ])),
        ('text_features', Pipeline([
            ('selector', transformer_text),
            ('vect', vect),
            ('tfidf', tfidf),
            ('scaler', scl),
        ]))
    ])),
    ('clf', SGDClassifier(random_state=42,
                          max_iter=int(10 ** 6 / len(X_train)), shuffle=True))
])

【Question discussion】:

    Tags: python machine-learning scikit-learn nlp tf-idf


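    【Solution 1】:

    As described here, nested parameters must be accessed with the __ (double underscore) syntax. The syntax is applied recursively, once per level of nesting, until it reaches the parameter you want. The parameter use_idf is located at: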

    features > text_features > tfidf > use_idf

    So the corresponding key in the grid needs to be:

    'features__text_features__tfidf__use_idf': [True, False]
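One way to confirm such a path is to ask the pipeline itself. The following is a minimal sketch that rebuilds the same nesting. The identity FunctionTransformer() steps are hypothetical placeholders for the question's selectors, which are not shown. get_params(deep=True) lists every settable name joined with "__", and set_params accepts the full path but rejects the short one with the same ValueError:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Same nesting as the question; FunctionTransformer() (identity) is a
# hypothetical placeholder for the selectors, which are not shown.
combined_clf = Pipeline([
    ("features", FeatureUnion([
        ("numeric_features", Pipeline([("selector", FunctionTransformer())])),
        ("text_features", Pipeline([
            ("selector", FunctionTransformer()),
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer(norm=None)),
            ("scaler", StandardScaler(with_mean=False)),
        ])),
    ])),
    ("clf", SGDClassifier(random_state=42)),
])

# get_params(deep=True) returns every settable name, nested with "__".
nested_keys = [k for k in combined_clf.get_params(deep=True) if k.endswith("use_idf")]
print(nested_keys)  # ['features__text_features__tfidf__use_idf']

# The full path is accepted...
combined_clf.set_params(features__text_features__tfidf__use_idf=False)

# ...while the short form fails, because "tfidf" is not a top-level step.
try:
    combined_clf.set_params(tfidf__use_idf=False)
except ValueError as exc:
    print("ValueError:", exc)
```

Printing the full get_params(deep=True) key list is a quick way to find the correct name for any other nested parameter.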
    

    Similarly, the key for ngram_range should be:

    'features__text_features__vect__ngram_range': [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
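Putting this together, here is a minimal sketch of a corrected grid search under stated assumptions. The toy data is invented: an object array with text in column 0 and one numeric feature in column 1. The two FunctionTransformer selectors are hypothetical stand-ins for the question's transformer_text and transformer_numeric. The grid is shortened to keep it fast. The clf__ keys need no change, because clf is a top-level step of the outer Pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy data (assumption): column 0 holds text, column 1 a numeric feature.
X = np.array([
    ["great product, works well", 4.0],
    ["excellent and reliable", 3.0],
    ["really great value", 3.0],
    ["works great, love it", 4.0],
    ["terrible, broke quickly", 1.0],
    ["awful and useless", 2.0],
    ["broke after a day, terrible", 1.0],
    ["useless waste of money", 2.0],
], dtype=object)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Hypothetical selectors: pick the text column (1-D) and the numeric column (2-D).
select_text = FunctionTransformer(lambda data: data[:, 0])
select_numeric = FunctionTransformer(lambda data: data[:, 1:].astype(float))

combined_clf = Pipeline([
    ("features", FeatureUnion([
        ("numeric_features", Pipeline([("selector", select_numeric)])),
        ("text_features", Pipeline([
            ("selector", select_text),
            ("vect", CountVectorizer()),
            ("tfidf", TfidfTransformer(norm=None)),
            ("scaler", StandardScaler(with_mean=False)),
        ])),
    ])),
    ("clf", SGDClassifier(random_state=42)),
])

param_grid = {
    # Steps inside the FeatureUnion need the full path.
    "features__text_features__vect__ngram_range": [(1, 1), (1, 2)],
    "features__text_features__tfidf__use_idf": [True, False],
    # "clf" is a top-level step, so its short prefix is already correct.
    "clf__alpha": [1e-4, 1e-2],
}

search = GridSearchCV(combined_clf, param_grid, cv=2)
search.fit(X, y)
print(search.best_params_)
```

The same renaming applies to every other key in the original grid: prefix it with the names of all enclosing steps, joined by "__".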
    

    【Discussion】:
