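【Posted】: 2021-12-09 12:31:44
【Question】:
I have a machine learning classification task that trains on a concatenation of various fixed-length vector representations. How can I perform automatic feature selection, grid search, or any other established technique in scikit-learn to find the best combination of transformers for my data?
Take this text classification pipeline as an example: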
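from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# MyDoc2VecVectorizer, MyDocLengthVectorizer and MySentimentVectorizer are
# user-defined transformers (not shown here).
model = Pipeline([
    ('vectorizer', FeatureUnion(transformer_list=[
        ('word-freq', TfidfVectorizer()),  # vocab-size dimensional
        ('doc2vec', MyDoc2VecVectorizer()),  # 32 dimensional (custom transformer)
        ('doc-length', MyDocLengthVectorizer()),  # 1 dimensional (custom transformer)
        ('sentiment', MySentimentVectorizer()),  # 3 dimensional (custom transformer)
        # ... possibly many other transformers
    ])),
    ('classifier', SVC())
])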
I suspect this may fall under the dynamic-pipeline functionality requested in scikit-learn's SLEP002. If so, how can it be handled in the meantime?
【Discussion】:
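One possible interim approach, offered as a minimal sketch and not a definitive answer: scikit-learn documents that a transformer inside a FeatureUnion can be replaced with the string 'drop' through set_params, so each branch can be exposed to GridSearchCV as a "keep or drop" parameter. The sketch below assumes a reasonably recent scikit-learn release. DocLengthVectorizer, the underscore step names, and the toy documents are hypothetical stand-ins for the custom transformers and data in the question.

```python
from itertools import product

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC


class DocLengthVectorizer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a custom transformer: one feature per
    document, its character count."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.array([[len(doc)] for doc in X], dtype=float)


model = Pipeline([
    ("vectorizer", FeatureUnion(transformer_list=[
        ("word_freq", TfidfVectorizer()),
        ("doc_length", DocLengthVectorizer()),
    ])),
    ("classifier", SVC()),
])

# For each branch of the union: keep the transformer, or replace it with "drop".
options = {
    "vectorizer__word_freq": [TfidfVectorizer(), "drop"],
    "vectorizer__doc_length": [DocLengthVectorizer(), "drop"],
}
names = list(options)

# Enumerate every keep/drop combination, skipping the one that drops all
# branches (it would leave the classifier with zero features).
param_grid = [
    {name: [value] for name, value in zip(names, combo)}
    for combo in product(*options.values())
    if not all(isinstance(value, str) for value in combo)
]

# Toy data (made up for illustration only).
docs = [
    "good great film", "great acting good plot",
    "wonderful good movie", "good fun great cast",
    "bad awful film", "terrible bad plot",
    "awful boring movie", "bad dull terrible cast",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

search = GridSearchCV(model, param_grid, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

The number of combinations doubles with every added transformer, so for many branches an exhaustive grid may become impractical; in that case a randomized search over the same keep/drop options, or a feature-selection step placed after the union, might be a more workable trade-off.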
Tags: machine-learning scikit-learn feature-selection grid-search