【Question title】: Vectorize two pandas columns at once with CountVectorizer
【Posted】: 2020-08-16 20:08:04
【Question description】:

I want to apply sklearn's CountVectorizer to two columns at once. I tried this:

from sklearn.feature_extraction.text import CountVectorizer

features = df[['col 1', 'col 2']]
results = df[['col 3']]

vectorizer = CountVectorizer(lowercase=False)

features = vectorizer.fit_transform(features)
results = vectorizer.fit_transform(results)

But I get this error:

TypeError: expected string or bytes-like object
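
A note on this error, as a hedged sketch rather than a diagnosis of the asker's data: `CountVectorizer` expects a one-dimensional iterable of strings. Iterating over a DataFrame yields its column labels instead of its cell texts, and any non-string document reaches the regex tokenizer unchanged when `lowercase=False`, which raises a `TypeError`. The toy DataFrame and the variable names below are assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "col 1": ["red apple", "green pear"],
    "col 2": ["sweet fruit", "sour fruit"],
})

# Iterating over a DataFrame yields its column labels, so the vectorizer
# sees the two "documents" 'col 1' and 'col 2', not the cell texts.
label_vectorizer = CountVectorizer(lowercase=False)
label_counts = label_vectorizer.fit_transform(df[["col 1", "col 2"]])
label_vocabulary = sorted(label_vectorizer.vocabulary_)  # tokens taken from the labels

# A non-string document (here the integer 42) is passed to the regex
# tokenizer as-is when lowercase=False, which raises a TypeError.
mixed = pd.Series(["red apple", 42], dtype=object)
try:
    CountVectorizer(lowercase=False).fit_transform(mixed)
    type_error_raised = False
except TypeError:
    type_error_raised = True
```

Under these assumptions, the first call returns one row per column label rather than one row per record, which is rarely what is wanted.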

Then I tried this:

from sklearn.compose import make_column_transformer

vectorizer = CountVectorizer(lowercase=False)
transformer = make_column_transformer((vectorizer, 'col 1'), (vectorizer, 'col 2'))

features = transformer.fit_transform(features)
results = vectorizer.fit_transform(results)

But I get this error:

ValueError: Specifying the columns using strings is only supported for pandas DataFrames
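
A note on this error, offered as a sketch under assumptions: a column transformer can only select columns by name when its input is a DataFrame. One situation that produces a `ValueError` of this kind is passing a NumPy array or sparse matrix, for example if `features` had already been overwritten by an earlier `fit_transform` call. Whether that is what happened here is a guess; the toy data below is hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "col 1": ["red apple", "green pear"],
    "col 2": ["sweet fruit", "sour fruit"],
})

transformer = make_column_transformer(
    (CountVectorizer(lowercase=False), "col 1"),
    (CountVectorizer(lowercase=False), "col 2"),
)

# A NumPy array has no column labels, so selecting by name cannot work.
as_array = df.to_numpy()
try:
    transformer.fit_transform(as_array)
    value_error_raised = False
except ValueError:
    value_error_raised = True

# The DataFrame itself still carries the labels and is accepted.
counts = transformer.fit_transform(df)
```

The same transformer object works once it is given the DataFrame, which suggests checking `type(features)` right before the failing call.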

What am I doing wrong? I found the second approach here:

https://media-exp1.licdn.com/dms/image/C4E22AQFC6Uf5_el2nQ/feedshare-shrink_800/0?e=1591228800&v=beta&t=7ZQbbIvgpQKlTfg1Z_IpGT9DB21LUqy_bkKaNE41l0E

【Comments】:

    Tags: python pandas scikit-learn


    【Solution 1】:

    Here is the solution:

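    from sklearn.compose import make_column_transformer
    from sklearn.feature_extraction.text import CountVectorizer

    # Slice the inputs from the original DataFrame so the column names
    # ('col 1' and 'col 2', assumed to sit at positions -3 and -2) are kept.
    features = df.iloc[:, [-2, -3]]
    # The last column, taken as a 1-D Series of strings.
    results = df.iloc[:, -1]

    vectorizer = CountVectorizer(lowercase=False)
    # The column transformer fits a separate clone of the vectorizer per column.
    transformer = make_column_transformer((vectorizer, 'col 1'), (vectorizer, 'col 2'))

    features = transformer.fit_transform(features)
    results = vectorizer.fit_transform(results)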
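
The approach above can be tried end to end with a small, self-contained sketch. The toy DataFrame and the names `X` and `y` are assumptions made for illustration; only the column names and the `lowercase=False` setting come from the thread.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data; the column names mirror the question.
df = pd.DataFrame({
    "col 1": ["red apple", "green pear", "red pear"],
    "col 2": ["sweet fruit", "sour fruit", "sweet snack"],
    "col 3": ["good", "bad", "good"],
})

features = df[["col 1", "col 2"]]  # stays a DataFrame, so names are usable
results = df["col 3"]              # 1-D Series of strings

# A string (not a list) as the column selector hands each vectorizer a
# 1-D Series, which is the input shape CountVectorizer expects.
transformer = make_column_transformer(
    (CountVectorizer(lowercase=False), "col 1"),
    (CountVectorizer(lowercase=False), "col 2"),
)

# One block of count columns per text column, stacked side by side.
X = transformer.fit_transform(features)

# The thread also vectorizes the target column; it is kept here only to
# match the answer.
y = CountVectorizer(lowercase=False).fit_transform(results)
```

Depending on the transformer's `sparse_threshold` and the density of the result, `X` may come back as a dense array or a sparse matrix, so downstream code should not assume one or the other.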
    

    【Comments】:
