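【Posted】: 2018-12-07 10:49:42
【Question】:
My DataFrame has a string column in which I have already split each sentence into words. Now I need to count the occurrences of each word and turn the words into columns, essentially creating a document-term matrix. The column looks like this: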
0 [kubernetes, client, bootstrapping, ponda]
1 [micro, insu]
2 [motor, upi]
3 [secure, app, installation]
4 [health, insu, express, credit, customer]
5 [secure, app, installation]
6 [aap, insta]
7 [loan, house, loan, customers]
Expected output:
   kubernetes  client  bootstrapping  ponda  loan  customers  installation
0           1       1              1      1     0          0             0
1           0       0              0      0     1          0             1
2           0       2              0      0     0          0             0
3           1       1              1      1     0          0             0
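To make the target concrete, here is a minimal sketch of a document-term matrix built with the standard library only. The name `docs` is a hypothetical stand-in for a few rows of the tokenized column, and the column order (first appearance) is an arbitrary choice, not something the question specifies.

```python
from collections import Counter

# Hypothetical stand-in for a few rows of the tokenized column.
docs = [
    ["kubernetes", "client", "bootstrapping", "ponda"],
    ["micro", "insu"],
    ["loan", "house", "loan", "customers"],
]

# Vocabulary in order of first appearance (dict keys keep insertion order).
vocab = list(dict.fromkeys(tok for doc in docs for tok in doc))

# One row per document, one count per vocabulary term.
matrix = []
for doc in docs:
    counts = Counter(doc)  # a Counter returns 0 for terms it has not seen
    matrix.append([counts[term] for term in vocab])

print(vocab)
for i, row in enumerate(matrix):
    print(i, row)
```

Each row has one entry per distinct word, and a repeated word such as `loan` is counted as many times as it occurs in that row.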
Code so far:
from sklearn.feature_extraction.text import CountVectorizer
countvec = CountVectorizer()
countvec.fit_transform(df.new)
Error:
AttributeError: 'list' object has no attribute 'lower'
【Discussion】:
Tags: python-3.x pandas dataframe word-count
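The error appears because `CountVectorizer`, with its default settings, expects each document to be a string and lowercases it before tokenizing; here each document is already a list. One possible workaround, sketched below, is to pass a callable `analyzer` that returns the token list unchanged. The sample rows are a small subset of the question's data, `dtm` is a name introduced only for illustration, and this is a sketch rather than the only way to do it.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# A few rows from the question, already tokenized into lists.
df = pd.DataFrame({"new": [
    ["micro", "insu"],
    ["secure", "app", "installation"],
    ["loan", "house", "loan", "customers"],
]})

# A callable analyzer receives each raw document (here, a list of tokens)
# and returns the features to count, so no lowercasing or tokenizing is attempted.
countvec = CountVectorizer(analyzer=lambda tokens: tokens)
counts = countvec.fit_transform(df["new"])

# vocabulary_ maps each term to its column index; sort terms by that index.
columns = sorted(countvec.vocabulary_, key=countvec.vocabulary_.get)

# Densify the sparse result into a labeled DataFrame (fine for small data).
dtm = pd.DataFrame(counts.toarray(), columns=columns, index=df.index)
print(dtm)
```

Because the analyzer skips the built-in preprocessing, any lowercasing or stop-word filtering would have to be applied to the token lists beforehand.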