【Posted】: 2019-11-03 02:21:53
【Question】:
I don't understand this error. I already converted the DataFrame to lowercase before converting it to a list.
DataFrame:
all_cols
0 who is your hero and why
1 what do you do to relax
2 this is a hero
4 how many hours of sleep do you get a night
5 describe the last time you were relax
Code:
from sklearn.cluster import MeanShift
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
df['all_cols'] = df['all_cols'].str.lower()
df_list = df.values.tolist()
pipeline = Pipeline(steps=[
    ('tfidf', TfidfVectorizer()),
    ('trans', FunctionTransformer(lambda x: x.todense(), accept_sparse=True)),
    ('clust', MeanShift())])
pipeline.fit(df_list)
pipeline.named_steps['clust'].labels_
result = [(label,doc) for doc,label in zip(df_list, pipeline.named_steps['clust'].labels_)]
for label, doc in sorted(result):
    print(label, doc)
But I get an error on this line:
AttributeError                            Traceback (most recent call last)
----> 1 pipeline.fit(df_list)
      2 pipeline.named_steps['clust'].labels_

AttributeError: 'list' object has no attribute 'lower'
But why does the program raise an error about `lower` if I already passed it a lowercased DataFrame?
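A minimal sketch of where the error comes from, assuming a DataFrame rebuilt from the five rows shown above. `df.values` is a 2-D array with one row per document, so `df.values.tolist()` returns a list of one-element lists rather than a list of strings. `TfidfVectorizer` lowercases every document by default (`lowercase=True`), so it ends up calling `.lower()` on a list. Selecting the column first with `df['all_cols'].tolist()` produces the flat list of strings the vectorizer expects; the earlier `.str.lower()` call does not change the nesting either way.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Rows copied from the question (index values kept as shown there).
df = pd.DataFrame(
    {"all_cols": [
        "who is your hero and why",
        "what do you do to relax",
        "this is a hero",
        "how many hours of sleep do you get a night",
        "describe the last time you were relax",
    ]},
    index=[0, 1, 2, 4, 5],
)
df["all_cols"] = df["all_cols"].str.lower()

# df.values has shape (n_rows, n_cols), so tolist() yields one list per row.
nested = df.values.tolist()
print(nested[0])  # ['who is your hero and why']

# Selecting the single column first gives a flat list of strings.
flat = df["all_cols"].tolist()
print(flat[0])  # who is your hero and why

vectorizer = TfidfVectorizer()
nested_error = None
try:
    # Each "document" here is a list, so the default lowercasing step fails.
    vectorizer.fit_transform(nested)
except AttributeError as err:
    nested_error = str(err)
    print("nested:", nested_error)  # nested: 'list' object has no attribute 'lower'

# The flat list of strings vectorizes without error.
X = vectorizer.fit_transform(flat)
print(X.shape[0])  # 5
```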
【Discussion】:
- Are you sure all the rows are strings and not lists?
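A sketch of the question's pipeline fed with `df['all_cols'].tolist()` instead of `df.values.tolist()`. Two further changes are assumptions, not part of the original code: the dense step uses `toarray()` because recent scikit-learn releases no longer accept the `np.matrix` that `todense()` returns, and `bandwidth=1.0` is an illustrative value, since the automatically estimated bandwidth can degenerate on only five short documents. Tune it for real data.

```python
import pandas as pd
from sklearn.cluster import MeanShift
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame({"all_cols": [
    "who is your hero and why",
    "what do you do to relax",
    "this is a hero",
    "how many hours of sleep do you get a night",
    "describe the last time you were relax",
]})
df["all_cols"] = df["all_cols"].str.lower()

# One string per document: select the column, not the whole frame.
docs = df["all_cols"].tolist()

pipeline = Pipeline(steps=[
    ("tfidf", TfidfVectorizer()),
    # MeanShift needs dense input; toarray() returns a plain np.ndarray.
    ("dense", FunctionTransformer(lambda x: x.toarray(), accept_sparse=True)),
    # bandwidth=1.0 is an assumed value chosen for this tiny example.
    ("clust", MeanShift(bandwidth=1.0)),
])
pipeline.fit(docs)

# Pair each cluster label with its document and print them grouped by label.
labels = pipeline.named_steps["clust"].labels_
result = sorted(zip(labels, docs))
for label, doc in result:
    print(label, doc)
```

Because `docs` and `labels_` are in the same order, zipping them pairs each document with its own cluster label; the exact grouping depends on the bandwidth you choose.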
Tags: python pandas cluster-analysis lowercase