【Posted】: 2021-03-17 00:55:15
【Question】:
I have a dataset of user comments and ratings. I am preprocessing this dataset, but I get the error shown below. How can I fix it?
import string
from nltk.corpus import stopwords

def DataCleaning(metin):
    numbers = "0123456789"
    lower_case = metin.lower()
    # Drop punctuation characters, then digits
    punct_removed = [char for char in lower_case if char not in string.punctuation]
    punct_removed = [char for char in punct_removed if char not in numbers]
    punct_removed_join = ''.join(punct_removed)
    # Split into words and drop English stopwords
    punct_removed_join_clean = [word for word in punct_removed_join.split()
                                if word not in stopwords.words('english')]
    return punct_removed_join_clean

otel_verileri["reviews.text"] = otel_verileri["reviews.text"].apply(DataCleaning)
otel_verileri["reviews.text"].tolist()
OUTPUT:
AttributeError Traceback (most recent call last)
<ipython-input-56-a80b269d8bbe> in <module>()
1
----> 2 otel_verileri["reviews.text"] = otel_verileri["reviews.text"].apply(DataCleaning)
3 otel_verileri["reviews.text"].tolist()
1 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-48-748ef67e84ac> in DataCleaning(metin)
1 def DataCleaning(metin):
2 numbers = "0123456789"
----> 3 lower_case=metin.lower()
4 punct_removed = [char for char in lower_case if char not in string.punctuation]
5 punct_removed=[char for char in punct_removed if char not in numbers]
AttributeError: 'float' object has no attribute 'lower'
【Comments】:
-
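Please read Under what circumstances may I add “urgent” or other similar phrases to my question, in order to obtain faster answers? - the summary is that this is not an ideal way to address volunteers, and it is probably counterproductive. Please refrain from adding this to your questions.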
-
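Put assert isinstance(metin, str), repr(metin) above the line where the error occurs. Run it. See which value violates your expectations. For some reason, your reviews.text column does not contain only text. Is some automatic conversion happening here?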
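To make the suggestion above concrete, here is a minimal sketch, assuming the offending values are missing entries: pandas represents a missing value in a text column as NaN, which is a float, and that would explain the 'float' object in the traceback. The names clean_text, STOPWORDS, reviews and bad_rows are hypothetical, and the small stopword set is only a stand-in for stopwords.words('english') so that the example runs without NLTK.

```python
import string

# Hypothetical stand-in for nltk's stopwords.words('english'); kept tiny so
# the sketch runs without downloading the NLTK corpus.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "of", "to", "in"}

# Translation table that deletes punctuation and digits in one pass.
_STRIP = str.maketrans("", "", string.punctuation + string.digits)


def clean_text(value):
    """Return a list of cleaned tokens; non-string input yields an empty list."""
    # Missing values (NaN) are floats, so calling .lower() on them would fail.
    if not isinstance(value, str):
        return []
    lowered = value.lower().translate(_STRIP)
    return [word for word in lowered.split() if word not in STOPWORDS]


# A plain list standing in for the reviews.text column (assumed sample data).
reviews = ["The room was GREAT, 10/10!", float("nan"), "Bad wifi in room 204."]

# Positions of the values that are not strings, i.e. the ones that break .lower().
bad_rows = [i for i, v in enumerate(reviews) if not isinstance(v, str)]

cleaned = [clean_text(v) for v in reviews]
print(bad_rows)
print(cleaned)
```

With the real data, the same function could probably be passed to otel_verileri["reviews.text"].apply(clean_text) unchanged. Another option is to remove the missing rows first with otel_verileri.dropna(subset=["reviews.text"]). Which of the two is appropriate depends on whether reviews without text should stay in the dataset.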
Tags: nlp tokenize data-cleaning sentiment-analysis preprocessor