【发布时间】:2020-10-04 16:33:44
【问题描述】:
我的数据框看起来像 -
State text
Delhi 170 kw for330wp, shipping and billing in delhi...
Gujarat 4kw rooftop setup for home Photovoltaic Solar...
Karnataka language barrier no requirements 1kw rooftop ...
Madhya Pradesh Business PartnerDisqualified Mailed questionna...
Maharashtra Rupdaypur, panskura(r.s) Purba Medinipur 150kw...
我想从此数据框中删除标点符号和停用词。我已经完成了以下代码。但它不起作用 -
import nltk
nltk.download('stopwords')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
import collections
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.cm as cm
import matplotlib.pyplot as plt
% matplotlib inline
import nltk
from nltk.corpus import stopwords
import string
from sklearn.feature_extraction.text import CountVectorizer
import re
def message_cleaning(message):
Test_punc_removed = [char for char in message if char not in string.punctuation]
Test_punc_removed_join = ''.join(Test_punc_removed)
Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split() if word.lower() not in stopwords.words('english')]
return Test_punc_removed_join_clean
df['text'] = df['text'].apply(message_cleaning)
AttributeError: 'set' object has no attribute 'words'
【问题讨论】:
标签: python-3.x pandas scikit-learn nltk