Removing NLTK Stop Words: Answers

【Question title】: Removing NLTK StopWords
【Posted】: 2026-02-10 06:30:02
【Question】:

I am trying to remove stop words from my dataset.

stopwordsw = nltk.corpus.stopwords.words('german')

def remove_stopwords(txt_clean):
      txt_clean =  [Word for Word in txt_clean if Word not in stopwords]
      return txt_clean

data['Tweet_sw'] = data['Tweet_clean'].apply(lambda x: remove_stopwords(x))
data.head()

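I have two problems.

First, the output comes out character by character (separated by commas), even though I am checking against a stop word list that contains whole words.

I can work around this with join, but I don't understand why the text is split into characters.

The second, and real, problem is that the stop word removal does not work. Words that clearly appear in the list are not removed from the sentences.

Where is my mistake?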


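【Comments】:

You could try txt_clean = [Word for Word in txt_clean.split() if Word not in stopwords]
Welcome :) You get character-by-character output because you are iterating over a string (one character at a time) and putting each character into a list.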
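
To illustrate the comment above, here is a minimal sketch of the difference between iterating over a string directly and iterating over its split() result. The sample sentence is made up for illustration and is not from the asker's data.

```python
# Hypothetical sample text; any string behaves the same way.
text = "das ist ein Test"

# Iterating over a str yields one character at a time (spaces included).
chars = [w for w in text]

# Calling split() first yields whitespace-separated tokens instead.
tokens = [w for w in text.split()]

print(chars[:3])
print(tokens)
```

This is also why the filter seemed to do nothing: a single character such as 'd' is never equal to a whole stop word such as 'das', so almost every character passes the `not in` check.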

Tags: nlp nltk data-cleaning stop-words

【Solution 1】:

txt_clean = [Word for Word in txt_clean.split() if Word not in stopwords]
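
A fuller sketch of how that one-liner could fit into the asker's function is shown below. Several things here are assumptions rather than part of the original answer: the small hand-written STOPWORDS set stands in for set(nltk.corpus.stopwords.words('german')) so the example runs without downloading the corpus, the match is made case-insensitive because the NLTK lists are lowercase, and the tokens are joined back into a string. Note also that the question assigns the list to stopwordsw but filters against stopwords; if that mismatch exists in the real code, it may be contributing to the second problem.

```python
# Hypothetical stand-in for set(nltk.corpus.stopwords.words('german')).
# A set gives fast membership tests compared with a list.
STOPWORDS = {"das", "ist", "ein", "und", "nicht"}


def remove_stopwords(text):
    """Return text with stop words removed, matching case-insensitively."""
    # Split into whitespace-separated tokens before filtering,
    # so whole words (not characters) are compared with the stop words.
    kept = [word for word in text.split() if word.lower() not in STOPWORDS]
    # Join the surviving tokens back into a single string.
    return " ".join(kept)


print(remove_stopwords("Das ist ein schöner Tag und nicht kalt"))
# prints: schöner Tag kalt
```

With a pandas DataFrame like the one in the question, the function could then be applied as data['Tweet_sw'] = data['Tweet_clean'].apply(remove_stopwords), assuming Tweet_clean holds plain strings.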

【Comments】: