【Question Title】: Word cloud stopwords are not processed
【Posted】: 2020-06-18 02:32:09
【Question Description】:

I created a word cloud:
# Python program to generate a WordCloud

# import all necessary modules
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
import pandas as pd
from collections import Counter
from konlpy.tag import Okt

# Read the 'crawling.xlsx' file (Excel files are binary, so no encoding argument is needed)
df = pd.read_excel(r"crawling.xlsx")

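comment_words = ''
okt = Okt()  # create the tagger once, not once per row

# iterate through the CONTENT column
for val in df.CONTENT:

    # typecast each val to string
    val = str(val)

    noun = okt.nouns(val)

    # keep only nouns of two or more characters
    # (calling noun.pop(i) while enumerating the list skips the element after each removal)
    noun = [v for v in noun if len(v) >= 2]

    comment_words += " ".join(noun) + " "
    count = Counter(noun)
    noun_list = count.most_common(100)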

stopwords = set(STOPWORDS)
stopwords.add("모든언어")
stopwords.add("모든결과")
STOPWORDS.add("모든날짜")
STOPWORDS.add('지난지난')

wordcloud = WordCloud(width=800, height=800,
                      font_path='NanumBarunGothic.otf',
                      background_color='white',
                      stopwords=stopwords,
                      min_font_size=10).generate(comment_words)

# plot the WordCloud image
plt.figure(figsize=(8, 8), facecolor=None)
plt.imshow(wordcloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.savefig('word_cloud.png')
plt.show()

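I added the stopwords to the set, but they are not reflected in the result. I also made a list of stopwords and passed it in, but that did not work either. What should I do?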
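Part of the problem may be visible in the code above: `stopwords = set(STOPWORDS)` builds a new, independent set, so the later `STOPWORDS.add(...)` calls (for "모든날짜" and '지난지난') change only the module-level `STOPWORDS` and never reach the `stopwords` set passed to `WordCloud`. A minimal sketch of that copy behaviour, using a hypothetical `DEFAULT_STOPWORDS` as a stand-in for `wordcloud.STOPWORDS` so it runs without the package installed:

```python
# DEFAULT_STOPWORDS is a hypothetical stand-in for wordcloud.STOPWORDS (also a set).
DEFAULT_STOPWORDS = {"the", "and", "a"}

stopwords = set(DEFAULT_STOPWORDS)  # set(...) makes an independent copy
stopwords.add("모든언어")            # goes into the copy
DEFAULT_STOPWORDS.add("모든날짜")    # goes into the original only

print("모든언어" in stopwords)  # True
print("모든날짜" in stopwords)  # False: this word never reaches WordCloud

# Fix: add every custom word to the same set that is passed as stopwords=...
stopwords.update(["모든결과", "모든날짜", "지난지난"])
print("모든날짜" in stopwords)  # True
```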

【Question Discussion】:

    Tags: python word-cloud


    【Solution 1】:

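    1- You can put the stopwords in a file and load that file. Here is the code; it works for me. First create the function by copying and pasting the code below: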

    def load_words_from_file(path_to_file):
        """Read a text file and return the list of words it contains."""
        # assumes the file is saved as UTF-8 (matters for Korean stopwords)
        with open(path_to_file, 'r', encoding='utf-8') as f:
            return [word for line in f for word in line.split()]
    

    2- Call this function to load the words from your stopwords file, as follows:

    stop_words = load_words_from_file('stopwords.txt')
    
    The sample file "stopwords.txt" contains all the stopwords. You can type them in Notepad, one word per line.
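Steps 1 and 2 can be checked end to end with a small, self-contained sketch: it writes a temporary stopwords file in the one-word-per-line format and loads it. The UTF-8 encoding and the temporary path are assumptions for illustration. Because the loader uses `line.split()`, a line with several space-separated words would also be accepted. The wordcloud documentation describes `stopwords` as a set of strings, so the result is converted with `set(...)` at the end.

```python
import os
import tempfile


def load_words_from_file(path_to_file):
    """Read a text file and return the list of words it contains."""
    # assumes UTF-8, which matters for Korean stopwords
    with open(path_to_file, "r", encoding="utf-8") as f:
        return [word for line in f for word in line.split()]


# Write a small stopwords file (one word per line) in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("모든언어\n모든결과\n모든날짜\n지난지난\n")
    stop_words = load_words_from_file(path)

print(stop_words)  # ['모든언어', '모든결과', '모든날짜', '지난지난']
stop_set = set(stop_words)  # ready to pass as WordCloud(stopwords=...)
```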
    

    3- Use the variable "stop_words" in the WordCloud call:

    wordcloud_statenisland = WordCloud(
        background_color='white',
        max_words=2000,
        stopwords=stop_words,
    )
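As far as I know, WordCloud applies `stopwords` while processing raw text in `.generate(...)` (not in `.generate_from_frequencies(...)`), and each entry removes only a word that appears as exactly that token in the text. A sketch of an alternative that does not depend on how WordCloud matches stopwords: filter each row's nouns yourself before joining them into `comment_words`. The noun lists below are hypothetical stand-ins for `okt.nouns(...)` output, and `remove_stopwords` is an illustrative helper, not part of wordcloud or KoNLPy.

```python
def remove_stopwords(tokens, stop_words):
    """Return the tokens that are not exact matches of any stopword."""
    stop = set(stop_words)  # set membership is a constant-time check per token
    return [t for t in tokens if t not in stop]


# hypothetical per-row noun lists, standing in for okt.nouns(val)
rows = [["모든언어", "영화", "배우"], ["지난지난", "영화"], ["배우", "모든날짜"]]
stop_words = ["모든언어", "모든날짜", "지난지난"]

# join the filtered nouns of every row into one space-separated string
comment_words = " ".join(
    " ".join(remove_stopwords(nouns, stop_words)) for nouns in rows
)
print(comment_words)  # 영화 배우 영화 배우
```

The cleaned string can then be passed to `.generate(...)` as in the question. If the unwanted entries show up as two-word phrases, they may be collocations (bigrams), which WordCloud adds by default; passing `collocations=False` to `WordCloud` is worth trying in that case.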
    

    Hope this helps.

    【Discussion】:
