【发布时间】:2017-03-09 17:58:45
【问题描述】:
from nltk.tokenize import RegexpTokenizer
from stop_words import get_stop_words
from gensim import corpora, models
import gensim
import os
from os import path
from time import sleep
import matplotlib.pyplot as plt
import random
from wordcloud import WordCloud, STOPWORDS
tokenizer = RegexpTokenizer(r'\w+')
en_stop = set(get_stop_words('en'))
with open(os.path.join('c:\users\kaila\jobdescription.txt')) as f:
Reader = f.read()
Reader = Reader.replace("will", " ")
Reader = Reader.replace("please", " ")
texts = unicode(Reader, errors='replace')
tdm = []
raw = texts.lower()
tokens = tokenizer.tokenize(raw)
stopped_tokens = [i for i in tokens if not i in en_stop]
tdm.append(stopped_tokens)
dictionary = corpora.Dictionary(tdm)
corpus = [dictionary.doc2bow(i) for i in tdm]
sleep(3)
ldamodel = gensim.models.ldamodel.LdaModel(corpus, num_topics=8, id2word = dictionary)
topics = ldamodel.print_topics(num_topics=8, num_words=200)
for i in topics:
print(i)
wordcloud = WordCloud().generate(i)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
问题在于词云。我无法为 8 个主题中的每一个获得词云。我想要一个为 8 个主题提供 8 个词云的输出。 如果有人可以帮助我解决这个问题,那就太好了。
【问题讨论】:
标签: python topic-modeling word-cloud