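[Posted]: 2013-07-13 19:54:39
[Question]:
The lda.show_topics method in the code below prints only the top 10 words of each topic's distribution. How can I print the full distribution over all the words in the corpus?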
from gensim import corpora, models

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# Lowercase, split on whitespace, and drop common stop words.
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# Map each token to an integer id and build bag-of-words vectors.
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Train a 2-topic LDA model on the bag-of-words corpus.
lda = models.ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=2)

# By default, show_topics() lists only the top 10 words per topic.
for topic in lda.show_topics():
    print(topic)
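[Comments]:
-
You could do something hacky: change the lda package in site-packages (or wherever it lives on your machine) so that it prints all of the words, or copy its code into your program and change it to print everything instead of 10.
-
Just found the answer; it is a bit hidden in the API =). See the answer below.
-
Nice job finding the answer yourself.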
Tags: python lda topic-modeling gensim