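【Posted】: 2017-09-07 10:34:23
【Question】:
I am new to Word2Vec in Gensim. I want to build a Word2Vec model for a piece of text (taken from the Wikipedia article on machine learning) and find the words most similar to "machine learning".
My current code is below.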
# import modules & set up logging
from gensim.models import Word2Vec
sentences = "Machine learning is the subfield of computer science that, according to Arthur Samuel, gives computers the ability to learn without being explicitly programmed.[1][2][verify] Samuel, an American pioneer in the field of computer gaming and artificial intelligence, coined the term machine learning in 1959 while at IBM. Evolved from the study of pattern recognition and computational learning theory in artificial intelligence,[3] machine learning explores the study and construction of algorithms that can learn from and make predictions on data[4] – such algorithms overcome following strictly static program instructions by making data-driven predictions or decisions,[5]:2 through building a model from sample inputs. Machine learning is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible; example applications include email filtering, detection of network intruders or malicious insiders working towards a data breach,[6] optical character recognition (OCR),[7] learning to rank, and computer vision."
# train word2vec on the sentences
model = Word2Vec(sentences, min_count=1)
vocab = list(model.wv.vocab.keys())
print(vocab[:10])
However, the vocabulary I get back consists of single characters rather than words.
['M', 'a', 'c', 'h', 'i', 'n', 'e', ' ', 'l', 'r']
Please help me get the most similar words using model.most_similar.
【Discussion】:
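The single-character vocabulary is most likely explained by the input type: Word2Vec expects an iterable of sentences, where each sentence is a list of word tokens. When a plain string is passed, iterating over it yields characters, so every character ends up treated as its own token. Below is a minimal sketch of one way to prepare the text first. The helper names `tokenize` and `train` are hypothetical (they are not part of Gensim), and the regex-based splitting is a deliberate simplification, not a proper sentence or word tokenizer.

```python
import re


def tokenize(text):
    """Turn raw text into a list of sentences, each a list of lowercase word tokens.

    Hypothetical helper: regex splitting is a rough approximation of real
    sentence and word tokenization.
    """
    # Drop Wikipedia-style citation markers such as [1], [verify] or [5]:2.
    text = re.sub(r"\[[^\]]*\](?::\d+)?", " ", text)
    # Split after sentence-ending punctuation followed by whitespace.
    raw_sentences = re.split(r"(?<=[.!?])\s+", text)
    # Keep alphabetic tokens only; numbers and punctuation are discarded.
    token_lists = (re.findall(r"[a-z]+", s.lower()) for s in raw_sentences)
    return [tokens for tokens in token_lists if tokens]


def train(text):
    """Train a Word2Vec model on the tokenized text (hypothetical wrapper)."""
    # Imported here so that tokenize() can be used without Gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(tokenize(text), min_count=1)
```

With this in place, `model = train(sentences)` followed by `model.wv.most_similar("machine")` should return word-level neighbours instead of characters. Two caveats: a single paragraph is far too little data for meaningful similarities, and "machine learning" is two tokens here, so querying it as one term would need an extra phrase-detection step (for example with `gensim.models.phrases.Phrases`) before training.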