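【Posted】: 2019-06-30 01:02:13
【Question】:
I am new to NLP. How can I find the similarity between two sentences, and how can I print a score for each word? Also, how do I use the gensim Word2Vec model?
I tried the code below. These are my two sentences: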
sentence1 = "I am going to India"
sentence2 = " I am going to Bharat"

from gensim.models import word2vec
import numpy as np

words1 = sentence1.split(' ')
words2 = sentence2.split(' ')

# The meaning of the sentence can be interpreted as the average of its words
sentence1_meaning = word2vec(words1[0])
count = 1
for w in words1[1:]:
    sentence1_meaning = np.add(sentence1_meaning, word2vec(w))
    count += 1
sentence1_meaning /= count

sentence2_meaning = word2vec(words2[0])
count = 1
for w in words2[1:]:
    sentence2_meaning = np.add(sentence2_meaning, word2vec(w))
    count += 1
sentence2_meaning /= count

# Similarity is the cosine between the vectors
similarity = np.dot(sentence1_meaning, sentence2_meaning) / (np.linalg.norm(sentence1_meaning) * np.linalg.norm(sentence2_meaning))
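【Discussion】:
-
Your code seems to be missing an important step: the word2vec model has to be either trained from scratch or loaded from a file. Why not start with Gensim's tutorial? radimrehurek.com/gensim/models/word2vec.html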
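To illustrate the missing step, here is a minimal sketch, not a definitive implementation. It assumes gensim 4.x (older releases call the `vector_size` parameter `size`) and trains on a toy corpus made of the two sentences from the question, which is far too small to give meaningful vectors. The helper names (`sentence_vector`, `cosine`, `sentence_similarity`, `word_scores`, `demo_with_gensim`) are made up for this example. "A score for each word" is read here as each word's best cosine match in the other sentence, which is only one possible interpretation.

```python
import numpy as np


def sentence_vector(words, vectors):
    """Average the vectors of the words that `vectors` knows about."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("none of the words are in the vocabulary")
    return np.mean(known, axis=0)


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sentence_similarity(sentence1, sentence2, vectors):
    """Cosine between the averaged word vectors of two sentences."""
    # split() with no argument ignores leading and repeated spaces,
    # so " I am going to Bharat" does not yield an empty first token.
    v1 = sentence_vector(sentence1.split(), vectors)
    v2 = sentence_vector(sentence2.split(), vectors)
    return cosine(v1, v2)


def word_scores(sentence1, sentence2, vectors):
    """For each known word of sentence1, its best cosine match in sentence2."""
    words2 = [w for w in sentence2.split() if w in vectors]
    scores = {}
    if not words2:
        return scores
    for w1 in sentence1.split():
        if w1 in vectors:
            scores[w1] = max(cosine(vectors[w1], vectors[w2]) for w2 in words2)
    return scores


def demo_with_gensim():
    """Train a toy model and print the scores (assumes gensim 4.x)."""
    from gensim.models import Word2Vec

    s1 = "I am going to India"
    s2 = " I am going to Bharat"
    # Toy corpus: in practice, train on a large corpus or load a saved model.
    corpus = [s1.split(), s2.split()]
    model = Word2Vec(corpus, vector_size=50, min_count=1, seed=1)
    kv = model.wv  # supports `word in kv` and `kv[word]`

    print("sentence similarity:", sentence_similarity(s1, s2, kv))
    for word, score in word_scores(s1, s2, kv).items():
        print(word, score)
```

Calling `demo_with_gensim()` runs the whole pipeline. The helpers accept any mapping from word to vector, so a plain dict of NumPy arrays works as well as a model's `wv` attribute, and words missing from the vocabulary are skipped rather than raising a `KeyError`.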