【Posted】: 2019-02-06 09:31:37
【Question】:
I downloaded the en_core_web_lg model and tried to compute the similarity between two sentences:
import spacy

nlp = spacy.load('en_core_web_lg')
search_doc = nlp("This was very strange argument between american and british person")
main_doc = nlp("He was from Japan, but a true English gentleman in my eyes, and another one of the reasons as to why I liked going to school.")
print(main_doc.similarity(search_doc))
It returns a surprisingly high value:
0.9066019751888448
These two sentences should not be 90% similar; their meanings are completely different.
Why does this happen? Do I need to add some kind of extra vocabulary to make the similarity score more reasonable?
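For context on why the score comes out so high: spaCy's Doc.similarity is the cosine similarity between the two documents' vectors, and a Doc's vector is simply the average of its token vectors. Averaging washes out word-order and much of the meaning, so two sentences sharing many common function words ("was", "and", "a", ...) can score very high. A minimal sketch of that mechanism with toy 2-dimensional vectors (the numbers are hypothetical, not real word embeddings):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "token vectors" for two different sentences (hypothetical values).
# Common function words dominate both, so the averages end up close.
sent1_tokens = [np.array([1.0, 0.2]), np.array([0.9, 0.1])]
sent2_tokens = [np.array([0.8, 0.3]), np.array([1.0, 0.0])]

# spaCy averages token vectors to get the Doc vector,
# then Doc.similarity takes the cosine of those averages.
doc1 = np.mean(sent1_tokens, axis=0)
doc2 = np.mean(sent2_tokens, axis=0)

print(cosine(doc1, doc2))  # close to 1.0 despite different tokens
```

This is why filtering out stop words (or comparing only content words) before computing similarity is a common way to get more discriminative scores.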
【Comments】: