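【Posted】: 2014-03-11 14:50:07
【Question】:
I have a set of words:
{corporal, dog, cat, distingus, company, phone, authority, vehicle, seat, lightweight, rules, resident, expertise}
I want to compute the semantic similarity between every pair of words in this set. I have one problem:
- Some of the words are incomplete or misspelled, such as "vhicule". How can I ignore these words?
Sample code (from "Python: Passing variables into Wordnet Synsets methods in NLTK"):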
import nltk.corpus as corpus
import itertools as IT
import fileinput

if __name__ == "__main__":
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree"]
    for word1, word2 in IT.product(list1, list2):
        # print(word1, word2)
        # Take the first synset found for each word.
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # name() is a method in NLTK 3; NLTK 2 exposed it as the attribute .name
        print('{w1}, {w2}: {s}'.format(
            w1=wordFromList1.name(),
            w2=wordFromList2.name(),
            s=wordFromList1.wup_similarity(wordFromList2)))
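Suppose I add "vhicule" to either list. I then get the following error:
IndexError: list index out of range
How can I use this error to ignore words that do not exist in the database?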
【Comments】:
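The IndexError comes from the `[0]`: `wordnet.synsets()` returns an empty list for a word WordNet does not know, so there is no first element to take. A minimal sketch of one way to use that error to skip such words is below. The helper names `first_synset` and `pairwise_similarity` are hypothetical, and the lookup function is passed in as a parameter (an assumption made here so the sketch does not depend on how WordNet is loaded).

```python
import itertools


def first_synset(word, lookup):
    """Return the first synset for `word`, or None if the lookup finds nothing.

    `lookup` is any callable that maps a word to a list of synsets,
    for example nltk.corpus.wordnet.synsets.
    """
    try:
        return lookup(word)[0]
    except IndexError:
        # An empty result list means the word is unknown (e.g. a misspelling).
        return None


def pairwise_similarity(list1, list2, lookup):
    """Return (word1, word2, score) for every pair where both words are known."""
    results = []
    for word1, word2 in itertools.product(list1, list2):
        syn1 = first_synset(word1, lookup)
        syn2 = first_synset(word2, lookup)
        if syn1 is None or syn2 is None:
            continue  # skip pairs that contain an unknown word
        results.append((word1, word2, syn1.wup_similarity(syn2)))
    return results
```

With NLTK installed and the WordNet corpus downloaded, the call would look like `pairwise_similarity(list1, list2, wordnet.synsets)`. Testing `if not synsets:` before indexing would work just as well as catching the exception; the try/except form is shown only because the question asks how to use the error itself.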