【Question title】: How to use NLTK WordNet to check for incomplete words in Python?
【Posted】: 2014-03-11 14:50:07
【Question】:

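I have a set of words:

{corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise}

I want to compute the semantic similarity between every pair of words in this set. I have one problem:

  1. Some words are incomplete or misspelled, such as "vhicule". How can I ignore these words?

Sample code, taken from: Python: Passing variables into Wordnet Synsets methods in NLTK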

import nltk.corpus as corpus
import itertools as IT
import fileinput

if __name__=="__main__":
    wordnet = corpus.wordnet
    list1 = ["apple", "honey", "drinks", "flowers", "paper"]
    list2 = ["pear", "shell", "movie", "fire", "tree"]

    for word1, word2 in IT.product(list1, list2):
        #print(word1, word2)
        wordFromList1 = wordnet.synsets(word1)[0]
        wordFromList2 = wordnet.synsets(word2)[0]
        # Synset.name is a method in NLTK 3.x (it was an attribute in NLTK 2.x)
        print('{w1}, {w2}: {s}'.format(
            w1 = wordFromList1.name(),
            w2 = wordFromList2.name(),
            s = wordFromList1.wup_similarity(wordFromList2)))

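Suppose I add "vhicule" to either list. I then get the following error:

IndexError: list index out of range

How can I use this error to ignore words that do not exist in the database?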

【Comments】:

    Tags: python nltk wordnet


    【Solution 1】:

    You can check whether nltk.corpus.wordnet.synsets(i) returns a non-empty list of synsets:

    >>> from nltk.corpus import wordnet as wn
    >>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
    >>> x
    ['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
    >>> y = [i for i in x if len(wn.synsets(i)) > 0]
    >>> y
    ['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
    

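    A more concise way is to rely on the truthiness of wn.synsets(i): for an unknown word it returns an empty list (not None), and an empty list is falsy: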

    >>> from nltk.corpus import wordnet as wn
    >>> x = [i.strip() for i in """corporal, dog, cat, distingus, Company, phone, authority, vhicule, seats, lightweight, rules, resident, expertise""".lower().split(",")]
    >>> x
    ['corporal', 'dog', 'cat', 'distingus', 'company', 'phone', 'authority', 'vhicule', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
    >>> [i for i in x if wn.synsets(i)]
    ['corporal', 'dog', 'cat', 'company', 'phone', 'authority', 'seats', 'lightweight', 'rules', 'resident', 'expertise']
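
The question also asks how to use the IndexError itself to skip unknown words. One possible sketch, applied to the question's pairwise loop, is shown here. The function name `similarities` and its `lookup` parameter are hypothetical, introduced only for illustration: `lookup` stands in for `wn.synsets` so that the logic does not depend on the WordNet corpus being installed.

```python
from itertools import product


def similarities(list1, list2, lookup):
    """Yield (word1, word2, score) for pairs where both words have synsets.

    `lookup` is any callable that maps a word to a list of synsets,
    for example nltk.corpus.wordnet.synsets (an assumption; it is not
    imported here).
    """
    for word1, word2 in product(list1, list2):
        try:
            # Indexing with [0] raises IndexError when the list is empty,
            # which is what happens for a word such as "vhicule".
            syn1 = lookup(word1)[0]
            syn2 = lookup(word2)[0]
        except IndexError:
            continue  # at least one word is not in the database: skip the pair
        yield word1, word2, syn1.wup_similarity(syn2)
```

With NLTK and its WordNet corpus available, this could be called as `similarities(list1, list2, wn.synsets)`. Filtering the lists up front, as in the answer above, avoids repeating failed lookups; catching the exception keeps the original loop structure.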
    

    【Discussion】:

    • Yes, you can use len(wn.synsets(i)) to see how many synsets a word has.
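
As a rough illustration of the len() idea, the sketch below counts synsets per word and separates the words that were not found. The helper names `synset_counts` and `missing_words` and the `lookup` parameter are hypothetical; `lookup` is assumed to behave like `wn.synsets`, returning a list that is empty for unknown words.

```python
def synset_counts(words, lookup):
    """Return {word: number of synsets}; a count of 0 means the word was not found."""
    return {word: len(lookup(word)) for word in words}


def missing_words(words, lookup):
    """Return the words for which the lookup returned no synsets, in input order."""
    counts = synset_counts(words, lookup)
    return [word for word in words if counts[word] == 0]
```

Listing the missing words this way can help decide whether to drop them or correct their spelling before computing similarities.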