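For some reason, WordNet indexes antonymy relations at the Lemma level rather than the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-many relationship.
In the ambiguous case, where one word has several meanings, we have a one-to-many relationship from String to Synset, e.g.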
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
In the case of one meaning/concept with several surface forms, we have a one-to-many relationship from Synset to String (where String refers to the lemma name):
>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']
Note: so far we have been comparing the relationship between Strings and Synsets, not between Lemmas and Synsets.
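Taken together, the two directions above make String and Synset many-to-many. A minimal sketch of that, assuming a hand-typed and deliberately partial sample of lemma names (the dictionary below is illustrative data, not something read from WordNet, and `invert` is a hypothetical helper):

```python
from collections import defaultdict

# Hand-typed, partial sample: synset name -> lemma names (illustrative only).
SYNSET_TO_NAMES = {
    "dog.n.01": ["dog", "domestic_dog", "Canis_familiaris"],
    "frump.n.01": ["frump", "dog"],
}


def invert(synset_to_names):
    """Build the reverse index: lemma name -> list of synset names."""
    index = defaultdict(list)
    for synset_name, names in synset_to_names.items():
        for name in names:
            index[name].append(synset_name)
    return dict(index)


NAME_TO_SYNSETS = invert(SYNSET_TO_NAMES)

print(NAME_TO_SYNSETS["dog"])       # one string, several synsets
print(SYNSET_TO_NAMES["dog.n.01"])  # one synset, several strings
```

Both lookups return lists, which is what makes the String/Synset relation many-to-many rather than one-to-many.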
The "cute" thing is that Lemma and String have a one-to-one relationship:
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'
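The _name attribute of a Lemma object holds a single unicode string, not a list. See the code at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444
It also seems that a Lemma has a one-to-one relationship with a Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:
Lemma attributes, accessible via methods with the same name::
So we can do the following and be confident that each Lemma object will give us back exactly one synset: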
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
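The shape of these relationships can be sketched with two toy classes. This is only a model under stated assumptions: `ToySynset` and `ToyLemma` are hypothetical names, not NLTK classes, and the lemma names for the second synset are typed in by hand for illustration.

```python
class ToySynset:
    """A concept that owns many lemmas (one-to-many)."""

    def __init__(self, name):
        self.name = name
        self.lemmas = []

    def add_lemma(self, lemma_name):
        lemma = ToyLemma(lemma_name, self)
        self.lemmas.append(lemma)
        return lemma


class ToyLemma:
    """A (string, synset) pair: exactly one name and exactly one synset."""

    def __init__(self, name, synset):
        self.name = name
        self.synset = synset

    def __repr__(self):
        return "ToyLemma('%s.%s')" % (self.synset.name, self.name)


dog_animal = ToySynset("dog.n.01")
frump = ToySynset("frump.n.01")
for lemma_name in ("dog", "domestic_dog", "Canis_familiaris"):
    dog_animal.add_lemma(lemma_name)
for lemma_name in ("frump", "dog"):
    frump.add_lemma(lemma_name)

# The string 'dog' occurs in both synsets, but as two distinct lemma objects,
# each pointing back at a single synset.
dog_lemmas = [l for ss in (dog_animal, frump) for l in ss.lemmas if l.name == "dog"]
print(dog_lemmas)
# prints [ToyLemma('dog.n.01.dog'), ToyLemma('frump.n.01.dog')]
```

Because a lemma is tied to one synset, a relation stored on lemmas can always be mapped back to a unique synset.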
Suppose you are doing some sentiment analysis and need the antonyms of every adjective in WordNet. You can collect the synsets of the antonyms like this:
>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print ss, ':', get_antonyms(ss)
...
Synset('unable.a.01') : set([Synset('unable.a.01')])
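For Python 3, the same idea might be written as below. This is a sketch rather than a drop-in replacement: `get_antonym_synsets` is a renamed variant of the function above, and `StubLemma` / `StubSynset` are hypothetical stand-ins that only mimic the three methods used (lemmas(), antonyms(), synset()) so the example runs without the WordNet corpus installed. The able/unable pair is hand-wired stub data.

```python
from itertools import chain


def get_antonym_synsets(ss):
    """Return the set of synsets reached through the antonyms of ss's lemmas."""
    return set(chain.from_iterable(
        (ant.synset() for ant in lemma.antonyms()) for lemma in ss.lemmas()
    ))


# --- Hypothetical stand-ins so the sketch runs without the corpus ---
class StubLemma:
    def __init__(self, name, synset):
        self._name = name
        self._synset = synset
        self._antonyms = []

    def antonyms(self):
        return list(self._antonyms)

    def synset(self):
        return self._synset


class StubSynset:
    def __init__(self, name, lemma_names):
        self._name = name
        self._lemmas = [StubLemma(n, self) for n in lemma_names]

    def lemmas(self):
        return list(self._lemmas)

    def __repr__(self):
        return "StubSynset(%r)" % self._name


able = StubSynset("able.a.01", ["able"])
unable = StubSynset("unable.a.01", ["unable"])
# Antonymy is recorded between lemmas, in both directions.
able.lemmas()[0]._antonyms.append(unable.lemmas()[0])
unable.lemmas()[0]._antonyms.append(able.lemmas()[0])

print(able, ":", get_antonym_synsets(able))
# prints StubSynset('able.a.01') : {StubSynset('unable.a.01')}
```

With NLTK and its WordNet data available, the same function should accept the synsets yielded by wn.all_synsets(pos='a'), since it relies only on the method names already used in the session above.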