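WordNet is what you want. It's big, containing over a hundred thousand entries, and it's freely available.
However, it is not stored as XML. To access the data, you'll need to use one of the existing WordNet APIs for your language of choice.
Using the APIs is generally pretty straightforward, so I don't think you need to worry about "learning (a) complex API". For example, borrowing from the WordNet HOWTO for the Python-based Natural Language Toolkit (NLTK):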
>>> from nltk.corpus import wordnet
>>>
>>> # Get All Synsets for 'dog'
>>> # This is essentially all senses of the word in the db
>>> wordnet.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'),
Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'),
Synset('andiron.n.01'), Synset('chase.v.01')]
>>> # Get the definition and usage for the first synset
>>> wordnet.synset('dog.n.01').definition()
'a member of the genus Canis (probably descended from the common
wolf) that has been domesticated by man since prehistoric times;
occurs in many breeds'
>>> wordnet.synset('dog.n.01').examples()
['the dog barked all night']
>>> # Get antonyms for 'good'
>>> wordnet.synset('good.a.01').lemmas()[0].antonyms()
[Lemma('bad.a.01.bad')]
>>> # Get synonyms for the first noun sense of 'dog'
>>> wordnet.synset('dog.n.01').lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'),
Lemma('dog.n.01.Canis_familiaris')]
>>> # Get synonyms for all senses of 'dog'
>>> for synset in wordnet.synsets('dog'): print(synset.lemmas())
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'),
Lemma('dog.n.01.Canis_familiaris')]
...
[Lemma('frank.n.02.frank'), Lemma('frank.n.02.frankfurter'),
...
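While there is an American English bias in WordNet, it supports British spellings and usage. For example, you can look up 'colour', and one of the synsets for 'lift' is 'elevator.n.01'.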
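To check spellings like these yourself, a small helper along the following lines may be handy. This is only a sketch: `synset_names` and its `corpus` parameter are names made up for illustration (they are not part of NLTK), and it assumes NLTK 3 with the WordNet corpus already downloaded.

```python
def synset_names(word, corpus=None):
    """Return the names of all synsets that contain `word`.

    `corpus` is a hypothetical hook: any object with a WordNet-style
    `synsets(word)` method. When omitted, NLTK's WordNet reader is used.
    """
    if corpus is None:
        # Imported lazily so the helper can be loaded without NLTK installed.
        from nltk.corpus import wordnet as corpus
    # In NLTK 3, Synset.name() is a method returning e.g. 'dog.n.01'.
    return [synset.name() for synset in corpus.synsets(word)]
```

With NLTK available, `synset_names('colour')` lists the synsets reachable through the British spelling, and `'elevator.n.01' in synset_names('lift')` tests the claim about 'lift'.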
Notes on XML
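If having the data represented as XML is essential, you could easily use one of the APIs to access the WordNet database
and convert it into XML; see, for example, Thinking XML: Querying WordNet as XML.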
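As a rough illustration of that conversion, here is a minimal sketch using NLTK 3 and the standard library's `xml.etree.ElementTree`. The element and attribute names (`word`, `synset`, `definition`, `example`, `lemma`, `form`, `name`) and both function names are my own invention, not any standard WordNet XML schema, so adapt them to whatever format you actually need.

```python
import xml.etree.ElementTree as ET


def synset_to_element(name, definition, examples, lemma_names):
    """Build a <synset> element from plain Python values.

    The tag layout is an assumption made for this example.
    """
    elem = ET.Element("synset", {"name": name})
    ET.SubElement(elem, "definition").text = definition
    for example in examples:
        ET.SubElement(elem, "example").text = example
    for lemma in lemma_names:
        ET.SubElement(elem, "lemma").text = lemma
    return elem


def word_to_xml(word):
    """Serialize every WordNet synset of `word` to an XML string."""
    # Imported lazily so synset_to_element works without NLTK installed.
    from nltk.corpus import wordnet

    root = ET.Element("word", {"form": word})
    for synset in wordnet.synsets(word):
        root.append(
            synset_to_element(
                synset.name(),
                synset.definition(),
                synset.examples(),
                synset.lemma_names(),
            )
        )
    return ET.tostring(root, encoding="unicode")
```

Calling `word_to_xml('dog')` should then give one `<synset>` child per sense of the word. Keeping the element builder separate from the NLTK lookup means the XML layout can be changed or tested without touching the corpus.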