If you only want to use the Lemmatizer, you can do it like this:
from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES
lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
print(lemmas)
Output:
['duck']
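For intuition about what the three tables contribute, here is a minimal pure-Python sketch of rule-based lemmatization. It is a simplified illustration, not spaCy's actual implementation: the function `lemmatize` and the toy tables `noun_index`, `noun_exc`, and `noun_rules` are hypothetical names made up for this example, and the real tables are far larger and keyed by part of speech.

```python
def lemmatize(word, index, exceptions, rules):
    """Simplified rule-based lemmatizer (illustrative only, not spaCy's code)."""
    word = word.lower()
    # Irregular forms are resolved first through the exceptions table.
    if word in exceptions:
        return list(exceptions[word])
    candidates = []
    # Each rule rewrites a suffix; a candidate is kept only if the
    # index lists it as a known base form.
    for old_suffix, new_suffix in rules:
        if word.endswith(old_suffix):
            candidate = word[:len(word) - len(old_suffix)] + new_suffix
            if candidate in index and candidate not in candidates:
                candidates.append(candidate)
    # Fall back to the word itself when no rule yields a known form.
    return candidates or [word]


# Toy tables for nouns (assumed data, for illustration only).
noun_index = {"duck", "goose", "box"}
noun_exc = {"geese": ["goose"]}
noun_rules = [("es", ""), ("s", "")]

print(lemmatize("ducks", noun_index, noun_exc, noun_rules))
```

In this sketch the index acts as a whitelist of known base forms, the exceptions table handles irregular words such as "geese", and the rules strip or rewrite suffixes for everything else.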
Update
Since spaCy 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:
import spacy
nlp = spacy.load('en')
nlp.vocab.lookups
>>> <spacy.lookups.Lookups object at 0x7f89a59ea810>
nlp.vocab.lookups.tables
>>> ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']
You can still use the lemmatizer directly with a word and a POS (part-of-speech) tag:
from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB
lemmatizer = nlp.vocab.morphology.lemmatizer
lemmatizer('ducks', NOUN)
>>> ['duck']
You can pass the POS tag either as the imported constant, as above, or as a string:
lemmatizer('ducks', 'NOUN')
>>> ['duck']