【发布时间】:2017-06-09 01:13:16
【问题描述】:
我正在尝试根据词性对字符串进行词形还原,但在最后阶段,我遇到了错误。我的代码:
import nltk
from nltk.stem import *
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import wordnet
wordnet_lemmatizer = WordNetLemmatizer()
text = word_tokenize('People who help the blinging lights are the way of the future and are heading properly to their goals')
tagged = nltk.pos_tag(text)
def get_wordnet_pos(treebank_tag):
if treebank_tag.startswith('J'):
return wordnet.ADJ
elif treebank_tag.startswith('V'):
return wordnet.VERB
elif treebank_tag.startswith('N'):
return wordnet.NOUN
elif treebank_tag.startswith('R'):
return wordnet.ADV
else:
return ''
for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-40-afb22c78f770> in <module>()
----> 1 for word in tagged: print(wordnet_lemmatizer.lemmatize(word,pos='v'), end=" ")
E:\Miniconda3\envs\uol1\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
38
39 def lemmatize(self, word, pos=NOUN):
---> 40 lemmas = wordnet._morphy(word, pos)
41 return min(lemmas, key=len) if lemmas else word
42
E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos)
1710
1711 # 1. Apply rules once to the input to get y1, y2, y3, etc.
-> 1712 forms = apply_rules([form])
1713
1714 # 2. Return all that are in the database (and check the original too)
E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
1690 def apply_rules(forms):
1691 return [form[:-len(old)] + new
-> 1692 for form in forms
1693 for old, new in substitutions
1694 if form.endswith(old)]
E:\Miniconda3\envs\uol1\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
1692 for form in forms
1693 for old, new in substitutions
-> 1694 if form.endswith(old)]
1695
1696 def filter_forms(forms):
我希望能够一次根据每个单词的词性对字符串进行词形还原。请帮忙。
【问题讨论】:
-
我不太明白你的方法:你想在检查词的 POS 以确保你得到正确的引理之后对单词进行词形还原,是这样吗?如果是这样,你能给出预期的输入和输出吗?另外,
get_wordnet_pos()的意义何在 - 我没有看到它在任何地方使用?
标签: python nlp nltk lemmatization