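【Posted】: 2015-09-04 18:43:29
【Question】:
How can I use the NLTK module to produce the singular and plural forms of a noun, or tell it not to distinguish between singular and plural when searching for a word in a txt file? Can I use NLTK to make the program case-insensitive?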
【Question Discussion】:
You can do this with pattern.en; I'm not sure about NLTK.
>>> from pattern.en import pluralize, singularize
>>> print(pluralize('child'))  # children
>>> print(singularize('wolves'))  # wolf
See the pattern documentation for more.
【Discussion】:
pip install pattern
Here is one possible way to do it with NLTK. Suppose you are searching for the word "feature":
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
wnl = WordNetLemmatizer()
text = "This is a small text, a very small text with no interesting features."
tokens = [token.lower() for token in word_tokenize(text)]
lemmatized_words = [wnl.lemmatize(token) for token in tokens]
'feature' in lemmatized_words
Calling str.lower() on every token takes care of case sensitivity. Of course, you also have to lemmatize the search word if needed.
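The approach above can be wrapped into a small reusable check. This is only a sketch: `tokenize`, `wordnet_normalizer`, and `contains_word` are hypothetical names, the regex tokenizer is a simple stand-in for `word_tokenize`, and the normalizer is passed in so that any function mapping a word to its base form can be used.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Assumption: a plain regex is used here instead of nltk's word_tokenize.
    """
    return re.findall(r"[a-z]+", text.lower())


def wordnet_normalizer():
    """Return a function that lemmatizes a lowercase noun with NLTK.

    The import is done lazily; the 'wordnet' corpus must be downloaded first.
    """
    from nltk.stem import WordNetLemmatizer
    return WordNetLemmatizer().lemmatize


def contains_word(text, query, normalize):
    """True if any token matches `query` after lowercasing and normalizing both."""
    target = normalize(query.lower())
    return any(normalize(token) == target for token in tokenize(text))
```

With NLTK and its WordNet data installed, a call might look like `contains_word(text, "Features", wordnet_normalizer())`, where `text` is the content read from the txt file.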
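【Discussion】:
raw_input('> ').lower().
As currently written, pattern does not support Python 3 (although there is an ongoing discussion about it at https://github.com/clips/pattern/issues/62).
TextBlob https://textblob.readthedocs.io is built on top of pattern and NLTK and also includes pluralization. It seems to do a pretty good job, though it is not perfect. See the example code below.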
from textblob import TextBlob
words = "cat dog child goose pants"
blob = TextBlob(words)
plurals = [word.pluralize() for word in blob.words]
print(plurals)
# >>> ['cats', 'dogs', 'children', 'geese', 'pantss']
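【Discussion】:
This answer may be a bit late, but in case anyone is still looking for something similar:
There is inflect (also available on GitHub), which supports Python 2.x and 3.x. You can find the singular or plural form of a given word: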
import inflect
p = inflect.engine()
words = "cat dog child goose pants"
print([p.plural(word) for word in words.split(' ')])
# ['cats', 'dogs', 'children', 'geese', 'pant']
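Note that p.plural applied to a plural gives you the singular form (as with 'pants' above).
In addition, you can supply a POS (part-of-speech) tag, or pass a number and let the library decide whether the word needs to be plural or singular: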
p.plural('cat', 4) # cats
p.plural('cat', 1) # cat
# but also...
p.plural('cat', 0) # cats
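The count-based form is handy for building messages. A minimal sketch, assuming a hypothetical helper named `count_phrase`; the engine is optional so that an `inflect.engine()` is only created (and inflect only imported) when none is passed in.

```python
def count_phrase(noun, count, engine=None):
    """Return e.g. '1 cat' or '4 cats', letting the engine pick the form from `count`."""
    if engine is None:
        import inflect  # third-party package, imported lazily
        engine = inflect.engine()
    # plural(word, count) keeps the word unchanged when count is 1
    return "{} {}".format(count, engine.plural(noun, count))
```

Calling `count_phrase('cat', 4)` relies on the same `p.plural(word, count)` call shown above.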
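【Discussion】:
inflect.engine().plural('children') outputs 'childrens'... why?
inflect.engine().plural('houses') outputs 'housess'. I don't fully understand the internals; I'm actually working through them these days. Some cases work very well, but there are also some that look like obvious bugs.
I tried the following code to work around this behavior of the inflect library; I hope it helps.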
import inflect
inflectEngine = inflect.engine()
def getSingular(word):
    # singular_noun() returns False when the word is not recognized as a plural noun
    return inflectEngine.singular_noun(word) or word

def getPlural(word):
    # Reduce to the singular first so an already-plural word is not pluralized twice
    word = getSingular(word)
    return inflectEngine.plural_noun(word) or word
########################################################
##################### Testing Area #####################
######################## Output ########################
# UNCONDITIONALLY FORM THE PLURAL
singularWord1 = "dog"
singularWord2 = "horse"
singularWord3 = "terretory"
pluralWord1 = "books"
pluralWord2 = "hotels"
pluralWord3 = "categories"
print("The plural/singular conversion : ", singularWord1, " => ", inflectEngine.plural(singularWord1))
print("The plural/singular conversion : ", singularWord2, " => ", inflectEngine.plural(singularWord2))
print("The plural/singular conversion : ", singularWord3, " => ", inflectEngine.plural(singularWord3))
print("The plural/singular conversion : ", pluralWord1, " => ", inflectEngine.plural(pluralWord1))
print("The plural/singular conversion : ", pluralWord2, " => ", inflectEngine.plural(pluralWord2))
print("The plural/singular conversion : ", pluralWord3, " => ", inflectEngine.plural(pluralWord3))
print("")
######################## Output ########################
# The plural/singular conversion : dog => dogs
# The plural/singular conversion : horse => horses
# The plural/singular conversion : terretory => terretories
# The plural/singular conversion : books => book
# The plural/singular conversion : hotels => hotel
# The plural/singular conversion : categories => category
#########################################################
print("The singular conversion")
print(getSingular(singularWord1))
print(getSingular(singularWord2))
print(getSingular(singularWord3))
print(getSingular(pluralWord1))
print(getSingular(pluralWord2))
print(getSingular(pluralWord3))
print("")
######################## Output ########################
# The singular conversion
# dog
# horse
# terretory
# book
# hotel
# category
#########################################################
print("The plural conversion")
print(getPlural(singularWord1))
print(getPlural(singularWord2))
print(getPlural(singularWord3))
print(getPlural(pluralWord1))
print(getPlural(pluralWord2))
print(getPlural(pluralWord3))
print("")
######################## Output ########################
# The plural conversion
# dogs
# horses
# terretories
# books
# hotels
# categories
#########################################################
# CONDITIONALLY FORM THE PLURAL
count = 1
print("I saw", count, inflectEngine.plural(singularWord1, count))
count = 5
print("I saw", count, inflectEngine.plural(singularWord1, count))
######################## Output ########################
# I saw 1 dog
# I saw 5 dogs
#########################################################
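To connect this back to the original question, a singularizing helper like getSingular can serve as the normalizer when searching a txt file. The following is a sketch only: `inflect_singularizer` and `find_lines` are hypothetical names, and the regex word split is an assumption rather than anything inflect provides.

```python
import re


def inflect_singularizer():
    """Build a getSingular-style function backed by inflect (imported lazily)."""
    import inflect
    engine = inflect.engine()
    # singular_noun() returns False for words it does not treat as plural
    return lambda word: engine.singular_noun(word) or word


def find_lines(lines, query, to_singular):
    """Return 1-based numbers of the lines that contain `query`,
    ignoring case and singular/plural differences."""
    target = to_singular(query.lower())
    hits = []
    for number, line in enumerate(lines, start=1):
        words = re.findall(r"[a-z]+", line.lower())
        if any(to_singular(word) == target for word in words):
            hits.append(number)
    return hits
```

An open file object is an iterable of lines, so it can be passed directly as `lines`, with `inflect_singularizer()` as `to_singular`.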
【Discussion】: