【Question title】: NLTK module not recognizing correct English words (Python)
【Posted】: 2019-09-08 19:38:35
【Question description】:

I downloaded the "words" and "wordnet" corpora for Python's NLTK library:

import nltk
from nltk.corpus import words
from nltk.corpus import wordnet
nltk.download('words')
nltk.download('wordnet')

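I want to check whether each entry in a list is English.

However, when I run the script, it does not recognize any of the entries as English.

Here is my script: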

samplewords=['accident scene','a%32','j & quod','accident season','academic discount','academic diary','academic dictionary']

for word in samplewords:
    if word in words.words():
        print('English',word)
    else:
        print('Not English',word)

for word in samplewords:
    if not wordnet.synsets(word):
        print('Not english',word)
    else:
        print('English',word)

This is the output I get from both of the checks above:

Not english accident scene
Not english a%32
Not english j & quod
Not english accident season
Not english academic discount
Not english academic diary
Not english academic dictionary

My expected result:

    English accident scene
    Not english a%32
    Not english j & quod
    English accident season
    English academic discount
    English academic diary
    English academic dictionary

How can I make sure the library recognizes these as English words?

【Question comments】:

    Tags: python python-3.x nltk


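    【Solution 1】:

    words.words() contains the individual words of the corpus, not multi-word collocations. A phrase such as 'accident scene' therefore never matches a single entry, even when each of its words is in the list.

    You need something like this, which checks whether every individual word of the phrase is in words.words() (note, however, that this will also classify nonexistent collocations such as "dictionary season" as English):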

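    # Build the vocabulary once; a set gives fast membership tests
    english_vocab = set(words.words())

    for word in samplewords:
        # A phrase counts as English only if every token is in the corpus
        if all(w in english_vocab for w in word.split()):
            print('English',word)
        else:
            print('Not English',word)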
    

    Result:

    English accident scene
    Not English a%32
    Not English j & quod
    English accident season
    English academic discount
    English academic diary
    English academic dictionary
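
The per-word check described in this answer can be sketched as a small reusable function. This is only an illustration under stated assumptions: `is_english_phrase` and `toy_vocab` are hypothetical names that do not appear in the original post, and the toy vocabulary stands in for the real corpus so the snippet runs without downloading NLTK data. The sketch also lowercases each token before the lookup, which the original loop does not do.

```python
def is_english_phrase(phrase, vocab):
    """Return True if every whitespace-separated token of `phrase` is in `vocab`."""
    tokens = phrase.split()
    # all() is True for an empty sequence, so reject empty phrases explicitly.
    return bool(tokens) and all(token.lower() in vocab for token in tokens)


# Toy vocabulary for illustration only (an assumption, not NLTK data).
# With the corpus installed, one could instead build it once with:
#     vocab = {w.lower() for w in words.words()}
toy_vocab = {"accident", "scene", "season", "academic",
             "discount", "diary", "dictionary"}

samplewords = ["accident scene", "a%32", "j & quod", "academic diary"]
for phrase in samplewords:
    label = "English" if is_english_phrase(phrase, toy_vocab) else "Not English"
    print(label, phrase)
```

As in the answer's loop, the check is purely per token, so a made-up collocation such as "dictionary season" still passes. Detecting real collocations would need a different resource than a flat word list.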
    

    【Comments】:
