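【Question title】: Number of syllables for words in a text
【Posted】: 2011-05-04 06:22:14
【Question description】:

I have the following code excerpt that uses NLTK to find the number of syllables for every word in a given input text, "sample.txt":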

   import re
   import nltk
   from curses.ascii import isdigit
   from nltk.corpus import cmudict
   import nltk.data
   import pprint

   # CMU Pronouncing Dictionary: maps each word to a list of pronunciations
   d = cmudict.dict()

   tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
   fp = open("sample.txt")
   data = fp.read()
   tokens = nltk.wordpunct_tokenize(data)
   text = nltk.Text(tokens)
   words = [w.lower() for w in text]
   print(words)  # print all the words in the input text
   regexp = "[A-Za-z]+"
   exp = re.compile(regexp)

   def nsyl(word):
       # Vowel phonemes end in a stress digit, so counting them gives the syllables.
       # d[word] raises KeyError when the word is not in the dictionary.
       return max([len([y for y in x if isdigit(y[-1])]) for x in d[word]])

   sum1 = 0
   count = 0
   count1 = 0
   for a in words:
       if exp.match(a):
           print(a)
           print("no of syllables:", nsyl(a))
           sum1 = sum1 + nsyl(a)
           print("sum of syllables:", sum1)
           if nsyl(a) < 3:
               count = count + 1
           else:
               count1 = count1 + 1

   print("no of words with syll count less than 3:", count)
   print("no of complex words:", count1)

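This code looks up each input word in the CMU dictionary and gives its number of syllables. But if a word is not found in the dictionary, or if I use proper nouns in the input, it fails with an error. I want to check whether the word exists in the dictionary and, if it does not, skip it and move on to the next word. How can I do that?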

【Comments】:

    Tags: python nltk


    【Solution 1】:

    I'm guessing the problem is a KeyError. Replace your definition with:
    def nsyl(word):
        lowercase = word.lower()
        if lowercase not in d:
            # The word is missing from the CMU dictionary: signal this with -1
            return -1
        else:
            return max([len([y for y in x if isdigit(y[-1])]) for x in d[lowercase]])
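One caveat with the -1 sentinel is that the calling loop has to skip negative results, otherwise each unknown word subtracts from the syllable total and is counted as a word with fewer than 3 syllables. The following is a minimal sketch of such a loop, not the asker's exact program. It assumes a tiny hand-written dictionary `d` standing in for `cmudict.dict()` so that it runs without the NLTK data, and `summarize` is a hypothetical helper name introduced only for illustration.

```python
# Hypothetical stand-in for cmudict.dict(): word -> list of pronunciations,
# each pronunciation being a list of phonemes. With NLTK installed you would
# use `d = cmudict.dict()` instead.
d = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
    "dictionary": [["D", "IH1", "K", "SH", "AH0", "N", "EH2", "R", "IY0"]],
    "cat": [["K", "AE1", "T"]],
}


def nsyl(word):
    """Return the syllable count, or -1 if the word is not in the dictionary."""
    lowercase = word.lower()
    if lowercase not in d:
        return -1
    # Vowel phonemes carry a trailing stress digit (0, 1 or 2).
    return max(len([y for y in x if y[-1].isdigit()]) for x in d[lowercase])


def summarize(words):
    """Tally syllables, skipping every word for which nsyl returned -1."""
    total = simple = complex_ = 0
    skipped = []
    for w in words:
        n = nsyl(w)
        if n < 0:
            skipped.append(w)  # unknown word: leave it out of all totals
            continue
        total += n
        if n < 3:
            simple += 1
        else:
            complex_ += 1
    return total, simple, complex_, skipped


total, simple, complex_, skipped = summarize(["Hello", "dictionary", "cat", "Zaphod"])
print("sum of syllables:", total)
print("words with fewer than 3 syllables:", simple)
print("complex words:", complex_)
print("skipped (not in dictionary):", skipped)
```

Calling `nsyl` once per word and storing the result also avoids the repeated lookups in the original loop.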
    

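    Alternatively, you can check whether the word is in the dictionary before calling nsyl, so that you don't have to handle the missing-word case inside nsyl itself.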
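As a rough illustration of this second option, the sketch below leaves `nsyl` as a plain lookup and filters words with an `in d` test before calling it. The small dictionary `d` and the sample word list are assumptions made up for the example; in the real program `d` would come from `cmudict.dict()`.

```python
# Hypothetical stand-in for cmudict.dict(); the entries are illustrative only.
d = {
    "the": [["DH", "AH0"], ["DH", "AH1"], ["DH", "IY0"]],
    "quick": [["K", "W", "IH1", "K"]],
    "banana": [["B", "AH0", "N", "AE1", "N", "AH0"]],
}


def nsyl(word):
    """Plain lookup: raises KeyError if the word is not in the dictionary."""
    return max(len([p for p in pron if p[-1].isdigit()]) for pron in d[word])


words = ["The", "quick", "Zaphod", "banana"]

# Test membership first, so nsyl is only called for words it can handle.
counts = {}
for w in words:
    key = w.lower()
    if key in d:
        counts[key] = nsyl(key)

print(counts)
```

The membership test on a dict is a constant-time hash lookup on average, so the extra check adds little cost even for long texts.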

    【Comments】:
