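【Question title】:Python: Converting output into a sentence
【Posted】:2014-11-17 21:06:42
【Question】:

I have just started learning Python. I am trying to clean up a sentence by breaking it into words and joining them back into a sentence. The file big.txt contains words such as youth, caretaker, etc. The problem is in the last procedure, looper, which prints each word on its own line.

correct is another procedure, defined before this code, that corrects each word.

Here is the code: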

zebra = 'Yout caretak taking care of something'

count = len(re.findall(r'\w+', zebra))

def looper(a,count):
    words = nltk.word_tokenize(zebra)
    for i in range(len(words)):
        X = correct(words[i])
        print (X)

final = looper(zebra)

The output it produces:

youth
caretaker
walking
car
in
something

How can I combine all the individual outputs into a sentence?

Expected result:

youth caretaker walking car in something

Let me know if you need more details.

Thanks in advance.

【Question comments】:

    Tags: python python-2.7 nlp nltk


    【Solution 1】:

    Use a list comprehension:

    print " ".join([ correct(words[i]) for i in range(len(words)) ])
    

    It should look like this:

    zebra = 'Yout caretak taking care of something'
    
    count = len(re.findall(r'\w+', zebra))
    words = nltk.word_tokenize(zebra)
    def looper(a,count):
        print " ".join([ correct(words[i]) for i in range(len(words)) ])
    

    words should be defined outside the function; there is no need to tokenize the sentence again on every iteration.

    You can also use this:

    print " ".join([ correct(i) for i in words ])
    

    This is the right way to do it:

    zebra = 'Yout caretak taking care of something'
    words = nltk.word_tokenize(zebra)
    print " ".join([ correct(i) for i in words ])
    

    No function is needed here: words is a list of words, so you can iterate over it and join the results.
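
If the corrected sentence is needed as a value (the question assigns `final = looper(zebra)`), a function that returns the joined string may be more convenient than one that prints. Below is a minimal Python 3 sketch under stated assumptions: `correct_sentence`, `toy_correct`, and the `FIXES` table are hypothetical names used only for illustration, `toy_correct` stands in for the asker's `correct`, and `re.findall(r"\w+", ...)` stands in for `nltk.word_tokenize` so the example runs without NLTK.

```python
import re

# Hypothetical lookup table standing in for the asker's `correct` procedure.
FIXES = {"yout": "youth", "caretak": "caretaker"}


def toy_correct(word):
    """Return a known fix for the lowercased word, or the lowercased word itself."""
    lowered = word.lower()
    return FIXES.get(lowered, lowered)


def correct_sentence(sentence, correct=toy_correct):
    """Tokenize the sentence, correct each token, and return one joined string."""
    words = re.findall(r"\w+", sentence)  # stand-in for nltk.word_tokenize
    return " ".join(correct(word) for word in words)


zebra = "Yout caretak taking care of something"
final = correct_sentence(zebra)  # `final` now holds the sentence, not None
print(final)
```

To use the real pieces, pass the actual `correct` as the second argument and swap the `re.findall` line for `nltk.word_tokenize(sentence)`.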

    Applied to your code:

    zebra = 'Yout caretak taking care of something'
    words = nltk.word_tokenize(zebra)
    for x in words:
        print correct(x),
    

    Demo:

    >>> zebra = 'Yout caretak taking care of something'
    >>> words = nltk.word_tokenize(zebra)
    >>> words
    ['Yout', 'caretak', 'taking', 'care', 'of', 'something']
    

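    As you can see, nltk.word_tokenize gives you a list of words, so you can easily iterate over them.

    【Comments】:

    • Please give a reason for the downvote
    • The expected output is youth caretaker walking car in something, not just the tokenized words
    【Solution 2】: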
    >>> import nltk
    >>> zebra = 'Yout caretak taking care of something'
    >>> for word in nltk.word_tokenize(zebra):
    ...     print word
    ... 
    Yout
    caretak
    taking
    care
    of
    something
    

    Then run $ sudo pip install pyenchant (see https://pythonhosted.org/pyenchant/api/enchant.html) and:

    >>> import nltk
    >>> import enchant
    >>> zebra = 'Yout caretak taking care of something'
    >>> dictionary = enchant.Dict('en_US')
    >>> for word in nltk.word_tokenize(zebra):
    ...     dictionary.suggest(word)
    ... 
    ['Out', 'Yost', 'Rout', 'Tout', 'Lout', 'Gout', 'Pout', 'Bout', 'Y out', 'Your', 'You', 'Youth', 'Yous', 'You t']
    ['caretaker', 'caret', 'Clareta', 'cabaret', 'curettage', 'critical']
    ['raking', 'takings', 'tasking', 'staking', 'tanking', 'talking', 'tacking', 'taring', 'toking', 'laking', 'caking', 'taming', 'making', 'taping', 'baking']
    ['CARE', 'acre', 'acer', 'race', 'Care', 'car', 'are', 'cares', 'scare', 'carer', 'caret', 'carte', 'cared', 'cadre', 'carve']
    ['if', 'pf', 'o', 'f', 'oaf', 'oft', 'off', 'sf', 'on', 'or', 'cf', 'om', 'op', 'oh', 'hf']
    ['somethings', 'some thing', 'some-thing', 'something', 'locksmithing', 'smoothness']
    

    Then keep only the suggestions that contain the original word:

    >>> for word in nltk.word_tokenize(zebra):
    ...     print [i for i in dictionary.suggest(word) if word in i]
    ... 
    ['Youth']
    ['caretaker']
    ['takings', 'staking']
    ['cares', 'scare', 'carer', 'caret', 'cared']
    ['oft', 'off']
    ['somethings', 'something']
    

    So:

    >>> " ".join([[word if dictionary.check(word) else i for i in dictionary.suggest(word) if word in i][0] for word in nltk.word_tokenize(zebra)])
    'Youth caretaker taking care of something'
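
The one-liner above indexes `[0]` into the filtered suggestions, so it raises `IndexError` whenever no suggestion contains the original word. A more defensive variant could be sketched as follows (Python 3). `pick_correction` is a hypothetical helper, not part of pyenchant. It takes the `check` and `suggest` callables as arguments, and the stub functions and their data are made up so that the example runs without pyenchant installed.

```python
def pick_correction(word, check, suggest):
    """Return the word if it is spelled correctly, else the first suggestion
    that contains it, else the word unchanged."""
    if check(word):
        return word
    for candidate in suggest(word):
        if word in candidate:
            return candidate
    return word  # no usable suggestion: keep the original token


# Made-up stand-ins for dictionary.check / dictionary.suggest (assumption).
KNOWN = {"taking", "care", "of", "something"}
SUGGESTIONS = {"Yout": ["Out", "Youth"], "caretak": ["caretaker", "caret"]}


def stub_check(word):
    return word in KNOWN


def stub_suggest(word):
    return SUGGESTIONS.get(word, [])


tokens = ["Yout", "caretak", "taking", "care", "of", "something", "zzz"]
sentence = " ".join(pick_correction(t, stub_check, stub_suggest) for t in tokens)
print(sentence)
```

With pyenchant available, the same helper would presumably be called as `pick_correction(word, dictionary.check, dictionary.suggest)`.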
    

    【Comments】:

      【Solution 3】:
      zebra = 'Yout caretak taking care of something'
      
      count = len(re.findall(r'\w+', zebra))
      
      def looper(a,count):
          words = nltk.word_tokenize(zebra)
          for i in range(len(words)):
              X = correct(words[i])
              print X,
      final = looper(zebra)
      

      Just add a comma after X ---> print X,
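
The trailing comma is syntax of the Python 2 `print` statement. In Python 3, where `print` is a function, the rough counterpart is the `end` argument. A small sketch, assuming Python 3: `print_on_one_line` is a hypothetical helper, and `str.lower` is only a placeholder for the asker's `correct`.

```python
def print_on_one_line(words, correct=str.lower):
    """Print each corrected word followed by a space, then end the line."""
    for word in words:
        print(correct(word), end=" ")  # rough Python 3 counterpart of `print X,`
    print()  # finish the line


print_on_one_line(["Yout", "caretak", "taking"])
```

Note that a function that only prints returns None, so assigning its result (as in `final = looper(...)`) does not capture the sentence.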

      【Comments】:

      • Thanks Alex, but this method doesn't work; it gives me no output