【发布时间】:2011-07-31 06:22:23
【问题描述】:
有谁知道是否有现有的模块或简单的方法可以在文本文件中读写带有词性的句子?我正在使用 python 和自然语言工具包 (NLTK)。例如这段代码:
import nltk
sentences = "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."
tagged = nltk.sent_tokenize(sentences.strip())
tagged = [nltk.word_tokenize(sent) for sent in tagged]
tagged = [nltk.pos_tag(sent) for sent in tagged]
print tagged
返回这个嵌套列表:
[[('Call', 'NNP'), ('me', 'PRP'), ('Ishmael', 'NNP'), ('.', '.')], [('Some', 'DT'), ('years', 'NNS'), ('ago', 'RB'), ('-', ':'), ('never', 'RB'), ('mind', 'VBP'), ('how', 'WRB'), ('long', 'JJ'), ('precisely', 'RB'), ('-', ':'), ('having', 'VBG'), ('little', 'RB'), ('or', 'CC'), ('no', 'DT'), ('money', 'NN'), ('in', 'IN'), ('my', 'PRP$'), ('purse', 'NN'), (',', ','), ('and', 'CC'), ('nothing', 'NN'), ('particular', 'JJ'), ('to', 'TO'), ('interest', 'NN'), ('me', 'PRP'), ('on', 'IN'), ('shore', 'NN'), (',', ','), ('I', 'PRP'), ('thought', 'VBD'), ('I', 'PRP'), ('would', 'MD'), ('sail', 'VB'), ('about', 'IN'), ('a', 'DT'), ('little', 'RB'), ('and', 'CC'), ('see', 'VB'), ('the', 'DT'), ('watery', 'NN'), ('part', 'NN'), ('of', 'IN'), ('the', 'DT'), ('world', 'NN'), ('.', '.')]]
我知道我可以轻松地将其转储到泡菜中,但我真的想将其导出为更大文本文件的一部分。我希望能够将列表导出到文本文件,然后稍后返回,解析它,并恢复原始列表结构。 NLTK 中是否有任何内置函数可以执行此操作?我看过了,但没找到……
示例输出:
<headline>Article headline</headline>
<body>Call me Ishmael...</body>
<pos_tags>[[('Call', 'NNP'), ('me', 'PRP'), ('Ishmael', 'NNP')...</pos_tags>
【问题讨论】:
标签: python nlp text-files nltk