Answers: Parsing multiple sentences with MaltParser using NLTK

【Question title】: Parsing multiple sentences with MaltParser using NLTK
【Posted】: 2015-05-26 13:58:33
【Question】:

There have been many questions related to MaltParser and/or NLTK:

NLTK now has a more stable version of the MaltParser API (https://github.com/nltk/nltk/pull/944), but it runs into problems when parsing multiple sentences at once.

Parsing one sentence at a time seems fine:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)

But parsing a list of sentences does not return a DependencyGraph object:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> print(mp.parse_one(sent).tree())
(pajamas (shot I) an elephant in my)
>>> print(next(mp.parse_sents([sent,sent2])))
<listiterator object at 0x7f0a2e4d3d90> 
>>> print(next(next(mp.parse_sents([sent,sent2]))))
[{u'address': 0,
  u'ctag': u'TOP',
  u'deps': [2],
  u'feats': None,
  u'lemma': None,
  u'rel': u'TOP',
  u'tag': u'TOP',
  u'word': None},
 {u'address': 1,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'I'},
 {u'address': 2,
  u'ctag': u'NN',
  u'deps': [1, 11],
  u'feats': u'_',
  u'head': 0,
  u'lemma': u'_',
  u'rel': u'null',
  u'tag': u'NN',
  u'word': u'shot'},
 {u'address': 3,
  u'ctag': u'AT',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'AT',
  u'word': u'an'},
 {u'address': 4,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'elephant'},
 {u'address': 5,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'in'},
 {u'address': 6,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'my'},
 {u'address': 7,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'pajamas'},
 {u'address': 8,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'Time'},
 {u'address': 9,
  u'ctag': u'NNS',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NNS',
  u'word': u'flies'},
 {u'address': 10,
  u'ctag': u'NN',
  u'deps': [],
  u'feats': u'_',
  u'head': 11,
  u'lemma': u'_',
  u'rel': u'nn',
  u'tag': u'NN',
  u'word': u'like'},
 {u'address': 11,
  u'ctag': u'NN',
  u'deps': [3, 4, 5, 6, 7, 8, 9, 10],
  u'feats': u'_',
  u'head': 2,
  u'lemma': u'_',
  u'rel': u'dep',
  u'tag': u'NN',
  u'word': u'banana'}]
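The printed graph above already holds all eleven tokens of both sentences (addresses 1 to 11) under a single root. A quick sanity check is to compare a graph's word nodes with the sentence it is supposed to represent. A minimal sketch, assuming the nodes are plain dicts shaped like the dump above (only `address` and `word` are rebuilt here); the helper name `graph_matches_sentence` is hypothetical:

```python
def graph_matches_sentence(nodes, sentence):
    """Return True if a graph's word nodes are exactly the given sentence.

    Assumes `nodes` is a list of dicts like the dump above: address 0 is the
    artificial TOP node, every other node carries a 'word' value.
    """
    ordered = sorted(nodes, key=lambda n: n["address"])
    words = [n["word"] for n in ordered if n["address"] != 0]
    return words == list(sentence)


sent = "I shot an elephant in my pajamas".split()
sent2 = "Time flies like banana".split()

# Rebuild only the 'address' and 'word' fields of the printed graph.
merged_words = sent + sent2
nodes = [{"address": 0, "word": None}] + [
    {"address": i, "word": w} for i, w in enumerate(merged_words, start=1)
]

print(len(nodes) - 1)                            # 11 word nodes in one graph
print(graph_matches_sentence(nodes, sent))       # False: not just sentence 1
print(graph_matches_sentence(nodes, merged_words))  # True: both were merged
```

A graph produced for a single sentence should pass this check against that sentence alone; here it only passes against the concatenation.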

Why doesn't parse_sents() return an iterable of parse_one() results?

I could just be lazy and do:

>>> from nltk.parse.malt import MaltParser
>>> _path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
>>> _path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
>>> mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
>>> sent1 = 'I shot an elephant in my pajamas'.split()
>>> sent2 = 'Time flies like banana'.split()
>>> sentences = [sent1, sent2]
>>> for sent in sentences:
...     print(mp.parse_one(sent).tree())

But this is not the solution I'm looking for. My question is why parse_sents() does not return an iterable of parse_one() results, and how to fix that in the NLTK code.
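In NLTK's ParserI interface, parse_sents() is described as returning an iterator of iterators: one inner iterator of candidate parses per input sentence. That is why a single next() yields a list iterator above, and why next(next(...)) only ever reaches the first sentence. A minimal sketch of consuming that nested shape, using a hypothetical `StubParser` in place of MaltParser so it runs without Java; `best_parses` is likewise a hypothetical helper:

```python
class StubParser:
    """Hypothetical stand-in for MaltParser (no Java needed).

    Mimics the shape of parse_sents(): one iterator of candidate parses
    per input sentence, yielded in input order.
    """

    def parse_sents(self, sentences):
        for sent in sentences:
            # A real parser would yield DependencyGraph objects here.
            yield iter(["(" + " ".join(sent) + ")"])


def best_parses(parser, sentences):
    """Return the first (best) parse of every sentence, or None if it has none."""
    return [next(parses, None) for parses in parser.parse_sents(sentences)]


sent1 = "I shot an elephant in my pajamas".split()
sent2 = "Time flies like banana".split()
parser = StubParser()

# Only the first sentence's first parse:
print(next(next(parser.parse_sents([sent1, sent2]))))
# One best parse per sentence:
print(best_parses(parser, [sent1, sent2]))
```

With the real MaltParser, the same loop should give one DependencyGraph per sentence, each of which supports .tree(), assuming the wrapper keeps the sentences separate.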

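After @NikitaAstrakhantsev's answer, I tried it again. It now outputs a parse tree, but the tree looks confused: the two sentences seem to be merged into one before parsing.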

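from nltk.parse.malt import MaltParser

_path_to_maltparser = '/home/alvas/maltparser-1.8/dist/maltparser-1.8/'
_path_to_model = '/home/alvas/engmalt.linear-1.7.mco'
# Initialize a MaltParser object with a pre-trained model.
mp = MaltParser(path_to_maltparser=_path_to_maltparser, model=_path_to_model)
sent = 'I shot an elephant in my pajamas'.split()
sent2 = 'Time flies like banana'.split()
# Parse a single sentence.
print(mp.parse_one(sent).tree())
# Parse both sentences and print the first parse of the first sentence.
print(next(next(mp.parse_sents([sent, sent2]))).tree())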

[out]:

(pajamas (shot I) an elephant in my)
(shot I (banana an elephant in my pajamas Time flies like))

Judging from the code, it seems to be doing something odd: https://github.com/nltk/nltk/blob/develop/nltk/parse/api.py#L45

Why does NLTK's abstract parser class merge the two sentences into one before parsing? Am I calling parse_sents() incorrectly? If so, what is the correct way to call parse_sents()?

【Question discussion】:

Tags: java python parsing nlp nltk

【Solution 1】:

As far as I can see from your code example, you don't call tree() on this line:

>>> print(next(next(mp.parse_sents([sent,sent2]))))

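whereas you do call tree() in every case where you use parse_one().

Other than that, I can't see why this would happen: ParserI's parse_one() method is not overridden in MaltParser, and all it does is call MaltParser's parse_sents().

Update: The line you're talking about is never called, because parse_sents() is overridden in MaltParser and called directly.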
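The update relies on ordinary method dispatch: because parse_one() calls self.parse_sents(), Python resolves that call on the subclass first, so a base class's generic parse_sents() never runs once a subclass overrides it. A minimal sketch with hypothetical `BaseParser` and `WrapperParser` classes (an illustration of the mechanism, not NLTK's actual source):

```python
class BaseParser:
    """Hypothetical stand-in for a ParserI-like base class."""

    def parse_sents(self, sents):
        # Generic fallback: only reached if a subclass does not override it.
        self.used_fallback = True
        return (iter([" ".join(s)]) for s in sents)

    def parse_one(self, sent):
        # Goes through self, so the subclass's parse_sents() wins if defined.
        return next(next(self.parse_sents([sent])), None)


class WrapperParser(BaseParser):
    """Hypothetical stand-in for MaltParser with its own batch parse_sents()."""

    def parse_sents(self, sents):
        self.used_fallback = False
        return (iter(["batch:" + " ".join(s)]) for s in sents)


p = WrapperParser()
print(p.parse_one(["Time", "flies"]))  # handled by WrapperParser.parse_sents
print(p.used_fallback)                 # the base-class fallback was skipped
```

So any sentence merging has to come from the subclass's own batch code, not from the generic base-class path.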

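My only remaining guess is that the Java maltparser library does not handle an input file containing several sentences correctly (I mean this block, where Java is run). Perhaps the original MaltParser changed its format and the sentence separator is no longer '\n\n'. Unfortunately I can't run this code myself, because maltparser.org has been down for two days in a row. I did check that the input file has the expected format (sentences separated by a blank line, i.e. a double newline), so the Python wrapper is unlikely to be merging the sentences.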
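The input-file check described above can be scripted: split the file into blank-line-separated blocks and confirm there is exactly one block per sentence, with the words in order. A minimal sketch, assuming the file contents are already in a string and that the word form is the second tab-separated column (as in CoNLL); the rows built by the hypothetical `rows` helper are simplified, not the exact columns the wrapper writes:

```python
def conll_blocks(text):
    """Group non-blank lines into blocks; blank lines separate sentences."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split("\t"))
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def input_matches_sentences(text, sentences):
    """True if there is one block per sentence and column 2 holds its words."""
    blocks = conll_blocks(text)
    if len(blocks) != len(sentences):
        return False
    return all([row[1] for row in block] == list(sent)
               for block, sent in zip(blocks, sentences))


def rows(words):
    # Hypothetical, simplified CoNLL rows: ID, FORM, one placeholder column.
    return "".join(f"{i}\t{w}\t_\n" for i, w in enumerate(words, start=1))


sent = "I shot an elephant in my pajamas".split()
sent2 = "Time flies like banana".split()

separated = rows(sent) + "\n" + rows(sent2) + "\n"
merged = rows(sent) + rows(sent2) + "\n"
print(input_matches_sentences(separated, [sent, sent2]))  # True
print(input_matches_sentences(merged, [sent, sent2]))     # False
```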

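【Discussion】:

Thanks!! It now outputs trees, but they are the wrong trees; see the updated question.
Found the bug!!! I was so blind: in github.com/alvations/nltk/blob/patch-1/nltk/parse/malt.py#L56 the yield '\n\n' is indented wrongly!!! Oh my god...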
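The fix is a one-level indentation change. The sketch below is a hypothetical reconstruction, not the actual malt.py code: it assumes the wrapper writes one simplified tab-separated row per token and is meant to end every sentence with '\n\n'. When that yield sits outside the outer loop, the separator is written only once, after all tokens, so MaltParser sees one long sentence, which matches the merged tree in the question.

```python
def to_conll_buggy(sentences):
    """Hypothetical buggy generator: separator yielded after the outer loop."""
    for sentence in sentences:
        for i, word in enumerate(sentence, start=1):
            yield f"{i}\t{word}\t_\t_\t_\t_\t0\t_\t_\t_\n"
    yield "\n\n"  # mis-indented: runs once, after every sentence is written


def to_conll_fixed(sentences):
    """Same generator with the separator inside the outer loop."""
    for sentence in sentences:
        for i, word in enumerate(sentence, start=1):
            yield f"{i}\t{word}\t_\t_\t_\t_\t0\t_\t_\t_\n"
        yield "\n\n"  # one blank-line separator after each sentence


def count_sentence_blocks(text):
    """Count maximal runs of non-blank lines (one run per sentence)."""
    count, in_block = 0, False
    for line in text.splitlines():
        if line.strip() and not in_block:
            count += 1
        in_block = bool(line.strip())
    return count


sent = "I shot an elephant in my pajamas".split()
sent2 = "Time flies like banana".split()
print(count_sentence_blocks("".join(to_conll_buggy([sent, sent2]))))  # 1
print(count_sentence_blocks("".join(to_conll_fixed([sent, sent2]))))  # 2
```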