Non-greedy list parsing with pyparsing: answers

【Question title】: Non-greedy list parsing with pyparsing
【Posted】: 2015-06-18 14:21:27
【Question description】:

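I have a string containing a list of words that I am trying to parse with pyparsing.

The list always contains at least three items. From it I want pyparsing to produce three groups: the first contains every word up to the last two items, and the other two groups are the last two items themselves. For example: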

"one two three four"

should be parsed into something like:

["one two"], "three", "four"

I can do this with a regular expression:

import pyparsing as pp
data = "one two three four"
grammar = pp.Regex(r"(?P<first>(\w+\W?)+)\s(?P<penultimate>\w+) (?P<ultimate>\w+)")
print(grammar.parseString(data).dump())

which gives:

['one two three four']
- first: one two
- penultimate: three
- ultimate: four

My problem is that, because pyparsing matches greedily, I cannot get the same result with non-regex ParserElements, for example:

import pyparsing as pp
data = "one two three four"
word = pp.Word(pp.alphas)
grammar = pp.Group(pp.OneOrMore(word))("first") + word("penultimate") + word("ultimate")
grammar.parseString(data)

This fails with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pyparsing.py", line 1125, in parseString
    raise exc
pyparsing.ParseException: Expected W:(abcd...) (at char 18), (line:1, col:19)

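This happens because OneOrMore consumes every word in the list, leaving nothing for the two word expressions that follow it. So far my attempts to prevent this greedy behaviour with FollowedBy or NotAny have all failed. Any suggestions on how to get the desired behaviour?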

【Question discussion】:

Tags: python pyparsing non-greedy

【Solution 1】:

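Well, your OneOrMore expression just needs a little tightening up, and you were on the right track with FollowedBy. You don't really want OneOrMore(word); you want "OneOrMore(word that is followed by at least 2 more words)". To add this kind of lookahead in pyparsing, you can even use the new '*' multiplication operator to specify the lookahead count (word*2 is equivalent to word + word):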

grammar = pp.Group(pp.OneOrMore(word + pp.FollowedBy(word*2)))("first") + word("penultimate") + word("ultimate")

Dumping the results now gives what you want:

[['one', 'two'], 'three', 'four']
- first: ['one', 'two']
- penultimate: three
- ultimate: four
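
For reference, here is a minimal sketch that puts the question's setup and the grammar above into one runnable script. The variable name `leading` is introduced here only for readability and is not part of the original answer; the script assumes a pyparsing version that still accepts the `parseString` spelling used in the question.

```python
import pyparsing as pp

data = "one two three four"
word = pp.Word(pp.alphas)

# A leading word only matches while at least two more words follow it.
# FollowedBy is a lookahead: it checks the input without consuming it.
# `leading` is an illustrative name, not one from the original answer.
leading = pp.OneOrMore(word + pp.FollowedBy(word * 2))

grammar = pp.Group(leading)("first") + word("penultimate") + word("ultimate")

result = grammar.parseString(data)
print(result.dump())
```

Because the lookahead sits inside the repetition, the loop stops on its own as soon as fewer than two words remain, so the last two words are left for `penultimate` and `ultimate`.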

【Discussion】:

Thank you very much for your answer. My attempt failed because I added FollowedBy outside of OneOrMore, e.g.: pp.OneOrMore(word) pp.FollowedBy(word + word)
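
The difference the comment describes can be checked with a small sketch. It assumes the two parts of the commenter's expression were joined with `+` (the operator does not appear in the text above); the names `outside`, `inside`, `outside_ok` and `inside_tokens` are illustrative only.

```python
import pyparsing as pp

data = "one two three four"
word = pp.Word(pp.alphas)

# Lookahead placed after the repetition: OneOrMore has already consumed
# all four words, so the lookahead is tested at the end of the input.
outside = pp.OneOrMore(word) + pp.FollowedBy(word + word)
try:
    outside.parseString(data)
    outside_ok = True
except pp.ParseException:
    outside_ok = False

# Lookahead placed inside the repetition: it is tested after every word,
# so the repetition stops before the last two words.
inside = pp.OneOrMore(word + pp.FollowedBy(word + word))
inside_tokens = list(inside.parseString(data))

print(outside_ok)
print(inside_tokens)
```

The lookahead has to be evaluated on each iteration to limit the repetition. Placed after OneOrMore, it runs only once, after the greedy match has already finished.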