Get all the words between two specific words in Python: answers

【Question title】: Get all the words between two specific words in python
【Posted】: 2015-07-22 01:39:57
【Question】:

I have some strings like these:

The pizza is so hot
Today I bought an hot and tasty pizza

I need to extract all the words between pizza and the adjective hot in Python. How can I do that?

These are the expected outputs.

is so 
and tasty

Keep in mind that the attribute (e.g. pizza) and the adjective (e.g. hot) may be multi-token words.

This is what I tried:

import re

# values, names and descrizione come from the rest of the script (not shown)
attribute = re.search(values[0], descrizione, re.IGNORECASE)
value = re.search(names[0], descrizione, re.IGNORECASE)

if attribute:
    print(attribute.group())
    print(descrizione.find(attribute.group()))

if value:
    print(value.group())
    print(descrizione.find(value.group()))

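【Question comments】:

Can pizza and hot appear more than once in a line? How should that be handled?
@BrendanAbel No. They can't :D
Do you need to find the words in multiple strings, or only in two strings?
Multiple strings, not only 2... the adjective can come before the attribute or after it
Sounds like a good use for pyparsing

Tags: python regex string substring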

【Solution 1】:

Another approach, in which you can define the "from/to" patterns as needed.

>>> import regex  # third-party module (pip install regex), not the built-in re
>>> rgx = regex.compile(r'(?si)(?|{0}(.*?){1}|{1}(.*?){0})'.format('pizza', 'hot'))
>>> s1 = 'The pizza is so hot'
>>> s2 = 'Today I bought an hot and tasty pizza'
>>> for s in [s1, s2]:
...     m = rgx.findall(s)
...     for x in m:
...         print(x.strip())
...
is so
and tasty

【Comments】:

Line 82, in <module> rgx = re.compile(r'(?si)(?|{0}(.*?){1}|{1}(.*?) {0})'.format(attribute.group(), value.group())) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 190, in compile return _compile(pattern, flags) File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 242, in _compile raise error, v # invalid expression
You need to use the regex module, not re.
This does not work with Python 3.9 - I believe you should indeed use the re module.
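
For anyone limited to the standard library: the part that `re` rejects in the traceback above is the branch-reset group `(?|...)`. A rough equivalent is to give each alternative its own capture group and take whichever one matched. This is only a minimal sketch, assuming each word appears at most once per string; `words_between` is a hypothetical helper name, not something from the answer.

```python
import re

def words_between(text, first, second):
    """Return the text between `first` and `second`, in either order, or None."""
    # Escape the words so multi-token or punctuated values are matched literally.
    a, b = re.escape(first), re.escape(second)
    # Two alternatives, each with its own capture group, because the
    # standard `re` module has no branch-reset group.
    pattern = re.compile(r'{0}(.*?){1}|{1}(.*?){0}'.format(a, b),
                         re.IGNORECASE | re.DOTALL)
    m = pattern.search(text)
    if m is None:
        return None
    # Only one of the two groups takes part in any given match.
    inner = m.group(1) if m.group(1) is not None else m.group(2)
    return inner.strip()

for s in ['The pizza is so hot', 'Today I bought an hot and tasty pizza']:
    print(words_between(s, 'pizza', 'hot'))
```

For the two sample strings this prints `is so` and `and tasty`. The words are matched as plain substrings, so `hot` would also match inside `hotel`; wrapping each escaped word in `\b` is one way to tighten that if it matters for your data.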

【Solution 2】:

I think a good solution is to use split with the '|' character in the regular expression.

import re

strs = []
strs.append('The pizza is so hot')
strs.append('Today I bought a hot and tasty pizza')
item = 'pizza'
adj = 'hot'
rets = []

for str_ in strs:
    # flags must be a keyword argument; the third positional argument of re.split is maxsplit
    ret = re.split(item + '|' + adj, str_, flags=re.IGNORECASE)
    rets.append(ret[1].strip())

This works because, when we look at the two strings individually, each split gives a list with three elements.

ret = re.split(item + '|' + adj, strs[0], flags=re.IGNORECASE)
print(ret)
# ['The ', ' is so ', '']

ret = re.split(item + '|' + adj, strs[1], flags=re.IGNORECASE)
print(ret)
# ['Today I bought a ', ' and tasty ', '']

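Because we know each of the two words can appear only once in the string, we can reliably take ret[1] as the result: the string is split exactly twice, once where we find one of the words and once where we find the other. The OR character lets us split the string without knowing the order of the words in advance.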
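
The reasoning above can be wrapped into a small function that also checks the "exactly two splits" assumption instead of trusting it. This is a sketch only: `middle_part` is a hypothetical name, and the `re.escape` call is an addition so that multi-token or punctuated words are treated literally.

```python
import re

def middle_part(text, item, adj):
    """Return the text between `item` and `adj`, whichever comes first."""
    # Build "item|adj" with both words escaped so they are matched literally.
    pattern = '|'.join(re.escape(word) for word in (item, adj))
    # Pass flags by keyword: the third positional argument is maxsplit.
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Two delimiters found means three pieces; anything else breaks the assumption.
    if len(parts) != 3:
        raise ValueError('expected exactly two delimiters, found %d' % (len(parts) - 1))
    return parts[1].strip()

print(middle_part('The pizza is so hot', 'pizza', 'hot'))
print(middle_part('Today I bought a hot and tasty pizza', 'pizza', 'hot'))
```

Raising on an unexpected number of pieces is a design choice; returning None would work just as well if missing or repeated words are normal in your input.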

【Comments】:

【Solution 3】:

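import re

x = """The pizza is so hot
Today I bought an hot and tasty pizza
wow pizza and another pizza"""
# \1 is whichever word matched first; (?!\1) forces the closing word to be the other one
print([j for i, j in re.findall(r"(pizza|hot)\s*(.*?)\s*(?!\1)(?:hot|pizza)", x)])
# ['is so', 'and tasty']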

Try this with re.findall.

【Comments】: