Do Python regexes support something like Perl's \G? Answers

【Question title】: Do Python regexes support something like Perl's \G?
【Posted】: 2010-10-06 12:02:21
【Question】:

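I have a Perl regex that contains the \G metacharacter (shown here, though hopefully you don't need to understand the whole thing to answer this question). I'd like to translate it into Python, but Python doesn't seem to support \G. What can I do?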

【Question discussion】:

Tags: python regex perl

【Answer 1】:

Try these:

import re
re.sub       # substitute every match
re.findall   # list every match
re.finditer  # iterate over match objects

For example:

# Finds all words of length 3 or 4
s = "the quick brown fox jumped over the lazy dogs."
print(re.findall(r'\b\w{3,4}\b', s))

# prints ['the', 'fox', 'over', 'the', 'lazy', 'dogs']
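findall() on its own skips over any text between matches, which is exactly what \G prevents. A minimal sketch of how finditer() can be pushed closer to \G, assuming the pattern cannot match the empty string: since finditer() yields match objects with start and end positions, you can keep matches only while each one begins exactly where the previous one ended. The helper name contiguous_matches and the sample string are hypothetical.

```python
import re

def contiguous_matches(pattern, text):
    # Hypothetical helper: emulate Perl's /\G.../g by keeping finditer()
    # results only while each match starts where the previous one ended
    # (the first match must start at position 0).
    result = []
    expected = 0
    for m in re.finditer(pattern, text):
        if m.start() != expected:
            break  # a gap: an anchored (\G-style) match would fail here
        result.append(m.group())
        expected = m.end()
    return result

s = "1,2,3 x4"
print(re.findall(r'\d+,?', s))          # ['1,', '2,', '3', '4']
print(contiguous_matches(r'\d+,?', s))  # ['1,', '2,', '3']
```

Unlike findall(), this stops at the first gap instead of skipping ahead to the next match.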

【Discussion】:

This StackOverflow question covers the reverse case, translating Python's re.findall() into Perl; take a look: stackoverflow.com/a/467820/446767

【Answer 2】:

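You can use the match method of a compiled pattern to match anchored patterns: pattern.match(text, pos) only matches at the beginning of the text (position 0) or at the position pos that you specify.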
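Note that the pos argument belongs to the match() method of a compiled pattern; the module-level re.match() function does not accept it. A minimal illustration with an arbitrary sample string, which also shows that '^' keeps meaning the real start of the string rather than pos, so the anchoring has to come from match() itself:

```python
import re

pat = re.compile(r'\d+')
text = 'ab12cd34'

print(pat.match(text))              # None: match() only tries position 0
print(pat.match(text, 2).group())   # 12: anchored at pos=2
print(pat.match(text, 4))           # None: 'c' at pos=4 is not a digit
print(pat.search(text, 4).group())  # 34: search() scans forward from pos

# '^' still means the real start of the string, not pos
print(re.compile(r'^cd').match(text, 4))  # None
```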

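import re

def match_sequence(pattern, text, pos=0):
    # Yield consecutive matches, each starting exactly where the previous one ended
    pat = re.compile(pattern)
    match = pat.match(text, pos)  # anchored at pos; never searches ahead
    while match:
        yield match
        if match.end() == pos:
            break  # zero-length match: stop, otherwise this loops forever
        pos = match.end()
        match = pat.match(text, pos)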

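This matches the pattern only at the given position, and after that only matches that begin exactly where the previous one ended (0 characters later).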

>>> for match in match_sequence(r'[^\W\d]+|\d+', "he11o world!"):
...     print(match.group())
...
he
11
o
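To make the difference from ordinary /g-style iteration concrete, here is the same input run through both match_sequence and re.finditer(); the generator is repeated so the snippet runs on its own. finditer() skips the space and goes on to find 'world', while the anchored loop stops where the first match fails, so the end of the last match marks the first character it could not consume (useful, for example, when a \G-style loop is used as a tokenizer).

```python
import re

def match_sequence(pattern, text, pos=0):
    # Same generator as above: yields matches anchored back to back
    pat = re.compile(pattern)
    match = pat.match(text, pos)
    while match:
        yield match
        if match.end() == pos:
            break
        pos = match.end()
        match = pat.match(text, pos)

text = "he11o world!"
pattern = r'[^\W\d]+|\d+'

anchored = list(match_sequence(pattern, text))
print([m.group() for m in anchored])                    # ['he', '11', 'o']
print([m.group() for m in re.finditer(pattern, text)])  # ['he', '11', 'o', 'world']

# Where the anchored run stopped: the text it could not consume
stop = anchored[-1].end() if anchored else 0
print(repr(text[stop:]))  # ' world!'
```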

【Discussion】:

【Answer 3】:

Python's regexes have no /g modifier, and therefore no \G regex token. Too bad.
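In current Python 3 (since 3.6, as far as I know) the standard re module does not quietly treat \G as a literal "G" either: an unknown escape made of a backslash and an ASCII letter is rejected with re.error. A small check, with a hypothetical helper name:

```python
import re

def re_supports_G():
    # Hypothetical helper: True if the stdlib re module accepts a \G anchor
    try:
        re.compile(r'\G')
    except re.error:
        # Python 3.6+ raises "bad escape" for unknown letter escapes like \G
        return False
    return True

print(re_supports_G())
```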

【Discussion】:

【Answer 4】:

Don't try to cram everything into a single expression: it becomes hard to read, to translate (as you have seen yourself), and to maintain.

import re

# text_block is assumed to hold the input; lines starting with // are dropped
lines = [re.sub(r'http://[^\s]+', r'<\g<0>>', line)
         for line in text_block.splitlines() if not line.startswith('//')]
print('\n'.join(lines))

Python is usually not at its best when translated literally from Perl; it has its own programming idioms.
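The list comprehension above drops lines that start with //. If the goal is instead to keep those lines unchanged (as in the output shown in Answer 5), a plain loop keeps the same line-by-line readability. This is a sketch under that assumption; the function name wrap_urls and the sample input are hypothetical.

```python
import re

URL = re.compile(r'http://\S+')

def wrap_urls(text_block):
    # Hypothetical helper: wrap URLs in <...>, leaving // lines untouched
    out = []
    for line in text_block.splitlines():
        if line.startswith('//'):
            out.append(line)  # comment line: keep as-is
        else:
            out.append(URL.sub(r'<\g<0>>', line))
    return '\n'.join(out)

print(wrap_urls("http://a.com http://b.com\n//http://etc."))
```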

【Discussion】:

【Answer 5】:

I know I'm late to the party, but here is an alternative to the \G approach:

import re

def replace(match):
    if match.group(0)[0] == '/': return match.group(0)
    else: return '<' + match.group(0) + '>'

source = '''http://a.com http://b.com
//http://etc.'''

pattern = re.compile(r'(?m)^//.*$|http://\S+')
result = re.sub(pattern, replace, source)
print(result)

Output (via Ideone):

<http://a.com> <http://b.com>
//http://etc.

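The idea is to use a regex that matches both kinds of string: a URL or a comment line. Then you use a callback (delegate, closure, embedded code, etc.) to find out which one you matched and return the appropriate replacement string.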

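In fact, this is my preferred approach even in flavors that support \G, even in Java, where I have to write a bunch of boilerplate code to implement the callback.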

(I'm not a Python person, so apologies if the code is badly un-Pythonic.)
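The same callback idea can tell the two alternatives apart by name instead of by inspecting the first character: give each alternative its own named group and check match.lastgroup inside the callback. A sketch reusing the sample input from the answer; the group names comment and url are arbitrary choices.

```python
import re

source = '''http://a.com http://b.com
//http://etc.'''

# One named alternative per kind of text; lastgroup says which one matched
pattern = re.compile(r'(?m)(?P<comment>^//.*$)|(?P<url>http://\S+)')

def replace(match):
    if match.lastgroup == 'comment':
        return match.group(0)          # comment line: leave unchanged
    return '<' + match.group(0) + '>'  # URL: wrap in angle brackets

result = pattern.sub(replace, source)
print(result)
```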

【Discussion】: