How can I match a regex with a pattern an arbitrary number of times?

【Question title】: How can I match a regex with a pattern an arbitrary number of times?
【Posted】: 2013-07-09 16:16:01
【Question】:

I currently have a regex defined as follows:

>>> import re
>>> regex = re.compile(r"(\d+:)+(\d+)")
>>> search_results = regex.search("52345:54325432:555:443:3:33")
>>> search_results.groups()
('3:', '33')

I know I can do

>>> "52345:54325432:555:443:3:33".split(":")

to split each item into tokens, but I would like to know how to achieve this with a regex.

【Comments】:

Tags: python regex string parsing tokenize

【Answer 1】:

If you want all matches, use re.findall; re.search stops at the first match:

>>> strs = "52345:54325432:555:443:3:33"
>>> re.findall(r"(\d+):(\d+)",strs)
[('52345', '54325432'), ('555', '443'), ('3', '33')]
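
As a rough illustration of why the original attempt printed only ('3:', '33'): a repeated capturing group keeps just the text of its last repetition, even though the overall match covers the whole string. The sketch below also shows one caveat of the pairwise findall pattern, using a made-up input "1:2:3" that is not from the question.

```python
import re

s = "52345:54325432:555:443:3:33"

# (\d+:)+ repeats many times, but the match object stores only the text
# captured by the final repetition of that group.
m = re.search(r"(\d+:)+(\d+)", s)
whole_match = m.group(0)   # the entire matched text
last_groups = m.groups()   # last repetition of group 1, plus group 2

# The two-group pattern consumes tokens in pairs, so with an odd number of
# tokens the trailing one is silently dropped.
pairs_odd = re.findall(r"(\d+):(\d+)", "1:2:3")

print(whole_match)
print(last_groups)
print(pairs_odd)
```

So the regex did match everything; only the groups are misleading. The pairwise pattern is safe only when the number of tokens is known to be even.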

If you want exactly the same result as str.split, you can do this:

>>> re.split(r":",strs)
['52345', '54325432', '555', '443', '3', '33']
>>> re.findall(r"[^:]+",strs)
['52345', '54325432', '555', '443', '3', '33']
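
The two calls above agree on the sample string, but they are not interchangeable in general. A small sketch, assuming an input with an empty field and a trailing colon (a made-up string, not from the question):

```python
import re

s = "1::2:"   # an empty field in the middle and a trailing colon

by_str_split = s.split(":")           # keeps empty fields
by_re_split = re.split(r":", s)       # behaves like str.split here
by_findall = re.findall(r"[^:]+", s)  # skips empty fields entirely

print(by_str_split)
print(by_re_split)
print(by_findall)
```

If empty fields matter, re.split is the closer match to str.split; the findall variant only returns non-empty runs of non-colon characters.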

【Comments】:

【Answer 2】:

See if this helps...

>>> pat = r'(\d+(?=\:)|\d+$)'
>>> regexp = re.compile(pat)
>>> m = regexp.findall("52345:54325432:555:443:3:33")
>>> m
['52345', '54325432', '555', '443', '3', '33']
>>>
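
One way to read the pattern above is to spell it out with re.VERBOSE, which allows a comment on each branch. This is only a sketch of the same idea; the second input, "12.5:7", is a made-up example showing that the pattern does not reject malformed strings.

```python
import re

# The same two branches as the pattern above, written in verbose mode.
token = re.compile(r"""
    \d+ (?=:)   # digits followed by a colon; the lookahead does not consume it
  | \d+ $       # or digits at the very end of the string
""", re.VERBOSE)

tokens = token.findall("52345:54325432:555:443:3:33")

# Digits followed by anything other than a colon are skipped without error:
# the "12" before the dot never shows up in the result.
loose = token.findall("12.5:7")

print(tokens)
print(loose)
```

Because the colon sits inside a lookahead, it is never part of the returned text, so no post-processing is needed to strip it.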

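【Comments】:

【Answer 3】:

You should use split for this problem.

findall works on any valid string. Unfortunately, it also works on any invalid string. If that is what you want, fine; but you probably want to know when there is an error.

Example: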

>>> import re
>>> digits = re.compile(r"\d+")
>>> digits.findall("52345:54325432:555:443:3:33")
['52345', '54325432', '555', '443', '3', '33']
>>> digits.findall("52345:54325.432:555:443:3:33")
['52345', '54325', '432', '555', '443', '3', '33']
>>> digits.findall("There are 2 numbers and 53 characters in this string.")
['2', '53']

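Of course, if you are determined to use only the re module, you can validate first and then split:

>>> valid = re.compile(r"(?:\d+:)*\d+$")
>>> digits = re.compile(r"\d+")
>>> s = "52345:54325432:555:443:3:33"
>>> digits.findall(s) if valid.match(s) else []
['52345', '54325432', '555', '443', '3', '33']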
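
The validate-then-split idea can be wrapped in a small function. This is a minimal sketch: the name split_numbers is hypothetical, it raises instead of returning an empty list, and it uses re.fullmatch (Python 3.4+) in place of match with a trailing $.

```python
import re

# Hypothetical helper; the two patterns mirror the ones used in the answer.
VALID = re.compile(r"(?:\d+:)*\d+")
DIGITS = re.compile(r"\d+")

def split_numbers(s):
    """Return the digit tokens of s, or raise ValueError if s is malformed."""
    # fullmatch anchors at both ends, so no explicit ^ or $ is needed.
    if VALID.fullmatch(s) is None:
        raise ValueError(f"not a colon-separated list of integers: {s!r}")
    return DIGITS.findall(s)

good = split_numbers("52345:54325432:555:443:3:33")

try:
    split_numbers("52345:54325.432:555")
    rejected = False
except ValueError:
    rejected = True

print(good, rejected)
```

One design difference worth knowing: $ also matches just before a trailing newline, so a pattern ending in $ used with match accepts "1:2\n", whereas fullmatch rejects it.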

By comparison:

>>> [int(n) for n in "52345:54325432:555:443:3:33".split(":")]
[52345, 54325432, 555, 443, 3, 33]
>>> [int(n) for n in "52345:54325.432:555:443:3:33".split(":")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '54325.432'

>>> [int(n)
...  for n in "There are 2 numbers and 53 characters in this string.".split(":")]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10:
  'There are 2 numbers and 53 characters in this string.'
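
If a traceback is not the desired failure mode, the split-based approach can be wrapped so that bad input is reported some other way. A minimal sketch, assuming a hypothetical helper named parse_ints that returns None for invalid input:

```python
def parse_ints(s, sep=":"):
    """Split s on sep and convert every field to int; return None on bad input."""
    try:
        return [int(field) for field in s.split(sep)]
    except ValueError:
        # int() rejected at least one field, e.g. '54325.432' or an empty string
        return None

ok = parse_ints("52345:54325432:555:443:3:33")
bad_float = parse_ints("52345:54325.432:555:443:3:33")
bad_empty = parse_ints("1::2")

print(ok, bad_float, bad_empty)
```

Note that int() is somewhat more permissive than \d+: it accepts a leading sign and surrounding whitespace, so "-3" or " 7 " would pass here but not through the regex check.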

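【Comments】:

I know split is usually the right choice, but I was curious how you would do it. This is a regex question, not a question about what is Pythonic, however good Pythonic may be.
@user1876508: This is not really a regex question. You know how to recognize the string with a regex, because you wrote one. The question "how do I split a string in a single call to a regex method in Python" has no answer; there is no such API. If you first test the string for correctness, you can split it with the simple findall from my answer; maybe that is the answer you are looking for (although it is two calls, not one).

【Answer 4】: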

(?<test>[0-9]+):

This is the regex. What you need to do is this. Say your string is in str:

var str = "52345:54325432:555:443:3:33";

Then you have to match this string against the regex in a while loop:

while (RegexMatch.Success) {
    // operate on the current match here
}

The first value, i.e. 52345, will be in:

var first = RegexMatch.Groups["test"].Value;

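on the first iteration.

Note: this is not exact code for matching a regex and so on, but pseudocode. I hope you get the idea. I am attaching an image to show the groups in the regex.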
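
The pseudocode above looks like a .NET-style match loop. Since the question is tagged python, here is a rough Python counterpart, offered as a sketch rather than a translation of the author's exact code. Python's re module spells named groups (?P<name>...), and re.finditer plays the role of the while loop. The second pattern, pattern_all, is an assumption added here to pick up the last token.

```python
import re

s = "52345:54325432:555:443:3:33"

# Named group in Python syntax; the trailing colon is kept from the answer's pattern.
pattern = re.compile(r"(?P<test>[0-9]+):")

# finditer yields one match object per match, like the loop in the pseudocode.
values = [m.group("test") for m in pattern.finditer(s)]
# The final "33" is missing: no colon follows it, so the pattern cannot match there.

# Assumed variant: accept either a colon or the end of the string after the digits.
pattern_all = re.compile(r"(?P<test>[0-9]+)(?::|$)")
all_values = [m.group("test") for m in pattern_all.finditer(s)]

print(values)
print(all_values)
```

The first match's group("test") is "52345", which corresponds to the value the answer describes for the first iteration.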

【Comments】: