Regular expression - find numbers after symbol

【Question title】: regular expression - find numbers after symbol
【Posted】: 2020-09-08 05:53:08
【Question】:

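I am trying to use the regular expressions below to find any number after the "|" operator in sample strings like the ones below. The problem is with default_regex: I can't seem to combine numeric_regex with a lookbehind.

'xxx -> 31223.1 | xxx -> 1.1'    to get 1.1

'0 | 1'     to get 1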

numeric_regex = ''' 
                [-+]?                    # pos or neg
                (?: (?: \d* \. \d+ ) |   # float (ie .1 and 1.1)
                (?: \d+ \.? ) )          # int (with trailing periods ie 1.)
            '''

default_regex = f'''
                (? <= \|).               # after but not including |
                {numeric_regex}          # all digits
                + $                      # end of the string
            '''

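Any help is appreciated!

【Question comments】:

Hi Tommy, did either of the two answers help you solve the problem?

Tags: python regex

【Solution 1】:

Here is a small program for your problem statement: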

import re

regex = r"\|.*?[\-\+]?(\d+\.\d+|\d+\.?|\.\d+)"

test_str = ("xxx -> 31223.1 | xxx -> 1.1\n"
    "0|1\n"
    "0|abc 1.\n"
    "0|.1\n")

matches = re.finditer(regex, test_str)

for match_num, match in enumerate(matches, start=1):
    print(f"Match {match_num} was found at {match.start()}-{match.end()}: {match.group()}")
    # Report every capturing group (group numbers start at 1)
    for group_num in range(1, len(match.groups()) + 1):
        print(f"Group {group_num} found at {match.start(group_num)}-{match.end(group_num)}: {match.group(group_num)}")

The regex you are looking for is

regex = r"\|.*?[-+]?(\d+\.\d+|\d+\.?|\.\d+)"

I only tested a few scenarios; I hope it helps.
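To get just the number out of each match, read capture group 1. The sketch below is an illustration, not part of the answer: the helper name `numbers_after_pipe` and the input `"0|-5"` are assumptions. It also shows that the optional sign sits outside group 1 in the pattern above, so a variant with `[-+]?` moved inside the group is included for comparison.

```python
import re

# Pattern from the answer: the sign is matched, but it lies outside group 1.
ANSWER_REGEX = re.compile(r"\|.*?[-+]?(\d+\.\d+|\d+\.?|\.\d+)")

# Variant (an assumption, not from the answer): sign moved inside group 1.
SIGNED_REGEX = re.compile(r"\|.*?([-+]?(?:\d+\.\d+|\d+\.?|\.\d+))")


def numbers_after_pipe(pattern, text):
    """Return group 1 of every match of `pattern` in `text`."""
    return [m.group(1) for m in pattern.finditer(text)]


print(numbers_after_pipe(ANSWER_REGEX, "xxx -> 31223.1 | xxx -> 1.1"))  # ['1.1']
print(numbers_after_pipe(ANSWER_REGEX, "0|-5"))  # ['5']
print(numbers_after_pipe(SIGNED_REGEX, "0|-5"))  # ['-5']
```

If negative numbers can occur in the input, the second form is probably the safer one to read group 1 from.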

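【Comments】:

Thanks, this seems to work to some extent. May I ask what was wrong with my attempt? I would like to be able to return the number from re.compile(default_regex).group(). Also, if possible, do you think this can be done with formatted strings? I have other regular expressions that need to use numeric_regex as part of another regex.

【Solution 2】: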

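Your main problem is that you introduce whitespace where it must not appear, even with the re.X / re.VERBOSE flag. You cannot split apart the characters that make up a lookbehind construct. You should also keep each quantifier together with the subpattern it quantifies.

Besides, you do not need a lookbehind here: just capture your number with a capturing group and access it with match.group(1).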
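The points above can be checked directly. The following is a minimal sketch (the helper `try_compile` and the test patterns are made up for illustration): it tries to compile a lookbehind with a space inside the `(?<=` token, one with the token intact, and one with a variable-width body, which Python's `re` module also rejects.

```python
import re


def try_compile(pattern):
    """Return None if `pattern` compiles under re.VERBOSE, else the error text."""
    try:
        re.compile(pattern, re.VERBOSE)
        return None
    except re.error as exc:
        return str(exc)


# Whitespace inside the "(?<=" token itself: rejected even in verbose mode.
print(try_compile(r"(? <= \|) \d+"))

# Token kept intact: compiles, and the spaces around \| are ignored.
print(try_compile(r"(?<= \| ) \d+"))

# Variable-width lookbehind body: rejected by Python's re module.
print(try_compile(r"(?<= \| .*? ) \d+"))

# A capturing group sidesteps both problems.
m = re.search(r"\| .*? (\d+)", "0 | 1", re.VERBOSE)
print(m.group(1))  # 1
```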

See the full Python demo and regex demo:

import re
numeric_regex = r'''
                [-+]?                    # pos or neg
                (?:
                  \d*\.\d+               # float (ie .1 and 1.1)
                  | 
                  \d+ \.?                # int (with trailing periods ie 1.)
                )'''

default_regex = rf'''
                .*                       # Match as many chars as possible (use with re.S)
                \|.*?                    #  | and 0+ chars as few as possible
                ({numeric_regex})        # Capturing group: all digits
                $                        # end of the string
'''
m = re.search(default_regex, "xxx -> 31223.1 | xxx 1.1", re.S | re.VERBOSE)
if m:
    print(m.group(1)) # => 1.1

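Note the (...) in default_regex. In numeric_regex, which we may want to reuse, a non-capturing group (?:...) is used because there we only need to group the two alternatives.

The main pattern is now .*\|.*?({numeric_regex})$: the leading .* matches as many characters as possible, \| matches a literal |, .*? then matches as few characters as possible (any character, line breaks included, because re.S is passed), the number is captured into Group 1, and $ asserts the position at the end of the string. Because of the leading greedy .*, the match is anchored on the rightmost | (and the pattern that follows it).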
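Since numeric_regex is meant to be reused, one option is to compile the combined pattern once and wrap it in a small function. This is only a sketch under that assumption: the names `NUMERIC_REGEX`, `LAST_NUMBER` and `number_after_last_pipe` are illustrative and do not come from the answer.

```python
import re

NUMERIC_REGEX = r"""
    [-+]?              # optional sign
    (?:
      \d*\.\d+         # float (.1 and 1.1)
      |
      \d+\.?           # int, optionally with a trailing period (1.)
    )"""

# Compile once; re.S lets "." cross line breaks, re.VERBOSE allows the layout.
LAST_NUMBER = re.compile(rf"""
    .*                 # greedy: pushes the match to the rightmost |
    \|.*?              # the | and as few chars as possible
    ({NUMERIC_REGEX})  # group 1: the number
    $                  # end of the string
""", re.S | re.VERBOSE)


def number_after_last_pipe(text):
    """Return the number that ends `text` after the last |, or None."""
    m = LAST_NUMBER.search(text)
    return m.group(1) if m else None


print(number_after_last_pipe("xxx -> 31223.1 | xxx -> 1.1"))  # 1.1
print(number_after_last_pipe("0 | 1"))                        # 1
```

The same `NUMERIC_REGEX` string can be interpolated into other verbose patterns in the same way, as long as they are also compiled with re.VERBOSE.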

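【Comments】:

Thanks for the answer. I don't think this works for the given case, "xxx -> 31223.1 | xxx 1.1"?
@Tommy For xxx, if you need to match any text, use .*?; I updated the answer.
Great, thanks. Just to clarify, why do we need to extract via a group?
@Tommy We need a group because the pattern before it can be of variable length, and Python re does not support patterns of unknown length in a lookbehind.
Makes sense. There is one more case I can't seem to figure out: when we have multiple "|" operators. How would that be handled? I forgot to mention that it must find the last operator. Thanks again. '30 -> 2 | 4 -> 3.2 |0 '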
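For the multi-"|" case in the last comment, the greedy leading .* in Solution 2 should already anchor the match on the last "|". A minimal sketch follows. Note that the sample string ends with a space, so this version assumes trailing whitespace may be skipped with `\s*` before `$`; that addition and the name `LAST_PIPE_NUMBER` are not from the answer.

```python
import re

numeric_regex = r"""
    [-+]?
    (?:
      \d*\.\d+
      |
      \d+\.?
    )"""

LAST_PIPE_NUMBER = re.compile(rf"""
    .*                 # greedy, so the \| below lands on the last |
    \|.*?
    ({numeric_regex})
    \s*$               # assumption: tolerate trailing whitespace
""", re.S | re.VERBOSE)

m = LAST_PIPE_NUMBER.search("30 -> 2 | 4 -> 3.2 |0 ")
if m:
    print(m.group(1))  # 0
```

Without the `\s*`, the trailing space would keep `$` from matching right after the number, and the search would return None for this input.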