Python does not recognise valid RegEx entry

【Question title】: Python does not recognise valid RegEx entry
【Posted】: 2020-10-08 13:47:09
【Question】:

I am trying to write a Python script that parses a log file. The core of the script is borrowed from pythonic ways.

import re
log_file_path = r"O:\ZTK log file parser\2 Parsing Log\JP"
regex = '8355371640847825590'
match_list = []

with open(log_file_path, "r") as file:
    for line in file:
        for match in re.finditer(regex, line, re.S):
            match_text = match.group()
            match_list.append(match_text)
print(match_list)  # work in progress

The example above works well when searching for a plain string value. But when I try to plug in a regular expression instead:

regex = '((.*\n){2}).*8355371640847825590'

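it always returns an empty list.

What bothers me is that this expression works perfectly in a test environment such as https://regex101.com/. Every value is matched correctly. Unfortunately, I cannot reproduce that in Python.

I would appreciate any help.

【Comments】: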

Of course, you are reading line by line. Read the whole file using file.read()
@WiktorStribiżew That made a difference! However, my code now spits out the whole file: log_file_path = r"O:\ZTK log file parser\2 Parsing Log\JP" regex = '((.*\n){2}).*8355371640847825590' match_list = [ ] f = open(log_file_path, "r") with open(log_file_path, "r") as file: for line in file: for match in re.finditer(regex, f.read(), re.S): match_text = match.group() match_list.append(match_text). It still does not match the regex correctly :(

Tags: python regex parsing logging

【Solution 1】:

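If you want your pattern to match across line breaks, you need to read the whole file into a variable. In addition, you can explicitly tell the regex engine that matches should only start at the beginning of a line: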

(?m)^(?:.*\n){2}.*8355371640847825590

See the regex demo.

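Details

(?m) - inline re.M / re.MULTILINE modifier; ^ now matches at the start of every line
^ - start of a line
(?:.*\n){2} - two lines, each including its trailing line break
.*8355371640847825590 - any zero or more characters other than line break characters, then 8355371640847825590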
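
To see the breakdown above in action without the original log file, here is a minimal sketch that runs the pattern against a made-up in-memory string. The sample lines and the names sample, per_line, whole and greedy are assumptions for illustration only; the real log layout is not shown in the question. It also tries the re.S flag from the question, which lets the dot match line breaks as well.

```python
import re

# Hypothetical log content; the real file layout is not shown in the question.
sample = (
    "header line\n"
    "entry A line 1\n"
    "entry A line 2\n"
    "id=8355371640847825590 status=ok\n"
    "trailing line\n"
)

pattern = r"(?m)^(?:.*\n){2}.*8355371640847825590"

# Line by line: no single line holds two line breaks, so nothing can match.
per_line = [
    m.group()
    for line in sample.splitlines(keepends=True)
    for m in re.finditer(pattern, line)
]

# Whole text: the match starts two lines above the line containing the ID.
whole = re.findall(pattern, sample)

# With re.S the dot also matches line breaks, so the greedy .* runs
# swallow everything from the start of the text up to the ID.
greedy = re.findall(pattern, sample, re.S)

print(per_line)
print(whole)
print(greedy)
```

In this sketch the re.S variant returns everything from the first line of the text up to the ID, which would be consistent with the follow-up comment reporting that the whole file was returned.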

Python demo:

import re
log_file_path = r"O:\ZTK log file parser\2 Parsing Log\JP"
# Raw string so that \n reaches the regex engine unchanged
regex = r'(?m)^(?:.*\n){2}.*8355371640847825590'

# Read the whole file so the pattern can match across line breaks
with open(log_file_path, "r") as file:
    match_list = re.findall(regex, file.read())

print(match_list)
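
If the log is too large to read into memory at once, one possible alternative (not part of the answer above, and not a regex solution) is to keep the last two lines in a small buffer while iterating. The following is only a sketch under that assumption; find_with_context and its parameters are hypothetical names introduced here.

```python
from collections import deque

def find_with_context(lines, needle, before=2):
    """Yield each line containing `needle`, prefixed by up to `before` preceding lines."""
    window = deque(maxlen=before)  # holds the most recent `before` lines
    for line in lines:
        if needle in line:
            yield "".join(window) + line
        window.append(line)

# Made-up sample lines standing in for the real log file.
demo_lines = ["a\n", "b\n", "c\n", "id=8355371640847825590 ok\n", "d\n"]
demo_result = list(find_with_context(demo_lines, "8355371640847825590"))
print(demo_result)
```

Because a file object iterates over its lines, the same function could be called as find_with_context(file, "8355371640847825590") inside the with block. Unlike the regex, this sketch returns the full matching line, also reports hits that have fewer than two preceding lines, and allows overlapping results.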

【Discussion】:

Thank you. You have helped us a lot.