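【Posted】: 2017-03-30 17:44:38
【Question】:
I am using Python to extract certain strings that lie between matching strings. These strings come from a list that is itself generated dynamically by a separate Python function. The list I am working with looks like this:-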
sample_list = ['line1 this line a first line',
'line1 this line is also considered as line one...',
'line1 this line is the first line',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 this contain other strings',
'line1 this may contain other strings as well',
'line2 this line is second line to be included in output',
'line3 this should also be included in output',
'line1 what the heck is it...'
]
The output I want looks like this:-
line1 this line is the first line
line2 this line is second line to be included in output
line3 this should also be included in output
line1 this may contain other strings as well
line2 this line is second line to be included in output
line3 this should also be included in output
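As you can see, I want to extract the text/lines that start with line1 and end with line3 (through the end of that line). The final output should include the matched words themselves (i.e. line1 and line3).
The code I tried is:-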
import re
# Convert list to string first
list_to_str = '\n'.join(sample_list)
# Get desired output
print(re.findall('\nline1(.*?)\nline2(.*?)\nline3($)', list_to_str, re.DOTALL))
This is the output I get:-
[]
Any help is appreciated.
Edit 1:- After some more work, I found this, which is the closest solution so far:-
matches = re.findall(r"^line1(.*)\nline2(.*)\nline3(.*)$", list_to_str, re.MULTILINE)
for match in matches:
    print('\n'.join(match))
It gives me this output:-
this line is the first line
this line is second line to be included in output
this is the third and it should also be included in output
this may contain other strings as well
this line is second line to be included in output...
this is the third should also be included in output
The output is almost right, but it does not include the matched text.
【Comments】:
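-
You should just loop over the list and check whether each value .startswith('line1'), 'line2', and so on.
-
Correct. But that way you cannot capture 'line1', 'line2' and 'line3' in one go.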
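For what it's worth, the loop-and-startswith idea can still pick up all three lines together if it looks at a sliding window of consecutive items rather than one item at a time. The following is only a minimal sketch under that assumption; the helper name extract_blocks and its prefixes parameter are made up for illustration and are not from the question.

```python
sample_list = ['line1 this line a first line',
               'line1 this line is also considered as line one...',
               'line1 this line is the first line',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 this contain other strings',
               'line1 this may contain other strings as well',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 what the heck is it...'
               ]


def extract_blocks(lines, prefixes=('line1', 'line2', 'line3')):
    """Hypothetical helper: return each run of consecutive lines whose
    starts match `prefixes` in order."""
    width = len(prefixes)
    blocks = []
    # Slide a window of len(prefixes) items over the list.
    for start in range(len(lines) - width + 1):
        window = lines[start:start + width]
        # Keep the window only if every line starts with its expected prefix.
        if all(line.startswith(prefix) for line, prefix in zip(window, prefixes)):
            blocks.append(window)
    return blocks


blocks = extract_blocks(sample_list)
for block in blocks:
    print('\n'.join(block))
```

Note that startswith('line1') would also accept a line beginning with 'line10'; if such lines can occur, compare against 'line1 ' (with the trailing space) or split off the first word instead.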
-
If by 'matched text' you mean that findall() does not include group 0 in its output list, just add a capturing group around the whole regex: (<your regex>). Example: (^line1(.*)\nline2(.*)\nline3(.*)$)
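A rough sketch of that suggestion, applied to the sample_list from the question. When a pattern has several groups, re.findall returns one tuple per match, so with the extra outer group the first element of each tuple is the whole matched block, keywords included. The variable name full_blocks is just an illustrative choice.

```python
import re

sample_list = ['line1 this line a first line',
               'line1 this line is also considered as line one...',
               'line1 this line is the first line',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 this contain other strings',
               'line1 this may contain other strings as well',
               'line2 this line is second line to be included in output',
               'line3 this should also be included in output',
               'line1 what the heck is it...'
               ]

list_to_str = '\n'.join(sample_list)

# The outer parentheses become group 1 and span the entire match.
pattern = r"(^line1(.*)\nline2(.*)\nline3(.*)$)"
matches = re.findall(pattern, list_to_str, re.MULTILINE)

# Each match is a tuple: (whole block, rest of line1, rest of line2, rest of line3).
full_blocks = [match[0] for match in matches]
for block in full_blocks:
    print(block)
```

If the inner groups are not needed at all, removing every pair of parentheses from the pattern should have the same effect, since findall returns the full match when the pattern contains no groups.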