python re.split not working for all fields

【Question title】: python re.split not working for all fields
【Posted】: 2017-09-26 19:14:05
【Question】:

>>> import re
>>> string = "some text \n\n\nError on the field: more\n text and lines\n\n\nError on the field: some more\n lines \n\n\nError on the field: final lines"
>>> pieces = re.split(r'(Error on the field:)', string, re.IGNORECASE)
>>> pieces
['some text \n\n\n', 'Error on the field:', ' more\n text and lines\n\n\n', 'Error on the field:', ' some more\n lines \n\n\nError on the field: final lines']
>>> pieces2 = re.split(r'(Error on the field:)', pieces[4], re.IGNORECASE)
>>> pieces2
[' some more\n lines \n\n\n', 'Error on the field:', ' final lines']

Why is the third occurrence of 'Error on the field:' not picked up when string is first split into pieces, yet it is picked up when pieces[4] is split on its own?

【Comments】:

Just use re.split(r'(?i)(Error on the field:)', string), which sets the case-insensitive flag inside the pattern itself.
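
A minimal sketch of the inline-flag approach from the comment above. The sample string is the one from the question; the variable names are only illustrative. Because (?i) lives inside the pattern, no flags argument is needed, so nothing can land in the maxsplit slot by accident.

```python
import re

text = ("some text \n\n\nError on the field: more\n text and lines"
        "\n\n\nError on the field: some more\n lines "
        "\n\n\nError on the field: final lines")

# (?i) at the very start of the pattern turns on case-insensitive matching
# for the whole expression, so no third or fourth argument is passed.
pieces = re.split(r'(?i)(Error on the field:)', text)

print(len(pieces))   # 7: four text chunks plus three captured separators
print(pieces[-1])    # ' final lines'
```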

Tags: python regex split

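【Solution 1】:

The positional parameters of re.split are, in order:

pattern (the regular expression)
string
maxsplit (default: 0, meaning no limit)
flags (default: 0, meaning no flags)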

split(pattern, string, maxsplit=0, flags=0)

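You are passing re.IGNORECASE (whose integer value is 2) positionally, so it lands in the maxsplit parameter rather than flags. That explains the odd behaviour: the split works up to a point, then stops after 2 splits, exactly as instructed. No flag is applied at all; the pattern only matched because the text happens to use the same case.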
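
The explanation above can be checked with a short sketch. It uses the string from the question (the variable names are illustrative) and passes maxsplit=2 by keyword to show that this is what the original positional call amounted to.

```python
import re

text = ("some text \n\n\nError on the field: more\n text and lines"
        "\n\n\nError on the field: some more\n lines "
        "\n\n\nError on the field: final lines")
pattern = r'(Error on the field:)'

# re.IGNORECASE is an integer-valued flag; its value is 2.
flag_value = int(re.IGNORECASE)

# maxsplit=2 reproduces the output from the question: two splits are made
# and the remainder of the string is returned as one unsplit piece.
limited = re.split(pattern, text, maxsplit=2)

# With the flag passed by keyword, every occurrence is split.
full = re.split(pattern, text, flags=re.IGNORECASE)

print(flag_value)                # 2
print(len(limited), len(full))   # 5 7
```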

Just use flags=re.IGNORECASE instead (a keyword argument, not a positional one).

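With re.compile you can safely pass the flags positionally: compile(pattern, flags=0). The same is true of re.match and re.search, where flags directly follows the string. It is not true of re.split and re.sub, where maxsplit or count comes before flags, so this is an easy trap to fall into. When in doubt, always pass optional arguments by keyword.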
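
As a hedged illustration of the point above: compiling the pattern first keeps the flags with the pattern, and calling re.sub with flags= as a keyword avoids the count slot. The sample strings here are made up for the example and are not from the question.

```python
import re

# flags is the second parameter of re.compile, so passing it positionally is safe.
compiled = re.compile(r'(Error on the field:)', re.IGNORECASE)

sample = "a ERROR ON THE FIELD: b error on the field: c"  # hypothetical input
parts = compiled.split(sample)
print(parts)
# ['a ', 'ERROR ON THE FIELD:', ' b ', 'error on the field:', ' c']

# re.sub has the same shape as re.split: count comes before flags,
# so the flag is passed by keyword here.
replaced = re.sub(r'error', 'X', 'Error error ERROR', flags=re.IGNORECASE)
print(replaced)  # X X X
```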

【Discussion】:

【Solution 2】:

When using re.split, you need to pass the flags explicitly with flags=:

import re
string = "some text \n\n\nError on the field: more\n text and lines\n\n\nError on the field: some more\n lines \n\n\nError on the field: final lines"
pieces = re.split(r'(Error on the field:)', string, flags=re.I)

print(pieces)

Output:

['some text \n\n\n', 'Error on the field:', ' more\n text and lines\n\n\n', 'Error on the field:', ' some more\n lines \n\n\n', 'Error on the field:', ' final lines']

Note that re.I is the same as re.IGNORECASE.

【Discussion】: