【Question title】: Regex match text with delimiters in Python 3
【Posted】: 2018-02-22 15:07:24
【Question】:

I have a text with this structure:

Text Starts
23/01/2018
Something here. It was a crazy day.
Believe me.
02/02/2018
Another thing happens.
Some Delimiter
20/02/2017
Text here
21/02/2017
Another text.
Here.
End Section
...text continues...

I also have a regular expression that is meant to match the (date, text) groups up to Some Delimiter in Python:

result = re.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n\d{2}\/\d{2}\/\d{4}|\nSome Delimiter)", text, re.DOTALL)

Result:

>>> print(result)
[('23/01/2018\n', 'Something here. It was a crazy day. \nBelieve me.'),
('02/02/2018\n', 'Another thing happens.'),
('20/02/2017\n', 'Text here')]

It also returns the first group after the delimiter.

How can I get only the groups that come before the delimiter?

【Comments】:

Tags: python regex python-3.x regex-lookarounds


【Solution 1】:
>>> print(text.split('Some Delimiter')[0])
Text Starts
23/01/2018
Something here. It was a crazy day.
Believe me.
02/02/2018
Another thing happens.

>>> re.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n\d{2}\/\d{2}\/\d{4}|$)", text.split('Some Delimiter')[0], re.DOTALL)
[('23/01/2018\n', 'Something here. It was a crazy day.\nBelieve me.'), ('02/02/2018\n', 'Another thing happens.')]
  • text.split('Some Delimiter')[0] gives the part of the string before the delimiter
  • the (date, text) groups are then extracted from that part alone
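
The two steps above can be put together as a small script. This is only a sketch: the function name entries_before and the variable sample are made up for illustration, and str.partition is used in place of split(...)[0] (both return everything before the first occurrence of the delimiter).

```python
import re

# Same pattern as in the answer; "/" does not need escaping in Python.
ENTRY = re.compile(r"(\d{2}/\d{2}/\d{4}\n)(.*?)(?=\n\d{2}/\d{2}/\d{4}|$)", re.DOTALL)


def entries_before(text, delimiter):
    """Return (date, body) tuples found before the first `delimiter`."""
    # partition() returns the whole text as `head` if the delimiter is absent.
    head, _, _ = text.partition(delimiter)
    return ENTRY.findall(head)


# Hypothetical sample, copied from the structure shown in the question.
sample = (
    "Text Starts\n"
    "23/01/2018\n"
    "Something here. It was a crazy day.\n"
    "Believe me.\n"
    "02/02/2018\n"
    "Another thing happens.\n"
    "Some Delimiter\n"
    "20/02/2017\n"
    "Text here\n"
    "21/02/2017\n"
    "Another text.\n"
    "Here.\n"
    "End Section\n"
    "...text continues..."
)

for date, body in entries_before(sample, "Some Delimiter"):
    print(date.strip(), "->", repr(body))
```

Cutting the text first keeps the pattern simple: the lookahead only has to stop at the next date or at the end of the string.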

With the regex module:

>>> import regex
>>> regex.findall(r"(\d{2}\/\d{2}\/\d{4}\n)(.*?)(?=\n(?1)|$)", text.split('Some Delimiter')[0], regex.DOTALL)
[('23/01/2018\n', 'Something here. It was a crazy day.\nBelieve me.'), ('02/02/2018\n', 'Another thing happens.')]
  • (?1) reuses the pattern of the first capture group, so the date pattern does not have to be written twice
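
The standard re module has no (?1), but the same "write the date pattern once" idea can be approximated by building the pattern from a shared string. A minimal sketch, where DATE, PATTERN and find_entries are illustrative names and the input is assumed to be the text already cut at the delimiter:

```python
import re

DATE = r"\d{2}/\d{2}/\d{4}"  # written once, reused twice below

# Mirrors the (?1) version: the lookahead repeats group 1 (date plus newline).
PATTERN = re.compile(rf"({DATE}\n)(.*?)(?=\n{DATE}\n|$)", re.DOTALL)


def find_entries(head):
    """Return (date, body) tuples from text that ends before the delimiter."""
    return PATTERN.findall(head)
```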

【Comments】:

  • Do you think it can be solved with a regex alone?
  • Try r'(\d{2}/\d{2}/\d{4})\n(.*?)(?=\n(?:\d{2}/\d{2}/\d{4})?.*?Some Delimiter)'
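
One possible regex-only variant, offered as a sketch rather than as the answer's method: keep the lookahead from the question and add a second lookahead that requires the delimiter to still appear later in the text. The function name entries_before_regex_only is hypothetical. Two caveats: if the delimiter occurs more than once, groups before its last occurrence are returned, and the extra lookahead rescans the rest of the text for every candidate, so it may be slow on large inputs.

```python
import re


def entries_before_regex_only(text, delimiter):
    """Return (date, body) tuples that are still followed by `delimiter`."""
    date = r"\d{2}/\d{2}/\d{4}"
    delim = re.escape(delimiter)  # treat the delimiter as literal text
    pattern = (
        rf"({date}\n)(.*?)"
        rf"(?=\n{date}\n|\n{delim})"  # body ends at the next date line or the delimiter
        rf"(?=.*{delim})"             # and the delimiter must still lie ahead
    )
    return re.findall(pattern, text, re.DOTALL)
```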