Regex for blank string: answers

【Question title】: regex for blank string
【Posted】: 2021-12-22 02:41:53
【Question description】:

I have a string:

s=

"(2021-06-29T10:53:42.647Z) [Denis]: hi
(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane 
(2021-06-29T11:58:29.053Z) [Nicholas]: 
(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##"

I want to extract the text from it. The expected output is:

comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##']

What I tried:

comments=re.findall(r']:\s+(.*?)\n',s)

The regex works well, but I can't get the blank text as ''.
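A minimal sketch that reproduces the attempt on a compilable version of the sample. It assumes `\n` line endings and no trailing newline after the last message. It shows what goes wrong: on the blank line `\s+` also consumes the newline, so group 1 captures the whole next message. The final message is skipped entirely because no `\n` follows it.

```python
import re

# Compilable version of the sample; note the space after "]:" on the blank line.
s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# The original attempt: greedy \s+ may swallow " \n" on the blank line,
# and the mandatory \n at the end rules out the last line.
comments = re.findall(r']:\s+(.*?)\n', s)
for c in comments:
    print(repr(c))
```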

【Question comments】:

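You have to exclude matching ], like ]:\s+([^]\n]*)$
Can you provide the code you use to process the text? The string literal you provided does not compile.
I noticed you haven't accepted an answer to any of your questions. Could you review them, and if a posted answer worked, see What should I do when someone answers my question?
@Thefourthbird I did... and will certainly do it for the others too.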

Tags: python-3.x regex string

【Solution 1】:

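You can exclude matching ] in the capture group instead. If you also want to match the value on the last line, assert the end of the line with $ (using re.MULTILINE) instead of matching a mandatory newline \n.

Note that \s can match a newline, and so can the negated character class [^]]*.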
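To see that note in action, here is a small check on a hypothetical two-line string `text`. Both `\s` and `[^]]` match the newline, while adding `\n` to the negated class stops the match at the end of the line.

```python
import re

text = "a \nb"  # hypothetical two-line string

# \s is not limited to spaces and tabs: the newline is whitespace too.
ws = re.findall(r"\s", text)                        # -> [' ', '\n']

# A negated class like [^]] accepts every character except ], newline included.
crosses = re.fullmatch(r"[^]]*", text) is not None  # -> True

# Putting \n into the negated class stops the match at the end of the line.
same_line = re.match(r"[^]\n]*", text).group()      # -> 'a '
```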

]:\s+([^]]*)$

import re

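# ]: and whitespace, then group 1: a run of characters other than ] that ends at a line end ($ with re.MULTILINE)
regex = r"]:\s+([^]]*)$"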

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, s, re.MULTILINE))

Output

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']

If you don't want to match across line breaks:

]:[^\S\n]+([^]\n]*)$
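The no-crossing variant can be run the same way; this sketch repeats the sample string so it is self-contained. `[^\S\n]` means "whitespace other than a newline", so neither the leading whitespace nor the captured text can run onto the next line. As written, it still needs at least one space or tab after `]:`, which the blank line in the sample has.

```python
import re

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# [^\S\n]+ : one or more whitespace characters, newline excluded
# [^]\n]*  : anything except ] or newline, so the capture stays on its own line
regex = r"]:[^\S\n]+([^]\n]*)$"
comments = re.findall(regex, s, re.MULTILINE)
print(comments)
```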

【Comments】:

【Solution 2】:

You can capture everything after the colon in capture group 1; re.findall returns these captures as a list.

re.findall(r'(?m):[ \t]+(.*?)[ \t]*$',s)

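Then loop over the list and assign a single space to every empty element. The pattern needs at least one space or tab after the colon, so the blank line must keep its trailing space, as it does in the question's string.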
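The replacement step does not strictly need a second regex. Below is a sketch that swaps the answer's `re.sub('^$', ' ', w)` for a plain conditional expression. It builds the sample with explicit `\n` escapes so that the space after the blank line's colon cannot get lost.

```python
import re

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# Group 1: text after ": " up to the line end, trailing spaces/tabs trimmed by [ \t]*$.
raw = re.findall(r'(?m):[ \t]+(.*?)[ \t]*$', s)
# Empty captures become a single space; everything else is kept unchanged.
talk = [w if w else ' ' for w in raw]
print(talk)
```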

>>> import re
>>>
>>> s= """
... (2021-06-29T10:53:42.647Z) [Denis]: hi
... (2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
... (2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane
... (2021-06-29T11:58:29.053Z) [Nicholas]:
... (2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
... (2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
... (2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##
... """
>>>
>>> talk = [re.sub('^$', ' ', w) for w in re.findall(r'(?m):[ \t]+(.*?)[ \t]*$',s)]
>>> print(talk)
['hi', 'TA FOR SHOWING', 'how are you bane', ' ', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']

【Comments】:

【Solution 3】:

Is this what you want?

comments = re.findall(r']:\s(.*?)\n',s)

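If the whitespace after : is always exactly one space, then \s+ should be \s. \s+ means one or more whitespace characters, and on the blank line that includes the newline itself.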
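A quick check of this pattern, assuming the sample is built with `\n` line endings and no trailing newline. The blank line now yields `''`, but the pattern still requires a `\n` after the text, so the last message is not returned. One simple workaround, shown here as an assumption rather than part of the answer, is to append a newline to the input before searching.

```python
import re

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# Exactly one whitespace character after "]:", then a lazy capture up to the next newline.
comments = re.findall(r']:\s(.*?)\n', s)
print(comments)  # the final message has no "\n" after it, so it is missing

# With a trailing newline added, the last message matches as well.
comments_all = re.findall(r']:\s(.*?)\n', s + "\n")
print(comments_all)
```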

【Comments】:

【Solution 4】:

Using the samples you've shown, please try the following regex.

^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$

Explanation: a detailed breakdown of the regex above.

^\(\d{4}-\d{2}-\d{2}  ##Match from the start of the line: ( followed by 4 digits, -, 2 digits, -, 2 digits.
T(?:\d{2}:){2}        ##Match T followed by a non-capturing group (2 digits and a colon) repeated 2 times.
\d{2}\.\d{3}Z\)\s+    ##Match 2 digits, a dot, 3 digits, Z and ), followed by whitespace.
\[[^]]*\]:\s+         ##Match a literal [, everything up to the first ], then ]: and whitespace.
([^)]*)$              ##1st capturing group: everything up to the end of the line, as long as it contains no ).
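The same breakdown can also live inside the pattern itself. Below is a sketch using Python's `re.VERBOSE` flag, under which unescaped whitespace in the pattern is ignored and `#` starts a comment. The pattern pieces are unchanged from the answer; only the layout differs.

```python
import re

# Solution 4's pattern, split into commented pieces with re.VERBOSE.
pattern = re.compile(r"""
    ^\(\d{4}-\d{2}-\d{2}     # ( then the date: 4 digits, -, 2 digits, -, 2 digits
    T(?:\d{2}:){2}           # T, then "2 digits + colon" twice (hours and minutes)
    \d{2}\.\d{3}Z\)\s+       # seconds, dot, milliseconds, Z, ), then whitespace
    \[[^]]*\]:\s+            # [name]: followed by whitespace
    ([^)]*)$                 # group 1: rest of the line, provided it contains no )
""", re.VERBOSE | re.MULTILINE)

varVal = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
          "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
          "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
          "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
          "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
          "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
          "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(pattern.findall(varVal))
```

Because the pattern text contains no literal spaces or `#`, switching on `re.VERBOSE` changes only readability, not what is matched.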

Using Python 3.x:

import re
regex = r"^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$"
varVal = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, varVal, re.MULTILINE))

Output for the OP's sample:

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']

【Comments】: