【Question title】: Regex for blank string
【Posted】: 2021-12-22 02:41:53
【Question】:

I have a string:

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

I want to extract the message text from it. The expected output is:

comments=['hi','TA FOR SHOWING','how are you bane',' ','#END_REMOTE#','VAL 01JUL2021','##ENDED AT 08:07 GMT##'] 

What I tried:

comments=re.findall(r']:\s+(.*?)\n',s) 

The regex works fine, but I can't get the blank text to come out as ''.
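To see what goes wrong, here is a small sketch that runs the attempt above on the sample, written as a Python literal so the trailing spaces are visible (assuming, as in the sample, that the blank message line ends with a single space):

```python
import re

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
     "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# The attempt from the question.
comments = re.findall(r']:\s+(.*?)\n', s)
print(comments)
```

On the blank line, \s+ consumes both the space and the newline, so (.*?) captures the entire next line. The last line is never matched because it has no trailing \n. The printed list is ['hi', 'TA FOR SHOWING', 'how are you bane ', '(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#', 'VAL 01JUL2021'].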

【Comments on the question】:

Tags: python-3.x regex string


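【Solution 1】:

Instead of .*?, you can match any character except ] in the capture group. If you also want to match the value on the last line, assert the end of the line with $ (together with re.MULTILINE) instead of matching a mandatory newline \n.

Note that \s can match a newline, and so can the negated character class [^]]*.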

]:\s+([^]]*)$


import re

regex = r"]:\s+([^]]*)$"

s = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
    "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
    "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
    "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

print(re.findall(regex, s, re.MULTILINE))

Output:

['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##'] 
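The two points behind this answer (that \s and a negated class like [^]] can both match a newline, and that $ needs re.MULTILINE to match at every line end) can be checked directly. A minimal sketch; the two-line text is a made-up example:

```python
import re

# \s includes "\n", so \s+ may run on past the end of a line.
ws_matches_newline = re.fullmatch(r"\s", "\n") is not None
# A negated class such as [^]] accepts "\n" as well.
neg_class_matches_newline = re.fullmatch(r"[^]]", "\n") is not None
# [^\S\n] means "any whitespace except a newline".
hspace_matches_newline = re.fullmatch(r"[^\S\n]", "\n") is not None

# Without re.MULTILINE, $ matches only at the end of the string (or just
# before a final "\n"); with it, $ also matches right before every "\n".
text = "a: x\nb: y"  # made-up two-line example
without_flag = re.findall(r": (\w+)$", text)
with_flag = re.findall(r": (\w+)$", text, re.MULTILINE)

print(ws_matches_newline, neg_class_matches_newline, hspace_matches_newline)
# True True False
print(without_flag, with_flag)
# ['y'] ['x', 'y']
```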

If you don't want the match to cross line boundaries:

]:[^\S\n]+([^]\n]*)$
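If you also want the trailing whitespace dropped, as in the question's expected 'how are you bane', one option is to post-process the matches with str.rstrip(). A minimal sketch on a trimmed copy of the sample:

```python
import re

# Trimmed copy of the question's sample: a message with a trailing space,
# the blank message, and the last line (no trailing newline).
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# [^\S\n]+ matches whitespace other than "\n"; [^]\n]* stops at the line end.
pattern = r"]:[^\S\n]+([^]\n]*)$"
comments = [m.rstrip() for m in re.findall(pattern, s, re.MULTILINE)]
print(comments)  # ['how are you bane', '', '##ENDED AT 08:07 GMT##']
```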


【Comments】:

    【Solution 2】:

    You can capture everything after the colon in capture group 1; re.findall returns those captures as a list.

    re.findall(r'(?m):[ \t]+(.*?)[ \t]*$',s) 
    

    Then loop over the list and replace every empty element with a single space.
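The re.sub('^$', ' ', w) call only turns an empty string into a space, so a plain conditional expression does the same job without a second regex. A minimal sketch on a trimmed copy of the sample:

```python
import re

# Trimmed copy of the question's sample (note the trailing spaces).
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

matches = re.findall(r'(?m):[ \t]+(.*?)[ \t]*$', s)
# '' is falsy, so `w or ' '` replaces only the empty matches with a space.
talk = [w or ' ' for w in matches]
print(talk)  # ['how are you bane', ' ', '##ENDED AT 08:07 GMT##']
```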

    >>> import re
    >>>
    >>> s= """
    ... (2021-06-29T10:53:42.647Z) [Denis]: hi
    ... (2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING
    ... (2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane
    ... (2021-06-29T11:58:29.053Z) [Nicholas]:
    ... (2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#
    ... (2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021
    ... (2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##
    ... """
    >>>
    >>> talk = [re.sub('^$', ' ', w) for w in re.findall(r'(?m):[ \t]+(.*?)[ \t]*$',s)]
    >>> print(talk)
    ['hi', 'TA FOR SHOWING', 'how are you bane', ' ', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
    

    【Comments】:

      【Solution 3】:

      Is this what you want?

      comments = re.findall(r']:\s(.*?)\n',s)
      

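      If the whitespace after the colon is always a single space, then \s+ should be \s. \s+ means one or more whitespace characters, and on the blank line it also consumes the newline, so (.*?) ends up capturing the next line.

      【Comments】:

        【Solution 4】: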
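A quick comparison on a trimmed copy of the sample, assuming (as this answer does) exactly one space after the colon. Because . never matches a newline, a greedy (.*) already stops at the end of the line, so the trailing \n can be dropped, which also picks up the last line:

```python
import re

# Trimmed copy of the question's sample (note the trailing spaces).
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

# The answer's pattern: needs a "\n" after each message, so the last line is missed.
with_newline = re.findall(r']:\s(.*?)\n', s)
# Variant without the "\n": (.*) stops at the end of the line on its own.
without_newline = re.findall(r']:\s(.*)', s)
print(with_newline)     # ['how are you bane ', '']
print(without_newline)  # ['how are you bane ', '', '##ENDED AT 08:07 GMT##']
```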

        Using the samples you've shown, please try the following regex.

        ^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$
        


        Explanation: a detailed breakdown of the regex above.

        ^\(\d{4}-\d{2}-\d{2}  ##Matching from the start of the line: ( followed by 4 digits-2 digits-2 digits (the date).
        T(?:\d{2}:){2}        ##Matching T followed by a non-capturing group that matches 2 digits and a colon, 2 times.
        \d{2}\.\d{3}Z\)\s+    ##Matching 2 digits, a dot, 3 digits, Z and ), followed by space(s).
        \[[^]]*\]:\s+         ##Matching a literal [, everything up to the first ], then ]: and space(s).
        ([^)]*)$              ##Creating the 1st capturing group: everything up to the next ), which must reach the end of the line.
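The same breakdown can live inside the pattern itself with re.VERBOSE, which ignores unescaped whitespace and treats # as the start of a comment outside character classes. A sketch of the identical regex written that way, run on a trimmed copy of the sample:

```python
import re

pattern = re.compile(r"""
    ^\(\d{4}-\d{2}-\d{2}   # "(" and the date, e.g. (2021-06-29
    T(?:\d{2}:){2}\d{2}    # "T" and the time HH:MM:SS
    \.\d{3}Z\)\s+          # milliseconds, "Z", ")" and whitespace
    \[[^]]*\]:\s+          # "[name]:" and whitespace
    ([^)]*)$               # group 1: the message, up to the end of the line
""", re.MULTILINE | re.VERBOSE)

# Trimmed copy of the question's sample (note the trailing spaces).
s = ("(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
     "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
     "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")

messages = pattern.findall(s)
print(messages)  # ['how are you bane ', '', '##ENDED AT 08:07 GMT##']
```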
        

        Using Python 3.x:

        import re
        regex = r"^\(\d{4}-\d{2}-\d{2}T(?:\d{2}:){2}\d{2}\.\d{3}Z\)\s+\[[^]]*\]:\s+([^)]*)$"
        varVal = ("(2021-06-29T10:53:42.647Z) [Denis]: hi\n"
            "(2021-06-29T10:54:53.693Z) [Nicholas]: TA FOR SHOWING\n"
            "(2021-06-29T11:58:29.053Z) [Nicholas]: how are you bane \n"
            "(2021-06-29T11:58:29.053Z) [Nicholas]: \n"
            "(2021-06-29T11:58:29.053Z) [Nicholas]: #END_REMOTE#\n"
            "(2021-06-30T08:07:42.029Z) [Denis]: VAL 01JUL2021\n"
            "(2021-06-30T08:07:42.029Z) [Denis]: ##ENDED AT 08:07 GMT##")
        
        print(re.findall(regex, varVal, re.MULTILINE))
        

        Output for the samples shown by the OP:

        ['hi', 'TA FOR SHOWING', 'how are you bane ', '', '#END_REMOTE#', 'VAL 01JUL2021', '##ENDED AT 08:07 GMT##']
        

        【Comments】:
