【Question title】: Split string without losing delimiter (and its count)
【Posted】: 2021-08-22 09:13:19
【Question】:

I am trying to split a string like the following on whitespace:

string = "This            is a                      test."

# desired output
# ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']

# actual output, which does make sense
result = string.split()
# ['This', 'is', 'a', 'test.']

There is also re.split, which keeps the delimiters, but not in the way I want:

import re
string = "This            is a                      test."

result = re.split(r"( )", string)
# ['This',
# ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ',
# 'is', ' ',
# 'a', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ', '', ' ',
# 'test.']
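
A shorter input makes the cause of the empty strings easier to see. This is only an illustrative sketch; the `sample` string and variable names are made up for the example. With the pattern `( )`, each single space counts as its own delimiter, so re.split also reports the empty string that sits between two adjacent spaces:

```python
import re

# Two spaces between the words: each space is a separate delimiter,
# so the (empty) text between the two spaces shows up in the result.
sample = "a  b"
single_space_split = re.split(r"( )", sample)
print(single_space_split)  # ['a', ' ', '', ' ', 'b']
```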

I can get the result I want by doing this:

string = "This            is a                      test."
result = []

spaces = ''
word = ''
for letter in string:
    if letter == ' ':
        spaces += ' '
        if word:
            result.append(word)
            word = ''
    else:
        word += letter
        if spaces:
            result.append(spaces)
            spaces = ''
if spaces:
    result.append(spaces)
if word:
    result.append(word)

print(result)
# ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']

But this doesn't feel like the best approach. Is there a more Pythonic way to achieve this?

【Question comments】:

    Tags: python string split


    【Answer 1】:

    Try re.split with the capturing pattern r'(\s+)':

    >>> import re
    >>> string = "This            is a                      test."
    >>> re.split(r'(\s+)', string)
    ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
    >>> 
    

    Regex101 example.
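
As a rough illustration of why the parentheses matter (the `text` value and variable names here are assumptions made for the example): when the pattern contains a capturing group, re.split returns the text matched by that group along with the pieces between matches, so the runs of whitespace survive and the original string can be rebuilt.

```python
import re

text = "This   is a  test."

# No capturing group: the whitespace runs are discarded.
without_group = re.split(r"\s+", text)
# Capturing group: each whitespace run is returned as its own element.
with_group = re.split(r"(\s+)", text)

print(without_group)  # ['This', 'is', 'a', 'test.']
print(with_group)     # ['This', '   ', 'is', ' ', 'a', '  ', 'test.']

# Nothing is lost, so joining the pieces gives back the input.
rebuilt = "".join(with_group)
print(rebuilt == text)  # True
```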

    【Comments】:

    • Make it r'(\s+)' to avoid a DeprecationWarning: invalid escape sequence \s
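
A small sketch of the point about raw strings (variable names are made up for the example): the raw literal and the literal with a doubled backslash are the same string, so either form gives the regex engine a real `\s` without relying on an invalid escape sequence in the source code.

```python
import re

raw_pattern = r"(\s+)"       # raw string: the backslash is kept literally
escaped_pattern = "(\\s+)"   # regular string: the backslash is escaped by hand

same_pattern = raw_pattern == escaped_pattern
print(same_pattern)  # True

tokens = re.split(escaped_pattern, "a  b")
print(tokens)  # ['a', '  ', 'b']
```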
    【Answer 2】:

    You can try this:

    import re
    def main_function(string):
        return re.split(r'(\s+)', string)
    
    print(main_function("This            is a                      test."))
    

    Output:

    ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
    

    Example on Regex101.com

    【Comments】:

      【Answer 3】:

      You can also avoid splitting the string altogether and use re.findall instead:

      import re
      string = "This            is a                      test."
      matches = re.findall(r'\s+|\S+', string)
      print(matches)
      

      This prints:

      ['This', '            ', 'is', ' ', 'a', '                      ', 'test.']
      

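      The regex alternation \s+|\S+ matches either a run of whitespace characters or a run of non-whitespace characters, so every character of the input ends up in exactly one match.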
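
One difference between the two approaches shows up when the string starts or ends with whitespace. The following sketch uses a made-up `padded` input: re.split with a capturing group adds an empty string at each padded end, while re.findall returns only the non-empty runs. Both results still join back to the original string.

```python
import re

padded = "  lead and trail  "

split_tokens = re.split(r"(\s+)", padded)
findall_tokens = re.findall(r"\s+|\S+", padded)

print(split_tokens)
# ['', '  ', 'lead', ' ', 'and', ' ', 'trail', '  ', '']
print(findall_tokens)
# ['  ', 'lead', ' ', 'and', ' ', 'trail', '  ']

# Both token lists are lossless.
both_lossless = "".join(split_tokens) == padded == "".join(findall_tokens)
print(both_lossless)  # True
```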

      【Comments】:
