【Question title】: How can I split a string if a separator is repeated twice?
【Posted】: 2022-01-01 15:08:23
【Question】:

I need to convert the string 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG' into a list of tuples [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]. I think I first need to split the string on the separator '+' and then on the separator '/'.

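The problem is that there are two plus signs. I need the first plus sign as a separator, but the second one has to be kept. If I split the string on '+' alone, all the plus signs are removed: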

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
x = s.split('+')
print(x)
#['apple/SP', '', '/SW', 'orange/NNG', '', '/FG', 'melon/SL', 'food/JKG']

If I split on the separator '++' instead:

s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
splitted_s = s.split('++')
print(splitted_s)
#['apple/SP', '/SW+orange/NNG', '/FG+melon/SL+food/JKG']

I don't know how to get [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')].

【Comments on the question】:

  • I think you meant s.split('++') in the second code example.

Tags: python list split tuples


【Solution 1】:

You can use a regular expression:

  • \+(?=\+) - a plus sign followed by another plus sign (positive lookahead)
  • | - or
  • \+(?!/) - a plus sign not followed by a forward slash (negative lookahead)
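The lookaheads consume no characters. They only decide which '+' signs count as separators. The sketch below lists the index of every '+' in the string and the indices the pattern actually matches. It also checks that the single negative lookahead \+(?!/) picks out the same positions. That holds because a '+' followed by another '+' is, in particular, not followed by '/'. The helper name separator_positions is hypothetical.

```python
import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

def separator_positions(pattern, text):
    """Return the start index of every match of `pattern` in `text` (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]

all_plus = [i for i, ch in enumerate(s) if ch == "+"]    # every '+' in the string
full = separator_positions(r"\+(?=\+)|\+(?!/)", s)       # the answer's pattern
simple = separator_positions(r"\+(?!/)", s)              # negative lookahead only

print(all_plus)        # [8, 9, 13, 24, 25, 29, 38]
print(full)            # [8, 13, 24, 29, 38]  (the '+' right before '/' is skipped)
print(full == simple)  # True
```

In other words, the first alternative is harmless but redundant, and re.split(r"\+(?!/)", s) splits the string the same way.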

Code:

import re

pattern = r"\+(?=\+)|\+(?!/)"
string = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

print([s.split("/") for s in re.split(pattern, string)])

Output:

[['apple', 'SP'], ['+', 'SW'], ['orange', 'NNG'], ['+', 'FG'], ['melon', 'SL'], ['food', 'JKG']]
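The question asks for tuples, while the code above produces lists. The sketch below wraps each piece in tuple() and keeps the answer's pattern unchanged. The function name to_pairs is hypothetical. The sketch assumes every chunk contains exactly one '/'.

```python
import re

# The answer's pattern: split on a '+' that is not directly followed by '/'
PATTERN = r"\+(?=\+)|\+(?!/)"

def to_pairs(text):
    """Split `text` into (word, tag) tuples (hypothetical helper name)."""
    return [tuple(chunk.split("/")) for chunk in re.split(PATTERN, text)]

print(to_pairs("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"))
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```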

【Comments】:

    【Solution 2】:

    Here is one solution:

    s = 'apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG'
    x = s.replace("++", "+/*")
    x = x.split('+')
    x = [item.replace("*", "+") for item in x]
    x = [item.split('/') for item in x]
    y = []
    for item in x:
        y += item
    #remove the list items that are ''
    for i in range(y.count('')):
        y.remove('')
    # modified from https://stackoverflow.com/questions/53990075/convert-list-into-list-of-tuples-of-every-two-elements
    # pair consecutive items: zip(it, it) pulls two items from the same iterator per tuple
    it = iter(y)
    out = list(zip(it, it))
    print(out)
    

    Result:

    [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
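The key idea in this answer is to temporarily mark the '+' that must survive, so that split('+') leaves it alone. Below is a more compact sketch of the same placeholder trick. It assumes the placeholder character ('*' here) never occurs in the input. Rewriting '++' as '+*' means no empty strings are produced, so the removal step above is not needed. The function name split_with_placeholder is hypothetical.

```python
def split_with_placeholder(text, placeholder="*"):
    """Split 'word/TAG' chunks joined by '+', keeping a literal '+' as a word.

    Assumes `placeholder` never appears in `text` and that every chunk
    contains exactly one '/' (otherwise the unpacking raises ValueError).
    """
    # '++' -> '+*': the first '+' stays a separator, the second becomes a marker
    marked = text.replace("++", "+" + placeholder)
    pairs = []
    for chunk in marked.split("+"):
        # restore the literal '+' before splitting word and tag
        word, tag = chunk.replace(placeholder, "+").split("/")
        pairs.append((word, tag))
    return pairs

print(split_with_placeholder("apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"))
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```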
    

    【Comments】:

      【Solution 3】:

      This answer is similar to the one Paul suggested, but I think mine is simpler.

      import re
      
      s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"
      pattern = r"((?:\+|\w+)\/\w+)"
      res = [tuple(m.split("/")) for m in re.findall(pattern, s)]
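A possible variation on this answer: give the pattern two capturing groups instead of one. re.findall then returns one tuple per match, with one element per group, so the split("/") step is no longer needed. The constant name PAIR is only illustrative.

```python
import re

s = "apple/SP++/SW+orange/NNG++/FG+melon/SL+food/JKG"

# Group 1: a lone '+' or a word; group 2: the tag after '/'
PAIR = re.compile(r"(\+|\w+)/(\w+)")

res = PAIR.findall(s)  # with two groups, findall returns (group1, group2) tuples
print(res)
# [('apple', 'SP'), ('+', 'SW'), ('orange', 'NNG'), ('+', 'FG'), ('melon', 'SL'), ('food', 'JKG')]
```

Consider the first '+' of each '++'. Its next character is not '/', and \w+ cannot start at a '+', so neither alternative matches there. The scan therefore moves on to the second '+'.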
      

      【Comments】:
