[Question title]: Split string by first substring found
[Posted]: 2019-07-06 18:55:39
[Question]:

I want to split a sentence at the first occurrence of any of a few specific words. Let me illustrate with an example:

message = 'I wish to check my python code for errors to run the program properly with fluency'

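I want to split the above message at the first occurrence of for/to/with, so the result for this message would be: check my python code for errors to run the program properly with fluency

I also want to keep the word that was used to split the sentence, so my final result would be: to check my python code for errors to run the program properly with fluency

My code doesn't work: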

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
result = message.split(r"for|to|with",1)[1]
print(result)

What can I do?

[Question comments]:

    Tags: python regex


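    [Solution 1]:

    str.split does not take a regular expression as an argument (perhaps you are thinking of Perl); re.split does.

    The following does what you want: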

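    import re

    message = 'I wish to check my python code for errors to run the program properly with fluency'
    # \b word boundaries keep "to" from matching inside words such as "tomato"
    result = re.search(r'\b(for|to|with)\b', message)
    if result:
        # Slice from the position where the first keyword starts
        print(message[result.start(1):])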
    

    This uses no substitution, rejoining, or looping; it simply searches for the wanted string and slices from the position of the match.
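For comparison, the asker's original idea also works once str.split is swapped for re.split. The sketch below assumes a hypothetical helper name, split_from_first_keyword; it relies on two documented re.split behaviours: maxsplit=1 stops after the first match, and a capturing group in the pattern keeps the matched keyword in the returned list.

```python
import re


def split_from_first_keyword(message, keywords=("for", "to", "with")):
    """Hypothetical helper: return message from its first whole-word keyword onward, or None."""
    # Build r"\b(for|to|with)\b"; re.escape protects any regex metacharacters in the keywords
    pattern = r"\b(" + "|".join(map(re.escape, keywords)) + r")\b"
    # maxsplit=1 stops after the first match; the capturing group keeps the keyword in the result
    parts = re.split(pattern, message, maxsplit=1)
    if len(parts) == 1:
        return None  # no keyword was found
    _, keyword, rest = parts  # [text before, keyword, text after]
    return keyword + rest


message = 'I wish to check my python code for errors to run the program properly with fluency'
print(split_from_first_keyword(message))
```

Because of the \b boundaries, an input such as "a tomato for you" returns "for you" rather than splitting inside "tomato".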

    [Comments]:

      [Solution 2]:
      message = 'I wish to check my python code for errors to run the program properly with fluency'
      words = message.split(' ')
      keywords = ('to', 'for', 'with')  # the words to split on
      start = 0  # if no keyword is found, keep the whole message
      for i, word in enumerate(words):
          if word in keywords:
              start = i  # index of the first keyword
              break
      # Rejoin from the first keyword onward (no trailing space)
      message_new = ' '.join(words[start:])
      print(message_new)
      

      Output:

      to check my python code for errors to run the program properly with fluency 
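The same word-scanning idea can be written more compactly with itertools.dropwhile, which skips items while a predicate holds and then yields everything else. This is a sketch with one behavioural difference worth noting: if no keyword occurs, dropwhile drops every word and the result is an empty string, whereas the loop above falls back to the whole message. Like the loop, it only recognises keywords that appear as whole space-separated words (so "to," with a trailing comma is missed).

```python
from itertools import dropwhile

message = 'I wish to check my python code for errors to run the program properly with fluency'
keywords = {'to', 'for', 'with'}

# Skip words until the first keyword, then keep that keyword and everything after it
message_new = ' '.join(dropwhile(lambda word: word not in keywords, message.split()))
print(message_new)
```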
      

      [Comments]:

        [Solution 3]:

        This simple expression should do it:

        .*?(\b(?:to|for|with)\b.*)
        

        Here are five ways to apply it; re.match is probably the fastest of the five:

        re.findall test

        import re
        
        regex = r".*?(\b(?:to|for|with)\b.*)"
        test_str = "I wish to check my python code for errors to run the program properly with fluency"
        print(re.findall(regex, test_str))
        

        re.sub test

        import re
        
        regex = r".*?(\b(?:to|for|with)\b.*)"
        test_str = "I wish to check my python code for errors to run the program properly with fluency"
        subst = "\\1"
        
        result = re.sub(regex, subst, test_str)
        
        if result:
            print (result)
        

        re.finditer test

        import re
        
        regex = r".*?(\b(?:to|for|with)\b.*)"
        
        test_str = "I wish to check my python code for errors to run the program properly with fluency"
        
        matches = re.finditer(regex, test_str, re.MULTILINE)
        
        for matchNum, match in enumerate(matches, start=1):
        
            # FULL MATCH
            print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
        
            for groupNum in range(0, len(match.groups())):
                groupNum = groupNum + 1
        
                print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
        

        re.match test

        import re
        
        regex = r".*?(\b(?:to|for|with)\b.*)"
        test_str = "I wish to check my python code for errors to run the program properly with fluency"
        
        print(re.match(regex, test_str).group(1))
        

        re.search test

        import re
        
        regex = r".*?(\b(?:to|for|with)\b.*)"
        test_str = "I wish to check my python code for errors to run the program properly with fluency"
        
        print(re.search(regex, test_str).group(1))
        

        The expression is explained in the top-right panel of this demo. If you want to explore or modify it further, this link shows how it matches against some sample inputs.
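Since the linked demo may not be available, here is the same pattern annotated inline with re.VERBOSE, where unescaped whitespace and # comments inside the pattern are ignored. It is meant as a reading aid; the matching behaviour is the same as the compact form.

```python
import re

# Same pattern as .*?(\b(?:to|for|with)\b.*), one piece per line
pattern = re.compile(r"""
    .*?                      # lazily skip as little leading text as possible
    (                        # group 1: the part to keep
        \b(?:to|for|with)\b  # the first whole-word keyword
        .*                   # everything after it, up to the end of the line
    )
""", re.VERBOSE)

message = "I wish to check my python code for errors to run the program properly with fluency"
print(pattern.match(message).group(1))
```

The lazy .*? is what makes the first keyword win: the engine tries the shortest possible prefix before each candidate keyword, so it stops at "to" rather than a later "for" or "with".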

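        [Comments]:

        • Replacing the whole string like this is inefficient: on my machine it appears to be about three times slower than using search() to find the first match.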
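The speed claims above (re.match "probably" fastest, re.sub about three times slower than search()) are machine-dependent, so a rough timeit harness is a safer way to compare them. This is a sketch only: the function names are hypothetical, the patterns are precompiled, and no particular timings are implied.

```python
import re
import timeit

MESSAGE = "I wish to check my python code for errors to run the program properly with fluency"
REGEX = re.compile(r".*?(\b(?:to|for|with)\b.*)")
KEYWORD = re.compile(r"\b(?:to|for|with)\b")


def with_match():
    # Anchored match of the whole pattern, then take group 1
    return REGEX.match(MESSAGE).group(1)


def with_sub():
    # Replace the whole match with group 1
    return REGEX.sub(r"\1", MESSAGE)


def with_search():
    # Find only the first keyword, then slice from its start index
    m = KEYWORD.search(MESSAGE)
    return MESSAGE[m.start():]


def time_all(number=100_000):
    """Return {function name: total seconds for `number` calls}."""
    return {fn.__name__: timeit.timeit(fn, number=number)
            for fn in (with_match, with_sub, with_search)}


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name}: {seconds:.3f}s")
```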
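
        [Solution 4]:

        This has been answered at how to remove all characters before a specific character in python, but that answer only works for a single delimiter. With several delimiters you first have to find out which one occurs first, which is covered at how can i find the first occurrence of a substring in a python string.

        Start with a first guess (I'm not very imaginative, so let's call it bestDelimiter = firstDelimiter), find where it first occurs, and save that position as bestPosition. Then find the positions of the remaining delimiters; each time one occurs before the current bestPosition, update bestDelimiter and bestPosition. At the end, bestDelimiter is the delimiter that occurs first, and you split the string using bestDelimiter.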
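A minimal sketch of the bestDelimiter/bestPosition procedure described above, using str.find to locate each delimiter and str.partition to split at the winner. The helper name is hypothetical. Note that str.find matches plain substrings with no word boundaries, so a delimiter like "to" would also be found inside "tomato"; the regex answers avoid that.

```python
def split_at_first_delimiter(message, delimiters):
    """Hypothetical helper: return message from its earliest delimiter onward, or None."""
    best_delimiter = None
    best_position = -1
    for delimiter in delimiters:
        position = message.find(delimiter)  # -1 if the delimiter is absent
        if position != -1 and (best_position == -1 or position < best_position):
            best_delimiter, best_position = delimiter, position
    if best_delimiter is None:
        return None  # none of the delimiters occur in message
    # partition splits at the first occurrence, which is exactly best_position
    _, sep, rest = message.partition(best_delimiter)
    return sep + rest


message = 'I wish to check my python code for errors to run the program properly with fluency'
print(split_at_first_delimiter(message, ['for', 'to', 'with']))
```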

        [Comments]:

          [Solution 5]:

          You can first find all instances of 'for', 'to' and 'with', split on those values, then splice the pieces together and rejoin them:

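          import re
          message = 'I wish to check my python code for errors to run the program properly with fluency'
          pattern = r"\bfor\b|\bto\b|\bwith\b"
          # Keywords in order of appearance, and the pieces between them (the piece before the first keyword is discarded)
          vals, [_, *s] = re.findall(pattern, message), re.split(pattern, message)
          # Re-attach each keyword to the left-stripped piece that follows it
          result = ''.join('{} {}'.format(a, re.sub(r"^\s+", "", b)) for a, b in zip(vals, s))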
          

          Output:

          'to check my python code for errors to run the program properly with fluency'
          

          [Comments]:
