Split string by first substring found: answers

【Question title】: Split string by first substring found
【Posted】: 2019-07-06 18:55:39
【Question】:

I want to split a sentence on certain words, at the first occurrence of any of those words. Let me give an example:

message = 'I wish to check my python code for errors to run the program properly with fluency'

I want to split the message above at the first occurrence of for/to/with, so for this message the result would be: check my python code for errors to run the program properly with fluency

I also want to keep the word that the sentence was split on, so my final result would be: to check my python code for errors to run the program properly with fluency

My code doesn't work:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
result = message.split(r"for|to|with",1)[1]
print(result)

What can I do?

【Question discussion】:

Tags: python regex

【Solution 1】:

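str.split does not accept a regular expression as its separator (perhaps you are thinking of Perl). It looks for the literal text "for|to|with", finds nothing, returns a one-element list, and so [1] raises an IndexError.

The following does what you want: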

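import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
# \b word boundaries stop "to" from matching inside a word such as "tomorrow"
result = re.search(r'\b(for|to|with)\b', message)
if result:  # re.search returns None when none of the words occurs
    print(message[result.start(1):])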

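This uses no substitution, rejoining or loops; it simply searches for the first matching word and slices the string from the position where it was found.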
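For comparison, the asker's one-liner can be kept close to its original shape by switching from str.split to re.split, which does accept a pattern. This is a minimal sketch, assuming whole-word delimiters are wanted: maxsplit=1 stops after the first delimiter, and because the pattern contains a capturing group, re.split keeps the matched delimiter in its output list.

```python
import re

message = 'I wish to check my python code for errors to run the program properly with fluency'

# re.split (unlike str.split) takes a regex; maxsplit=1 stops after the first
# delimiter, and the capturing group keeps that delimiter in the result list
parts = re.split(r'\b(for|to|with)\b', message, maxsplit=1)
# parts == ['I wish ', 'to', ' check my python code for errors to run the program properly with fluency']

# drop the text before the delimiter and glue the delimiter back on
result = ''.join(parts[1:])
print(result)
```

If none of the words occurs, parts is just [message] and result is the empty string, so decide whether that is the behaviour you want.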

【Discussion】:

【Solution 2】:

message = 'I wish to check my python code for errors to run the program properly with fluency'
array = message.split(' ')
number = 0
# index of the first word that is one of the delimiters (stays 0 if none is found)
for i in range(len(array)):
    if array[i] in ('to', 'for', 'with'):
        number = i
        break
# rebuild the sentence from the delimiter onwards, without a trailing space
message_new = ' '.join(array[number:])
print(message_new)

Output:

to check my python code for errors to run the program properly with fluency
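The same word-based idea can be written a little more compactly with enumerate and next(). This is a sketch rather than a drop-in replacement: the delimiters set is an illustrative name, it assumes the delimiters are separated from their neighbours by whitespace (a word with attached punctuation such as "to," would not match), and it returns an empty string rather than the whole message when no delimiter is present.

```python
message = 'I wish to check my python code for errors to run the program properly with fluency'
delimiters = {'for', 'to', 'with'}

words = message.split()
# index of the first word that is a delimiter, or None if there is none
first = next((i for i, word in enumerate(words) if word in delimiters), None)

# keep the delimiter and everything after it
result = ' '.join(words[first:]) if first is not None else ''
print(result)
```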

【Discussion】:

【Solution 3】:

My guess is that this simple expression should do it:

.*?(\b(?:to|for|with)\b.*)

and re.match may well be the fastest of these five approaches:

Test with `re.findall`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
print(re.findall(regex, test_str))

Test with `re.sub`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"
subst = r"\1"

result = re.sub(regex, subst, test_str)

if result:
    print(result)

Test with `re.finditer`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"

test_str = "I wish to check my python code for errors to run the program properly with fluency"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    # full match
    print(f"Match {matchNum} was found at {match.start()}-{match.end()}: {match.group()}")

    # capturing groups are numbered from 1
    for groupNum in range(1, len(match.groups()) + 1):
        print(f"Group {groupNum} found at {match.start(groupNum)}-{match.end(groupNum)}: {match.group(groupNum)}")

Test with `re.match`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.match(regex, test_str).group(1))

Test with `re.search`

import re

regex = r".*?(\b(?:to|for|with)\b.*)"
test_str = "I wish to check my python code for errors to run the program properly with fluency"

print(re.search(regex, test_str).group(1))

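The expression is explained in the top-right panel of this demo. If you want to explore or modify it further, you can also see in this link how it matches against some sample inputs.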
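The lazy quantifier .*? is what makes the expression stop at the first delimiter. A small sketch contrasting it with the greedy .*, using the same test string (the variable names lazy and greedy are only for illustration):

```python
import re

test_str = "I wish to check my python code for errors to run the program properly with fluency"

# lazy .*? consumes as little as possible, so the group starts at the FIRST delimiter
lazy = re.match(r".*?(\b(?:to|for|with)\b.*)", test_str).group(1)
# greedy .* first consumes everything, then backtracks from the end,
# so the group starts at the LAST delimiter
greedy = re.match(r".*(\b(?:to|for|with)\b.*)", test_str).group(1)

print(lazy)    # to check my python code for errors to run the program properly with fluency
print(greedy)  # with fluency
```

Dropping the ? is therefore enough to turn "split at the first delimiter" into "split at the last delimiter".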

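【Discussion】:

Replacing the whole string like this is inefficient: on my computer it appears to be about three times slower than using search() to find the first match.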
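If you want to check that timing on your own machine, here is a rough timeit sketch comparing the two approaches. The function names with_sub and with_search are illustrative, absolute numbers will vary with machine and Python version, and the two functions differ when no delimiter occurs (the sub version returns the string unchanged, the search version returns an empty string).

```python
import re
import timeit

test_str = "I wish to check my python code for errors to run the program properly with fluency"
sub_regex = re.compile(r".*?(\b(?:to|for|with)\b.*)")
search_regex = re.compile(r"\b(?:to|for|with)\b")

def with_sub():
    # rewrites the whole match, keeping only group 1
    return sub_regex.sub(r"\1", test_str)

def with_search():
    # finds the first delimiter and slices from its start position
    m = search_regex.search(test_str)
    return test_str[m.start():] if m else ''

# both approaches should agree on this input before comparing their speed
assert with_sub() == with_search()

for func in (with_sub, with_search):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```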

【Solution 4】:

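This has already been answered here: how to remove all characters before a specific character in python. That answer only works for one specific delimiter, though. With several delimiters you first have to work out which one occurs first, which is covered here: how can i find the first occurrence of a substring in a python string.

Start with a first guess (I don't have much imagination, so let's call it bestDelimiter = firstDelimiter), find where it first occurs, and save that position as bestPosition. Then find the positions of the remaining delimiters; each time you find one that occurs before the current bestPosition, update bestDelimiter and bestPosition. At the end, bestDelimiter is the delimiter that occurs first; then use bestDelimiter.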
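A minimal sketch of the procedure described above, with best_delimiter and best_position standing in for the answer's bestDelimiter and bestPosition. It assumes plain substring search with str.find, so, unlike the regex answers, it would also match "to" inside a word such as "tomorrow". The final slice keeps the delimiter, as the question requires.

```python
message = 'I wish to check my python code for errors to run the program properly with fluency'
delimiters = ['for', 'to', 'with']

# start with the first delimiter as the current best guess
best_delimiter = delimiters[0]
best_position = message.find(best_delimiter)

# check the remaining delimiters and keep whichever occurs earliest
for delimiter in delimiters[1:]:
    position = message.find(delimiter)
    # str.find returns -1 when the delimiter is absent, so never prefer -1
    if position != -1 and (best_position == -1 or position < best_position):
        best_delimiter = delimiter
        best_position = position

# keep the earliest delimiter and everything after it ('' if none was found)
result = message[best_position:] if best_position != -1 else ''
print(result)
```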

【Discussion】:

【Solution 5】:

You can first find all occurrences of for, to and with, split on those same words, then pair the pieces up and rejoin them:

import re
message = 'I wish to check my python code for errors to run the program properly with fluency'
vals, [_, *s] = re.findall(r"\bfor\b|\bto\b|\bwith\b", message), re.split(r"\bfor\b|\bto\b|\bwith\b", message)
result = ''.join('{} {}'.format(a, re.sub(r"^\s+", "", b)) for a, b in zip(vals, s))

Output:

'to check my python code for errors to run the program properly with fluency'
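To see how the one-liner works, here is the same computation unpacked into steps with the intermediate lists shown. It swaps re.sub("^\s+", "", b) for the equivalent b.lstrip(); pattern and pieces are names introduced only for illustration.

```python
import re

message = 'I wish to check my python code for errors to run the program properly with fluency'
pattern = r"\bfor\b|\bto\b|\bwith\b"

vals = re.findall(pattern, message)   # the delimiters, in order
pieces = re.split(pattern, message)   # the text between the delimiters
_, *s = pieces                        # discard the text before the first delimiter

print(vals)    # ['to', 'for', 'to', 'with']
print(pieces)  # ['I wish ', ' check my python code ', ' errors ', ' run the program properly ', ' fluency']

# pair each delimiter with the text that follows it and rejoin
result = ''.join('{} {}'.format(a, b.lstrip()) for a, b in zip(vals, s))
print(result)
```

Because every delimiter is re-inserted with exactly one following space, this approach also normalises any extra whitespace after each delimiter.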

【Discussion】:
