Replacing repetitive words, case-insensitively [duplicate]

【Question title】: Replacing repetitive words, case-insensitively [duplicate]
【Posted】: 2016-10-29 12:19:57
【Question】:

Example:

>>> import re
>>> line = 'the the, To to'
>>> re.findall(r'\b(\w+) \1', line)
['the']
>>> re.findall(r'\b(\w+) \1', line, re.I)
['the', 'To']

>>> re.sub(r'\b(\w+) \1', r'\1', line, re.I)
'the, To to'

Expected:

'the, To'

The same regex works elsewhere, for example:

Vim: s/\v<(\w+) \1/\1/gi
Perl: s/\b(\w+) \1/$1/gi
sed: -r 's/\b(\w+) \1/\1/gi'

Is this known behavior? What is the workaround? My Python version is 3.4.3, if that matters.

【Comments】:

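This isn't really a Perl question, is it? But the core difference is that your pattern doesn't seem to have the gi modifiers.
@Sobrique The OP does have re.I, the ignore-case flag, and re.sub is "global" by default; however, they are not passing the flag correctly.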

Tags: python regex python-3.x

【Solution 1】:

Read the definition of re.sub:

re.sub(pattern, repl, string, count=0, flags=0)

You are passing re.I as count (re.I has the integer value 2, so at most 2 replacements are allowed), not as flags. Instead, try:

>>> re.sub(r'\b(\w+) \1', r'\1', line, flags=re.I)
                                     # ^ note: flags is passed by keyword
'the, To'
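
To make the difference concrete, here is a minimal sketch comparing the two calls on the string from the question. The variable names (`pattern`, `buggy`, `fixed`, and so on) are illustrative only, and `count=int(re.I)` is used to reproduce what the positional call in the question effectively does. The compiled-pattern and inline-flag variants are alternatives that are not part of the original answer.

```python
import re

line = 'the the, To to'
pattern = r'\b(\w+) \1'

# re.I is an integer-valued flag, so in the fourth positional slot
# it is interpreted as count=2 and no flag is set at all.
buggy = re.sub(pattern, r'\1', line, count=int(re.I))

# Passing the flag by keyword enables case-insensitive matching.
fixed = re.sub(pattern, r'\1', line, flags=re.I)

# Alternative: attach the flag when compiling the pattern.
compiled = re.compile(pattern, re.I)
fixed_compiled = compiled.sub(r'\1', line)

# Alternative: an inline flag at the start of the pattern.
fixed_inline = re.sub(r'(?i)' + pattern, r'\1', line)

print(buggy)           # the, To to
print(fixed)           # the, To
print(fixed_compiled)  # the, To
print(fixed_inline)    # the, To
```

Compiling the pattern or using the inline `(?i)` flag sidesteps the argument-order problem entirely, since neither form takes a separate count argument before the flags.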

【Discussion】:

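@jonrsharpe Is the word boundary really necessary?
@ShekharKhairnar I didn't add it; I just copied the OP's regex from the question. It doesn't matter in this case, but I assume they are dealing with other cases where it may be relevant.
Yes, I need it to avoid replacing in cases like dot the: without \b, the regex would match t t and collapse it.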
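
The effect of the word boundary described in the last comment can be checked with a short sketch. The input `'dot the'` comes from the comment; the variable names are illustrative only.

```python
import re

text = 'dot the'

# Without \b the group may start mid-word: it captures the final 't'
# of 'dot', then matches the space and the leading 't' of 'the'.
without_boundary = re.sub(r'(\w+) \1', r'\1', text, flags=re.I)

# With \b the group must start at the beginning of a word,
# so nothing matches and the text is left unchanged.
with_boundary = re.sub(r'\b(\w+) \1', r'\1', text, flags=re.I)

print(without_boundary)  # dothe
print(with_boundary)     # dot the
```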