How to use regex to delete a specific section of text after a regex match (answers)

【Question title】: How to use regex to delete a specific section of text after a regex match
【Posted】: 2023-03-23 21:46:01
【Question description】:

(No, "Python regex, how to delete all matches from a string" does not solve my problem.)

Suppose I have this list:

names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]

And suppose I have this text:

text = "What is your name?  Well,  uh it's John Smith.  Thanks for asking. Anyway, I'd doing well."

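How can I use regex to find each element of the names list in the text and replace the block of text that follows the element (say, 50 characters long) with "[name]"? My output would then be: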

text = "What is your name [name] Anyway, I'd doing well."

So far I have the code below, but it only replaces the element itself with "[name]", not the text that follows the element.

import re

def my_replace3(match):
    match = match.group()
    return " [name] "

def no_name(text):
    names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]
    regex = re.compile(r'\b(' + '|'.join(names) + r')\b', re.IGNORECASE)
    text = re.sub(regex, my_replace3, text)
    return text

I'm not much of a regex expert, so any help is greatly appreciated.

【Question comments】:

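Oh... oops, why the downvotes? Did I do something wrong? I tried looking for a solution on Stack Overflow but couldn't find one.
Possible duplicate of Python regex, how to delete all matches from a string
Use regex = re.compile(r'\b(?:' + '|'.join(names) + r')\b.{0,50}', re.I | re.S), then text = re.sub(regex, '[name] ', text); see ideone.com/hyQxfM
@SudharsanaRajasekaran He doesn't want to delete the matches from the string; he wants to delete the text after the matches.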
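
For readers who want to try the one-line suggestion from the comment above, here is a minimal sketch of it. The shortened names list is only for illustration. Note that this pattern puts the phrase itself inside the match, so the phrase is replaced along with the text after it; that differs slightly from the output requested in the question.

```python
import re

# Shortened version of the question's list, for illustration only.
names = ['your name', 'the name', "father's name"]

# A listed phrase followed by up to 50 characters of any kind.
# re.S (DOTALL) lets "." also match newlines; re.I ignores case.
regex = re.compile(r'\b(?:' + '|'.join(names) + r')\b.{0,50}', re.I | re.S)

text = "What is your name?  Well,  uh it's John Smith.  Thanks for asking. Anyway, I'd doing well."
result = regex.sub('[name] ', text)
print(result)  # What is [name] Anyway, I'd doing well.
```

Wrapping the alternation in a capturing group and writing the replacement as r'\1 [name] ' would keep the phrase in place, which is the idea the answer below builds on.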

Tags: python regex python-3.x

【Solution 1】:

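If you want to replace the 50 characters after the match, add .{50} to the regex.

Then use a backreference in the replacement string to copy the matched phrase into the replacement.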

import re

def no_name(text):
    names = ['your name', 'the name', 'his name', 'her name', 'their name', 'employer name', "employer's name", "father's name",
        "mother's name", "maiden name", "son's name", "daughter's name", "brother's name", "sister's name"]
    regex = re.compile(r'\b(' + '|'.join(map(re.escape, names)) + r')\b.{50}', re.IGNORECASE)
    text = re.sub(regex, r'\1 [name]', text)
    return text

You should also use re.escape() when inserting strings that are meant to match literally into a regex, in case any of them contains regex operators.
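
To illustrate why escaping matters, here is a small sketch using a made-up entry, "user.name", which is not in the original list. Without re.escape() the dot acts as a wildcard; with it, only a literal dot matches.

```python
import re

phrase = "user.name"  # hypothetical entry containing a regex operator (.)

# Same pattern shape as in the answer, with and without escaping.
unescaped = re.compile(r'\b(' + phrase + r')\b', re.IGNORECASE)
escaped = re.compile(r'\b(' + re.escape(phrase) + r')\b', re.IGNORECASE)

sample = "Set userXname first."
print(bool(unescaped.search(sample)))  # True: "." matched the "X"
print(bool(escaped.search(sample)))    # False: only a literal "." is accepted
```

None of the names in the question contain such characters, so the escaping is a precaution there rather than a fix.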

【Comments】:

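re.escape is not that important in the context of the \b word boundaries and the OP's names. If a search term has special characters at its start or end, using re.escape is not enough.
{0,50} means at most 50, while {50} means exactly 50. If fewer than 50 characters may follow the match, {0,50} should be used.
@WiktorStribiżew Good point, but in this case it's probably only special characters in the middle that would be a problem.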
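
The points raised in these comments can be combined into one sketch. Two things here are assumptions and not part of the thread: the width parameter is hypothetical, and the lookarounds (?<!\w) and (?!\w) are one common substitute for \b when a phrase may start or end with punctuation. The made-up entry "name (legal)" exists only to show that case.

```python
import re

def no_name(text, names, width=50):
    # width is a hypothetical parameter; the thread hard-codes 50.
    alternation = '|'.join(map(re.escape, names))
    # Lookarounds replace \b (an assumption, not from the thread) so that
    # phrases beginning or ending with punctuation can still match.
    pattern = r'(?<!\w)(' + alternation + r')(?!\w).{0,' + str(width) + r'}'
    regex = re.compile(pattern, re.IGNORECASE | re.DOTALL)
    # \1 copies the matched phrase back; only the text after it is replaced.
    return regex.sub(r'\1 [name] ', text)

short = "Please state your name: Al."

# With exactly-50, a short tail means no match, so the text is unchanged.
exact = re.compile(r'\b(your name)\b.{50}', re.IGNORECASE)
print(exact.sub(r'\1 [name]', short) == short)  # True

# With {0,50}, the short tail after the phrase is still replaced.
print(repr(no_name(short, ['your name'])))

# \b fails after ")" when a space follows; the lookaround version matches.
tricky = "Enter name (legal) here please."
word_boundary = re.compile(r'\b(' + re.escape('name (legal)') + r')\b')
print(word_boundary.search(tricky) is None)  # True
print(repr(no_name(tricky, ['name (legal)'])))
```

Whether to keep \b or switch to lookarounds depends on the phrases involved; for the list in the question, both behave the same.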