Extract a word and the word before it, and insert "_" between them, with a regex

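【Question title】: extract word and before word and insert between "_" in regex
【Posted】: 2019-07-26 16:51:04
【Question description】:

I need some help writing a regular expression. My input is shown below.

Using a regex in Python, I need to find a word together with the word before it and insert "_" between them.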

 import re

 # Input
 s2 = 'Some other medical terms and stuff diagnosis of R45.2 was entered for  this patient. Where did Doctor Who go? Then xxx feea fdsfd'
 # My regex pattern (fails: \1 refers to a group, but the pattern has no capturing group)
 re.sub(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}diagnosis", r"\1_", s2)
 # Desired output:
 # s2 = 'Some other medical terms and stuff_diagnosis of R45.2 was entered      for this patient. Where did Doctor Who go? Then xxx feea fdsfd'

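【Comments】:

You have no capturing group, but you refer to one with \1.
Try re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2), see the regex demo.
What Wiktor means is that the group has to be a capturing one, e.g.: (?: text not captured ) ( text captured as \1 )+ (text captured as \2)?. See stackoverflow.com/questions/36524507/…
Thank you very much, it works.
Wouldn't str.replace(' diagnosis ','_diagnosis') be better?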
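The last comment can be checked quickly. The sketch below uses a shortened copy of the question's input (an assumption made for brevity) and suggests that the call exactly as written also removes the space after "diagnosis"; the adjusted variant, which is not from the thread, leaves that space alone.

```python
# Shortened copy of the question's input (assumption: trimmed for brevity).
s2 = "Some other medical terms and stuff diagnosis of R45.2 was entered"

# The call from the comment: the trailing space is part of the search
# string, so it is replaced as well.
as_suggested = s2.replace(' diagnosis ', '_diagnosis')

# Assumed variant: search without the trailing space.
adjusted = s2.replace(' diagnosis', '_diagnosis')

print(as_suggested)
print(adjusted)
```

A plain str.replace only handles a single literal space before the word; the regex answers also cover runs of several whitespace or punctuation characters.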

Tags: python regex

【Solution 1】:

Your regex defines no capturing group, yet the replacement uses the \1 placeholder (a replacement backreference) to refer to one.
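A minimal sketch of what goes wrong, assuming Python 3 and a shortened copy of the question's input: because (?:...) is non-capturing, group 1 does not exist, and re.sub rejects the replacement template with re.error.

```python
import re

# Shortened copy of the question's input (assumption: trimmed for brevity).
s2 = "Some other medical terms and stuff diagnosis of R45.2 was entered"

# Pattern from the question: (?:...) groups without capturing.
pattern = r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,1}diagnosis"

try:
    re.sub(pattern, r"\1_", s2)
    error_message = None
except re.error as exc:
    # The replacement template refers to a group the pattern never defines.
    error_message = str(exc)

print(error_message)
```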

You want to replace one or more special characters, other than - and ', that sit right before diagnosis, so you can use

re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)

See this regex demo.

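Details

[^\w'-]+ - one or more characters other than word characters, ' and -
(?=diagnosis) - a positive lookahead that does not consume text (it is not added to the match value, so re.sub does not remove this text); it only requires that diagnosis appear immediately to the right of the current position.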
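The lookahead variant can be tried on the question's input as a small sketch. The second string, messy, is a made-up example (not from the question) that shows how a whole run of separators collapses into a single underscore.

```python
import re

s2 = ('Some other medical terms and stuff diagnosis of R45.2 was entered '
      'for  this patient. Where did Doctor Who go? Then xxx feea fdsfd')

# The separators are matched and replaced; "diagnosis" is only looked at,
# never consumed, so it stays in the string.
result = re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)
print(result)

# Hypothetical input with several separators before the word.
messy = "stuff ,  diagnosis of R45.2"
print(re.sub(r"[^\w'-]+(?=diagnosis)", "_", messy))
```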

Or

re.sub(r"[^\w'-]+(diagnosis)", r"_\1", s2)

See this regex demo. Here, [^\w'-]+ matches the same special characters, but (diagnosis) is a capturing group, so its text can be referenced with the \1 placeholder in the replacement pattern.
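As a sketch, the capturing-group variant gives the same result as the lookahead one on a shortened copy of the question's input (trimmed here for brevity). The \g<1> spelling is an equivalent way to write the backreference and is added here for illustration only.

```python
import re

# Shortened copy of the question's input (assumption: trimmed for brevity).
s2 = "Some other medical terms and stuff diagnosis of R45.2 was entered"

# "diagnosis" is consumed by the match, so it is put back through \1.
with_group = re.sub(r"[^\w'-]+(diagnosis)", r"_\1", s2)

# \g<1> refers to the same group and avoids ambiguity when digits follow.
with_named_syntax = re.sub(r"[^\w'-]+(diagnosis)", r"_\g<1>", s2)

# Lookahead version, for comparison.
with_lookahead = re.sub(r"[^\w'-]+(?=diagnosis)", "_", s2)

print(with_group)
```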

Note: if you want to make sure diagnosis is matched only as a whole word, put \b around it: \bdiagnosis\b (mind the r raw string literal prefix!).
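The effect of the word boundaries can be seen with a small sketch. The input text and the name diagnosisCode are made up for illustration; they do not come from the question.

```python
import re

# Hypothetical input: "diagnosis" once as a whole word, once as a prefix.
text = "first diagnosis, then diagnosisCode"

# Without \b, the prefix of "diagnosisCode" is enough for the lookahead.
loose = re.sub(r"[^\w'-]+(?=diagnosis)", "_", text)

# With \b, "diagnosis" must be a whole word. The raw string prefix matters:
# without r, "\b" would be read as a backspace character.
strict = re.sub(r"[^\w'-]+(?=\bdiagnosis\b)", "_", text)

print(loose)
print(strict)
```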
