Regex pattern to remove anything \text {whatever here}: answers

【Question title】: Regex pattern to remove anything \text {whatever here}
【Posted】: 2025-12-28 00:30:07
【Question description】:

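I can use \{\%(.*?)\%\} to change hello {% my text %} to hello

or to change useful ()useful to useful useful.

My problem is that I want to remove everything in final \text { whatever here } result that belongs to the \text group, including \text itself, so that it becomes final result.

I tried the same approach with r"\\text .*?/ }", but it did not work.

I have this code, which is part of a class that cleans my data: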

def get_features(self,s:str)->list:
        '''
        Produce Shingles or n-Grams of CHARACTERS in a given string.
        args:
            s: Given String
        out: Shingles of a string. If a string is 'how are you' then the returned list is ['how','owa','war','are','rey','eyo','you',] with width = 3
        '''
        assert self.args_flag, "pass in the arguments for preprocessing by calling set_preprocess_params()"
        
        if self.lower:
            s = s.lower()
            
        if self.ascii_only:
            s = re.sub(r"[^\x00-\x7F]",'',s)

        if self.remove_special: # Remove special characters
            s = re.sub(r'[^\w ]+', '', s)
    
        s = re.sub(r'[_ \\]', '', s) # Remove Empty spaces and _ as they are not covered in special chars. Also, I want to remove any "backslashes \"
        return s

【Question comments】:

Does re.sub(r'\s*\\text\s*{[^{}]*}', '', s) solve the problem?
re.sub('(?<=final )(.*)(?=result)', '', 'final \text { whatever here } result')?
Could you please look at the answers below and give some feedback?

Tags: python regex re

【Solution 1】:

Try the following regex:

s = re.sub(r" \text .*}", '', s)

A few examples:

>>> strg = "final \text {de2mflkmlkfm;rl;erv,;g ,;lgmkl4324254^;2~^^^&&#343141fg 125t5 4$##%#234tg lkg k;kl} result"
>>> re.sub(r" \text .*}", '', strg)
'final result'

>>> strg = "final \text {balh blah 1 2 12, equi445; code22****...} result"
>>> re.sub(r" \text .*}", '', strg)
'final result'

>>> strg = "final \text {{.}swdwwqw {}}. } qdd{{}}} dqqq uit(q.}} result"
>>> re.sub(r" \text .*}", '', strg)
'final result'

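【Discussion】:

This answer is wrong. In the regex r" \text .*}", \t is the pattern for a TAB; it does not match a backslash followed by t. Your strg example string literals also contain a TAB (they are not raw strings), which creates the illusion that it works. In addition, .* greedily matches up to the last }, so if there are two matches on one line it removes too much. You cannot rely on a greedy dot in this case.
Isn't that what the OP asked for? Match up to the last }, followed by a space and result. Please read the question again.
Yes, please read it again: nothing in it says to match up to the last }. As soon as a line contains two \text groups (which your regex does not match), your .* will also grab the text between the two groups, and that text gets removed as well. It will not happen anyway, though, because your regex does not match the OP's text.
How is it clear that the \t in \text is a backslash and a t rather than a TAB? The OP says he wants to remove \text; how do you interpret that as just a backslash, a t and 'ext'?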
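
The points argued in this discussion can be checked directly. The following is a minimal sketch; the sample strings and variable names are made up for illustration. It shows that in a raw-string pattern \t matches a TAB while \\t matches a literal backslash followed by t, and that a greedy .* swallows the text between two \text groups.

```python
import re

# Raw literal: really contains a backslash followed by "text".
real = r"final \text { whatever here } result"
# Non-raw literal: "\t" becomes a TAB, so there is no backslash at all.
tabbed = "final \text { whatever here } result"

wrong = r" \text .*}"              # \t in the pattern means TAB
right = r"\s*\\text\s*{[^{}]*}"    # \\ in the pattern means a literal backslash

print(repr(re.sub(wrong, "", real)))    # unchanged: `real` contains no TAB
print(repr(re.sub(wrong, "", tabbed)))  # 'final result' (only because of the TAB)
print(repr(re.sub(right, "", real)))    # 'final result'

# Two groups on one line: greedy .* runs to the LAST closing brace.
two = r"a \text {x} keep \text {y} b"
greedy = r"\s*\\text\s*{.*}"
print(repr(re.sub(greedy, "", two)))    # 'a b'      ("keep" is lost)
print(repr(re.sub(right, "", two)))     # 'a keep b'
```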

【Solution 2】:

If there are no nested { and } between the braces, you can use Python's re module like this:

re.sub(r'\s*\\text\s*{[^{}]*}', '', s)

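See regex demo #1. Here, \s*\\text\s*{[^{}]*} matches

\s* - zero or more whitespace characters
\\ - a \ character
text - the string text
\s* - zero or more whitespace characters
{[^{}]*} - a {, then zero or more characters other than { and }, then a }.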
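
As a minimal sketch of how the pattern above might be used, here it is wrapped in a small helper. The names TEXT_CMD and strip_text_commands are hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper; the pattern is the one broken down above.
TEXT_CMD = re.compile(r"\s*\\text\s*{[^{}]*}")

def strip_text_commands(s: str) -> str:
    r"""Remove every \text {...} group (no nested braces) and the whitespace before it."""
    return TEXT_CMD.sub("", s)

print(strip_text_commands(r"final \text { whatever here } result"))  # final result
print(strip_text_commands(r"a \text{x} b \text {y} c"))              # a b c
```

In the get_features method from the question, a call like this would have to run before the steps that strip special characters and backslashes. Otherwise the \, { and } that the pattern relies on are already gone.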

If you need to match nested braces, install the PyPI regex module (run pip install regex in a terminal) and then use

import regex
#...
text = regex.sub(r'\s*\\text\s*({(?:[^{}]++|(?1))*})', '', text)

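See regex demo #2. Here,

\s*\\text\s* - matches \text, surrounded by optional whitespace
({(?:[^{}]++|(?1))*}) - Group 1:
- { - a { character
- (?:[^{}]++|(?1))* - zero or more occurrences of either one or more characters other than { and }, or the whole Group 1 pattern (recursion)
- } - a } character.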

See the Python demo online.
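
If installing the regex module is not an option, balanced braces can also be handled with the standard library alone by counting brace depth by hand. This is a different technique from the recursive pattern above, not a reimplementation of it, and the function name strip_nested_text is hypothetical. A minimal sketch, assuming a group with unbalanced braces should be left untouched:

```python
import re

# Start of a group: optional whitespace, \text, optional whitespace, "{".
_OPEN = re.compile(r"\s*\\text\s*{")

def strip_nested_text(s: str) -> str:
    r"""Remove \text {...} groups, allowing balanced nested braces inside."""
    out = []
    pos = 0
    while True:
        m = _OPEN.search(s, pos)
        if m is None:
            out.append(s[pos:])
            break
        # Walk forward from just after the opening "{" until it is closed.
        depth = 1
        i = m.end()
        while i < len(s) and depth:
            if s[i] == "{":
                depth += 1
            elif s[i] == "}":
                depth -= 1
            i += 1
        if depth:
            # Unbalanced braces: keep the rest of the string as it is.
            out.append(s[pos:])
            break
        out.append(s[pos:m.start()])
        pos = i
    return "".join(out)

print(strip_nested_text(r"final \text {a {b} {c {d}}} result"))  # final result
print(strip_nested_text(r"a \text {x{y}} b \text {z} c"))        # a b c
```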

【Discussion】:

This should be the one, given that the OP tried r"\\text to match \text.