Get a word from a string but not inside HTML tags using Python regex: answers

【Question title】: get word from string but not in html tags using python regex
【Posted】: 2014-09-09 07:34:22
【Question】:

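I want to search for a word in a paragraph that also contains HTML. However, I only want the word where it appears in plain text, not where it is inside HTML tags.

For example: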

post_content = """I have a question about xyz.
                I have a question about xyz .
                I have a question about xyz?
                I have a question about <a href="hello">xyz</a>.
                I have a question about <a href="hello">abc xyz</a>
                 xyz
                *xyz"""

I don't want the xyz that is inside <a></a>.

Please give me a regular expression. I tried [^<.+?>]xyz

See the demo: DEMO

Updated code
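A short illustration of why that attempt fails, written as a Python 3 sketch with made-up sample strings: [^<.+?>] is a character class, so it matches exactly one character from outside that set rather than skipping over a tag.

```python
import re

# [^<.+?>] is a character class: it matches exactly ONE character that is not
# '<', '.', '+', '?' or '>'. It does not mean "skip anything inside <...>".
attempt = r"[^<.+?>]xyz"

plain = "about xyz."
tagged = '<a href="hello">abc xyz</a>'

# The character before xyz becomes part of the match...
print(re.findall(attempt, plain))
# ...and xyz inside the <a> element is still found.
print(re.findall(attempt, tagged))
```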

import re

post_content = """..."""  # the multi-line string above
keyword = "xyz"
pattern = r"(?!((?!<).)*<\/)%s" % re.escape(keyword)
replace = "<a href='#'>%s</a>" % keyword
post_content = re.sub(pattern, replace, post_content)
print("post_content", post_content)

【Question discussion】:

Check it now. This code works for me.

Tags: python regex python-2.7 python-3.x

【Solution 1】:

Pattern:

(?!((?!<).)*<\/)xyz

Live demo: REGEXP

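It uses a negative lookahead: xyz is not matched when it is followed by a closing tag </ with no other < in between, that is, when it sits inside an element such as <a>...</a>.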
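A minimal Python 3 sketch of this pattern in use, assuming a shortened version of the question's string and the re.sub replacement from the updated question. It also uses re.finditer, because the pattern contains a capturing group and re.findall would return that group instead of the whole match.

```python
import re

text = """I have a question about xyz.
I have a question about <a href="hello">xyz</a>.
I have a question about <a href="hello">abc xyz</a>
*xyz"""

# Solution 1's pattern.
pattern = r"(?!((?!<).)*<\/)xyz"

# Only the occurrences outside the <a>...</a> elements are replaced.
linked = re.sub(pattern, "<a href='#'>xyz</a>", text)
print(linked)

# The pattern has a capturing group, so re.findall would return that group
# rather than the whole match. finditer (or a non-capturing (?:...) group)
# gives the matched text itself.
print([m.group(0) for m in re.finditer(pattern, text)])
print(re.findall(r"(?!(?:(?!<).)*</)xyz", text))
```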

【Discussion】:

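There's a problem: I put this expression into my Python code and it doesn't work. Please see my updated question.
@i'mPosSible Don't use match; use search.
@i'mPosSible Your post_content has a syntax error. It should be """ """
Yes, but it isn't a static string. I get it from an HTML form, and its type is unicode.
@i'mPosSible I mean it has to be written as a multi-line string. You must enclose it between """ and """.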
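To illustrate the match-versus-search advice, here is a small Python 3 sketch with a made-up one-line input: re.match only tries the pattern at the start of the string, while re.search scans the whole string.

```python
import re

pattern = r"(?!((?!<).)*<\/)xyz"
line = "I have a question about xyz."

# re.match only tries the pattern at position 0, and this line does not start
# with "xyz", so there is no match.
print(re.match(pattern, line))  # None

# re.search scans forward and returns the first acceptable "xyz".
found = re.search(pattern, line)
print(found.group(0), found.start())
```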

【Solution 2】:

Just use a negative lookahead to match every xyz that is not inside a tag.

xyz(?![^<>]*<\/)

DEMO
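One practical difference between the two answers, shown as a Python 3 sketch with a made-up input: Solution 1 scans ahead with ".", which does not match a newline, while Solution 2's [^<>] does match newlines. The two therefore disagree when an element's text continues onto the next line.

```python
import re

# An element whose text content continues onto the next line.
html = "<p>xyz\n</p>"

solution1 = r"(?!((?!<).)*<\/)xyz"   # "." stops at a newline
solution2 = r"xyz(?![^<>]*<\/)"      # [^<>] also matches "\n"

# Solution 1 cannot see the "</p>" on the next line, so it accepts this xyz.
print([m.group(0) for m in re.finditer(solution1, html)])
# Solution 2's lookahead crosses the newline and rejects it.
print(re.findall(solution2, html))
```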

>>> import re
>>> s = """I have a question about xyz.
... I have a question about xyz .
... I have a question about xyz?
... I have a question about <a href="hello">xyz</a>.
... I have a question about <a href="hello">abc xyz</a>
...  xyz
... *xyz"""
>>> m = re.findall(r'xyz(?![^<>]*<\/)', s)
>>> for i in m:
...     print(i)
... 
xyz
xyz
xyz
xyz
xyz
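Since the updated question wants to wrap each keyword in a link rather than just list the matches, here is a sketch of a hypothetical helper, link_outside_tags, built on Solution 2's lookahead. It assumes the keyword is literal text made of word characters, so it is passed through re.escape and wrapped in \b word boundaries. These boundaries are an addition that is not in the answer.

```python
import re

def link_outside_tags(html, keyword, href="#"):
    # Hypothetical helper built on Solution 2's lookahead. Assumptions: the
    # keyword is literal text made of word characters, so it is escaped and
    # wrapped in \b word boundaries (this skips e.g. "xyzzy").
    pattern = r"\b%s\b(?![^<>]*</)" % re.escape(keyword)
    link = '<a href="%s">%s</a>' % (href, keyword)
    # A function as the replacement keeps re.sub from interpreting
    # backslashes or group references inside the link text.
    return re.sub(pattern, lambda m: link, html)


sample = 'about xyz. <a href="hello">abc xyz</a> xyzzy *xyz'
print(link_outside_tags(sample, "xyz"))
```

The lookahead only checks for a closing tag after the keyword, so an occurrence inside an attribute value (for example <a title="xyz">foo</a>) would still be wrapped.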

【Discussion】: