[Posted]: 2025-11-29 15:20:03
[Question]:
I have a sentence:
text="The president of America is <PERSON>Barack Obama</PERSON>. He was born on August 4, 1961. Obama was reelected president in November 2012".
I want to put <PERSON></PERSON> tags around "Obama", so that the result looks like this: The president of America is <PERSON>Barack Obama</PERSON>. He was born on August 4, 1961. <PERSON>Obama</PERSON> was reelected president in November 2012
I want to find the substring (for example, Obama) only where it is not preceded by the tag <PERSON> and not followed by the tag </PERSON>, but I don't know the correct regular expression syntax for this in Python.
I am new to Python.
Using a simple regular expression, re.sub(namedEntity, "<PERSON>"+namedEntity+"</PERSON>", text), gives the output: The president of America is <PERSON>Barack <PERSON>Obama</PERSON></PERSON>. He was born on August 4, 1961. <PERSON>Obama</PERSON> was reelected president in November 2012
Here is my code (using Python 2.7):
import re
result=re.sub(r"((?!<PERSON>).*"+namedEntity+".*(?!</PERSON>))","<PERSON>"+namedEntity+"</PERSON>",text)
print "result: "+result
Output: result: <PERSON>Obama</PERSON>
I don't know whether this is the first "Obama" or the second.
Thanks in advance for your help.
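For reference, a quick check of what the pattern in the code above actually matches. This is only a sketch: it assumes namedEntity is "Obama" and reuses the sentence from the question. Because each .* is greedy and the two lookaheads only test a single position, the match spans the whole sentence, so re.sub replaces everything rather than either occurrence of "Obama".

```python
import re

text = ("The president of America is <PERSON>Barack Obama</PERSON>. "
        "He was born on August 4, 1961. "
        "Obama was reelected president in November 2012")
namedEntity = "Obama"  # assumed value, taken from the example in the question

# Same pattern as in the question, unchanged.
pattern = r"((?!<PERSON>).*" + namedEntity + r".*(?!</PERSON>))"

m = re.search(pattern, text)
# The match starts at index 0 and runs to the end of the sentence.
covers_everything = m.span() == (0, len(text))
print(covers_everything)

# Replacing that single match therefore leaves only the replacement text.
replaced = re.sub(pattern, "<PERSON>" + namedEntity + "</PERSON>", text)
print(replaced)
```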
[Comments]:
-
Did you copy the code from somewhere? Do you understand what you are doing in that regular expression?
-
I tried the regular expression at regex101.com/#python, having learned from this answer: *.com/questions/6259443/…. Maybe I got it wrong, because I thought ?!regex meant "does not contain regex".
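On the meaning of (?!...): it is a negative lookahead, which only asserts that the text immediately after the current position does not match. It does not mean "the string does not contain". Its counterpart (?<!...) is a negative lookbehind, which checks the text immediately before the position. A minimal sketch of how the two could be combined for this question, assuming namedEntity is "Obama" and that a tag, when present, sits directly next to the entity text:

```python
import re

text = ("The president of America is <PERSON>Barack Obama</PERSON>. "
        "He was born on August 4, 1961. "
        "Obama was reelected president in November 2012")
namedEntity = "Obama"  # assumed value, taken from the example in the question

# (?<!<PERSON>)  the entity must not be immediately preceded by "<PERSON>"
# (?!</PERSON>)  the entity must not be immediately followed by "</PERSON>"
# re.escape guards against regex metacharacters in the entity text.
pattern = r"(?<!<PERSON>)" + re.escape(namedEntity) + r"(?!</PERSON>)"

# A function replacement avoids backslash handling in the replacement string.
result = re.sub(pattern, lambda m: "<PERSON>" + m.group(0) + "</PERSON>", text)
print("result: " + result)
```

The "Obama" inside <PERSON>Barack Obama</PERSON> is skipped because the closing tag follows it directly. Note that this only inspects the characters adjacent to the match: an entity in the middle of a longer tagged span, with other words on both sides, would still be wrapped, so this is not a general solution for nested markup.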