【问题标题】:Find all delimiters next to substring in string and replace in python查找字符串中子字符串旁边的所有分隔符并在python中替换
【发布时间】:2014-08-30 17:17:01
【问题描述】:
示例字符串:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
我需要将其转换为:
s = "<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)"
这需要对开始和结束标记以及“.”、“”、“-”、“(”、“)”等所有分隔符都有效。
我可以搜索并替换 ")" 等等,但显然我想要更性感的东西。
所以基本上将所有分隔符移到标签之外。
谢谢!
【问题讨论】:
标签:
python
regex
search
replace
【解决方案1】:
下面的正则表达式将帮助您将存在于开始和结束标记内的分隔符移动到结束标记的下一个。
(<sec>)([^.,()-]*)([.,()-])(<\/sec>)
替换字符串:
\1\2\4\3
DEMO
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<sec>)([^.,()-]*)([.,()-])(<\/sec>)', r'\1\2\4\3', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
或
这适用于任何标签,
>>> s = "<sec>John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'(<(\S+?\b)[^>]*>)([^.,()-]*)([.,()-])(<\/\2>)', r'\1\3\5\4', s)
'<sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
【解决方案2】:
另一个正则表达式变体:
>>> s = "Nicely<sec>, John</sec> said hi to a woman (named <sec>Mary)</sec>"
>>> re.sub(r'((?:<[^>]+>)?)( *[-.(),]+ *)((?:</[^>]+>)?)',r'\3\2\1',s)
# ^^ ^^
# move spaces with the punctuation
# remove that if not needed
'Nicely, <sec>John</sec> said hi to a woman (named <sec>Mary</sec>)'
这个想法是交换开始标签↔标点或标点↔结束标签。