Regular Expression match subtags in Python [closed]: Answers

【Question title】: Regular Expression match subtags in Python [closed]
【Posted】: 2019-08-12 08:31:54
【Question description】:

I am trying to write a regular expression that matches subtags.

I looked at this post: Regex to find words between two tags

This regular expression extracts the values of all "doc-number" tags:

<doc-number>(.*?)</doc-number>

However, I only want the values that sit inside another tag, say <patcit>. I tried the expression below, but it does not work.

"<patcit(.*?)<doc-number>(.*?)</doc-number>(.*?)</patcit>"

Can I get some help?

Sample XML file:

<us-citation>
<patcit num="00003">
<document-id>
<country>US</country>
<doc-number>6172888</doc-number>
<kind>B1</kind>
<name>Jochi</name>
<date>20010100</date>
</document-id>
</patcit>
<category>cited by examiner</category>
<classification-cpc-text>B23K 11/258</classification-cpc-text>
<classification-national><country>US</country><main-classification>363 89</main-classification></classification-national>
</us-citation>

【Comments】:

How about using an XML parser instead?

Tags: python regex xml

【Solution 1】:

You shouldn't be using regular expressions to parse XML. The xml.etree.ElementTree module in the standard library is a better choice. The answers to the question "How to use Xpath in Python" may also be of interest.
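A minimal sketch of the ElementTree approach follows. It assumes the XML is available as a string shaped like the sample in the question. The helper name `doc_numbers_in_patcit` is made up for illustration and is not part of any library. It walks every <patcit> element and collects the text of the <doc-number> elements nested inside it, so <doc-number> tags outside <patcit> are ignored.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample XML from the question.
SAMPLE = """
<us-citation>
<patcit num="00003">
<document-id>
<country>US</country>
<doc-number>6172888</doc-number>
<kind>B1</kind>
</document-id>
</patcit>
<category>cited by examiner</category>
</us-citation>
"""


def doc_numbers_in_patcit(xml_text):
    """Return the text of every <doc-number> nested inside a <patcit>.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    numbers = []
    # iter() walks the whole subtree, so nesting depth does not matter.
    for patcit in root.iter("patcit"):
        for doc_number in patcit.iter("doc-number"):
            numbers.append(doc_number.text)
    return numbers


print(doc_numbers_in_patcit(SAMPLE))  # ['6172888']
```

For a file on disk, `ET.parse(path).getroot()` can take the place of `ET.fromstring`. Unlike the regular expression in the question, this does not depend on how the XML is split across lines.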

【Discussion】: