Finding a specific XML attribute of a child element using Python? Answers

【Question title】: Finding specific XML attribute of child element using Python?
【Posted】: 2019-01-15 14:45:42
【Question】:

<root>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec7" sec-type="funding">
            <title>Funding</title>
            <p>This work was supported by the NIH</p>
          </sec>
        </back>

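I have an XML file of scientific journal metadata, and I am trying to extract only the funding information for each article. I need the text contained in the p tag. The "sec id" varies from article to article, but the "sec-type" is always "funding".

I have been trying to do this in Python 3 using ElementTree.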

import xml.etree.ElementTree as ET  

tree = ET.parse('journals.xml')
root = tree.getroot()
for title in root.iter("title"):
    ET.dump(title)

Any help would be greatly appreciated!

【Comments】:

Can you give a complete example of valid XML?

Tags: python xml parsing

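【Solution 1】:

You can use findall with an XPath expression to extract the values you want. I extrapolated a bit from your sample data to complete the document and to have two p elements: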

<root>
  <article>
    <front>
      <body>
        <back>
          <sec id="sec7" sec-type="funding">
            <title>Funding</title>
            <p>This work was supported by the NIH</p>
          </sec>
          <sec id="sec8" sec-type="funding">
            <title>Funding</title>
            <p>I'm a little teapot</p>
          </sec>
        </back>
      </body>
    </front>
  </article>
</root>

The following extracts the text content of every p node directly under a sec node whose sec-type="funding":

import xml.etree.ElementTree as ET

doc = ET.parse('journals.xml')
print([p.text for p in doc.findall('.//sec[@sec-type="funding"]/p')])

Result:

['This work was supported by the NIH', "I'm a little teapot"]
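One caveat with p.text: it only returns the text up to the first child element, so a paragraph that contains inline markup would be cut short. Below is a minimal sketch of a variant that joins itertext() instead. The inline sample document, the italic element in it, and the helper name funding_paragraphs are assumptions made for illustration; they are not part of the original data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <p> contains inline markup (<italic>),
# which is an assumption and does not come from the original file.
SAMPLE = """
<root>
  <sec id="sec7" sec-type="funding">
    <title>Funding</title>
    <p>This work was supported by the NIH</p>
  </sec>
  <sec id="sec8" sec-type="funding">
    <title>Funding</title>
    <p>Supported by the <italic>NIH</italic> and others</p>
  </sec>
</root>
"""


def funding_paragraphs(root):
    """Return the full text of every <p> directly under a funding <sec>."""
    paragraphs = root.findall('.//sec[@sec-type="funding"]/p')
    # itertext() walks the element and all of its descendants, so text that
    # follows an inline child element is kept (p.text alone would drop it).
    return ["".join(p.itertext()) for p in paragraphs]


root = ET.fromstring(SAMPLE)
texts = funding_paragraphs(root)
print(texts)
```

With p.text alone, the second entry would stop at "Supported by the ".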

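【Discussion】:

Thanks for your answer. Is there a way to combine this XPath expression with a simple search for a specific element's text, so that for each article I get the title along with the corresponding funding information? for elem in tree.iter(tag='article-id'): print(elem.text) print([p.text for p in doc.findall('.//sec[@sec-type="funding"]/p')]) This gives me the article IDs and the funding information separately, but ideally I would like these matched up.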
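One possible way to pair the two is to loop over each article element and run both searches relative to that article rather than the whole document. This is only a sketch under assumptions: the sample document below is made up, the position of article-id under front is a guess at the real file's layout, and funding_by_article is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: where <article-id> sits inside <article> is an
# assumption; adjust the paths to match the real file.
SAMPLE = """
<root>
  <article>
    <front>
      <article-id>A1</article-id>
    </front>
    <back>
      <sec id="sec7" sec-type="funding">
        <title>Funding</title>
        <p>This work was supported by the NIH</p>
      </sec>
    </back>
  </article>
  <article>
    <front>
      <article-id>A2</article-id>
    </front>
    <back>
      <sec id="sec3" sec-type="funding">
        <title>Funding</title>
        <p>I'm a little teapot</p>
      </sec>
    </back>
  </article>
</root>
"""


def funding_by_article(root):
    """Return a list of (article_id, [funding paragraph texts]) tuples."""
    results = []
    for article in root.iter("article"):
        # Both searches start at this <article>, so the ID and the funding
        # text always belong to the same article.
        article_id = article.findtext(".//article-id")  # None if absent
        funding = [
            "".join(p.itertext())
            for p in article.findall('.//sec[@sec-type="funding"]/p')
        ]
        results.append((article_id, funding))
    return results


pairs = funding_by_article(ET.fromstring(SAMPLE))
for article_id, funding in pairs:
    print(article_id, funding)
```

An article without a funding section yields an empty list, and an article without an article-id yields None for the ID, so missing data stays visible instead of shifting the pairing.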