Using XPath with ElementTree in Python 3.7 to find and extract values from an XML file (answers)

【Question title】: Using XPath with ElementTree in Python 3.7 to find and extract values from an XML file
【Posted】: 2020-11-13 02:31:17
【Question】:

The XML file I'm trying to parse is located here. This XML has a defined namespace. Below is a sample from the XML file containing the pertinent elements:

    <series>
        <header>
            <type>instantaneous</type>
            <locationId>Fredericton</locationId>
            <parameterId>HG</parameterId>
            <timeStep unit="second" multiplier="3600"/>
            <startDate date="2020-05-11" time="07:00:00"/>
            <endDate date="2020-05-15" time="07:00:00"/>
            <missVal>-999</missVal>
            <stationName>SAINT JOHN RIVER AT FREDERICTON</stationName>
            <units>M</units>
        </header>
        <event date="2020-05-11" time="07:00:00" value="4.69" flag="0"/>
        <event date="2020-05-11" time="08:00:00" value="4.66" flag="0"/>
        <!-- many records deleted to save space -->
        <event date="2020-05-15" time="06:00:00" value="4.27" flag="0"/>
        <event date="2020-05-15" time="07:00:00" value="-999" flag="8"/>
    </series>
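Because the file declares a namespace, ElementTree does not store the tag as plain `locationId`; it stores it in `{uri}local` form, which is why the code below has to spell out `{http://www.wldelft.nl/fews/PI}locationId`. A tiny sketch of this, assuming the FEWS PI URI is declared as the default namespace on a root element named `TimeSeries` (the real file's root name is not shown here), and with `pi` as an arbitrary prefix:

```python
import xml.etree.ElementTree as ET

# Assumption: default namespace declared on a root named "TimeSeries" (illustrative only).
doc = ET.fromstring(
    '<TimeSeries xmlns="http://www.wldelft.nl/fews/PI">'
    "<series><header><locationId>Fredericton</locationId></header></series>"
    "</TimeSeries>"
)

# Namespaced tags are stored as "{uri}local", so a bare name matches nothing.
print(doc.find(".//locationId"))  # None

# Either spell out the full "{uri}local" name ...
loc = doc.find(".//{http://www.wldelft.nl/fews/PI}locationId")
# ... or map a prefix once and pass the mapping to find/findall/findtext.
NS = {"pi": "http://www.wldelft.nl/fews/PI"}
same = doc.find(".//pi:locationId", NS)
print(loc.text, loc is same)
```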

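I need to search the XML file by the text stored in the <locationId> element, e.g. "Fredericton". Once "Fredericton" is found, I need to extract the <parameterId> text and also get the attributes from the first and last <event> elements. Here is my code so far. How can I use XPath to get the elements I need? I commented out my attempt, which didn't work.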

    import os
    from xml.etree import ElementTree as ET
    file_name = 'StJohn_FEWSNB_export.xml'
    full_file = os.path.abspath(os.path.join('data', file_name))
    print(full_file)

    tree = ET.parse(full_file)
    root = tree.getroot()

    location_lst = [
        'Nashwaak','Kennebecasis','Fredericton','Maugerville','Jemseg','Grand_Lake',
        'Lakeville_Corner','Gagetown','Oak_Point','Hampton','Saint_John','Connors',
        'St_Francois','Ft_Kent','Baker_Brook','St_Hilaire','Edmundston','Iroquois',
        'St_Basile','St_Anne','St_Leonard','Perth','Simonds','Hartland','Woodstock'
    ]

    for loc in location_lst:
        for location in root.iter('{http://www.wldelft.nl/fews/PI}locationId'):
            if location.text == loc:
    ##            type = element.findall('.//{http://www.wldelft.nl/fews/PI}parameterId')
                print(loc, location.text)
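One way to stay within the standard library is ElementTree's limited XPath subset, which supports `[tag='text']` predicates and the `..` parent step. The following is a minimal sketch, not a drop-in replacement: the inline `SAMPLE` document, its `TimeSeries` root, and the `summarize` helper are hypothetical, and only the namespace URI is taken from the question's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down document in the question's namespace (root name assumed).
SAMPLE = """<TimeSeries xmlns="http://www.wldelft.nl/fews/PI">
  <series>
    <header>
      <locationId>Fredericton</locationId>
      <parameterId>HG</parameterId>
    </header>
    <event date="2020-05-11" time="07:00:00" value="4.69" flag="0"/>
    <event date="2020-05-11" time="08:00:00" value="4.66" flag="0"/>
    <event date="2020-05-15" time="07:00:00" value="-999" flag="8"/>
  </series>
</TimeSeries>"""

# Prefix -> namespace URI, so paths can say "pi:series" instead of "{uri}series".
NS = {"pi": "http://www.wldelft.nl/fews/PI"}


def summarize(root, loc):
    """Hypothetical helper: (parameterId, first event attrs, last event attrs) or None."""
    # [pi:locationId='...'] keeps <header> elements with a matching child,
    # and ".." steps back up to the enclosing <series>.
    path = f".//pi:series/pi:header[pi:locationId='{loc}']/.."
    matches = root.findall(path, NS)
    if not matches:
        return None
    series = matches[0]
    pid = series.findtext("pi:header/pi:parameterId", namespaces=NS)
    events = series.findall("pi:event", NS)  # direct <event> children, in document order
    return pid, events[0].attrib, events[-1].attrib


root = ET.fromstring(SAMPLE)
print(summarize(root, "Fredericton"))
```

Calling `summarize` inside the question's `for loc in location_lst:` loop would replace the inner `root.iter(...)` loop; a `None` result means the location is not in the file. Event attributes carry no prefix, so they are plain keys such as `"date"` and `"value"`.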

Thanks, Bernie.

【Question discussion】:

Tags: python-3.x xml xpath elementtree

【Solution 1】:

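Here is an answer that uses lxml instead of ElementTree, with simplified versions of your XML and location list, and it produces a reduced version of your output. Note that, unlike your real file, this sample declares no namespace. You can adapt it to the actual XML and output: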

    from lxml import etree
    events = """<?xml version="1.0" encoding="UTF-8"?>
    <root>
       <series>
          <header>
             <type>instantaneous</type>
             <locationId>Lakeville_Corner</locationId>
             <parameterId>SSTG</parameterId>
             <timeStep unit="second" multiplier="3600" />
          </header>
          <event date="2020-05-15" time="07:00:00" value="3.64" flag="0" />
          <event date="2020-05-15" time="08:00:00" value="3.64" flag="0" />
          <event date="2020-05-20" time="07:00:00" value="3.157" flag="0" />
       </series>
       <series>
          <header>
             <type>instantaneous</type>
             <locationId>Gagetown</locationId>
             <parameterId>HG</parameterId>
             <timeStep unit="second" multiplier="3600" />
          </header>
          <event date="2020-05-11" time="07:00:00" value="3.99" flag="0" />
          <event date="2020-05-11" time="08:00:00" value="3.99" flag="0" />
          <event date="2020-05-15" time="07:00:00" value="3.43" flag="0" />
       </series>
    </root>
    """
    doc = etree.XML(events.encode())  # lxml requires bytes when the text has an encoding declaration
    location_lst = ["Lakeville_Corner", 'Gagetown']
    series = doc.xpath('//series')
    for loc in location_lst:
        for s in series:
            # Does this <series> have a header/locationId whose text is loc?
            exp = f'./header/locationId[text()="{loc}"]'
            target = s.xpath(exp)
            if target:
                # <parameterId> is a later sibling of <locationId> inside <header>
                pid = target[0].xpath('./following-sibling::parameterId/text()')[0]
                # [1] and [last()] pick the first and last <event> in this series
                date_first = s.xpath('.//event[1]/@date')[0]
                value_first = s.xpath('.//event[1]/@value')[0]
                date_last = s.xpath('.//event[last()]/@date')[0]
                value_last = s.xpath('.//event[last()]/@value')[0]
                # Print inside the match so a missing location never reuses stale values
                print(loc, pid, date_first, value_first, date_last, value_last)

Output:

    Lakeville_Corner SSTG 2020-05-15 3.64 2020-05-20 3.157
    Gagetown HG 2020-05-11 3.99 2020-05-15 3.43
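The answer's sample XML has no namespace, whereas the question's file uses `http://www.wldelft.nl/fews/PI`. In XPath 1.0 an unprefixed name such as `series` only matches elements in no namespace, so `//series` would find nothing in the real file. Below is a sketch of the same extraction with a prefix bound through lxml's `namespaces=` argument and the location passed as an XPath variable (`$loc`) rather than formatted into the expression. The `TimeSeries` root and the cut-down document are assumptions for illustration.

```python
from lxml import etree

# Assumption: the FEWS PI URI is the default namespace on a "TimeSeries" root.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<TimeSeries xmlns="http://www.wldelft.nl/fews/PI">
  <series>
    <header>
      <locationId>Gagetown</locationId>
      <parameterId>HG</parameterId>
    </header>
    <event date="2020-05-11" time="07:00:00" value="3.99" flag="0"/>
    <event date="2020-05-15" time="07:00:00" value="3.43" flag="0"/>
  </series>
</TimeSeries>"""

# Bind an arbitrary prefix to the namespace URI for use in XPath expressions.
NS = {"pi": "http://www.wldelft.nl/fews/PI"}

doc = etree.fromstring(XML)
results = []
for loc in ["Lakeville_Corner", "Gagetown"]:
    # $loc is bound from the keyword argument, so no string formatting is needed.
    for s in doc.xpath("//pi:series[pi:header/pi:locationId = $loc]", namespaces=NS, loc=loc):
        pid = s.xpath("string(pi:header/pi:parameterId)", namespaces=NS)
        first = s.xpath("pi:event[1]", namespaces=NS)[0]
        last = s.xpath("pi:event[last()]", namespaces=NS)[0]
        results.append((loc, pid, first.get("date"), first.get("value"),
                        last.get("date"), last.get("value")))

for row in results:
    print(*row)
```

Selecting the `<series>` directly with a predicate avoids the nested loop over every series, and locations that are absent from the file (here `Lakeville_Corner`) simply produce no row.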

【Discussion】: