【Question title】: Using XPath with ElementTree in Python 3.7 to find and extract values from an XML file
【Posted】: 2020-11-13 02:31:17
【Question】:

The XML file I am trying to parse is located here. This XML has a defined namespace. Below is a sample from the XML file containing the pertinent elements:

    <series>
        <header>
            <type>instantaneous</type>
            <locationId>Fredericton</locationId>
            <parameterId>HG</parameterId>
            <timeStep unit="second" multiplier="3600"/>
            <startDate date="2020-05-11" time="07:00:00"/>
            <endDate date="2020-05-15" time="07:00:00"/>
            <missVal>-999</missVal>
            <stationName>SAINT JOHN RIVER AT FREDERICTON</stationName>
            <units>M</units>
        </header>
        <event date="2020-05-11" time="07:00:00" value="4.69" flag="0"/>
        <event date="2020-05-11" time="08:00:00" value="4.66" flag="0"/>
        <event "many records deleted to save space"/>
        <event date="2020-05-15" time="06:00:00" value="4.27" flag="0"/>
        <event date="2020-05-15" time="07:00:00" value="-999" flag="8"/>
    </series>

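I need to search the XML file by the text stored in the <locationId> element, for example "Fredericton". Once "Fredericton" is found, I need to extract the <parameterId> text and also get the attributes of the first and last <event> elements. This is my code so far. How can I use XPath to get the elements I need? I commented out my attempt, which did not work.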

    import os
    from xml.etree import ElementTree as ET

    file_name = 'StJohn_FEWSNB_export.xml'
    full_file = os.path.abspath(os.path.join('data', file_name))
    print(full_file)

    tree = ET.parse(full_file)
    root = tree.getroot()

    location_lst = [
        'Nashwaak', 'Kennebecasis', 'Fredericton', 'Maugerville', 'Jemseg', 'Grand_Lake',
        'Lakeville_Corner', 'Gagetown', 'Oak_Point', 'Hampton', 'Saint_John', 'Connors',
        'St_Francois', 'Ft_Kent', 'Baker_Brook', 'St_Hilaire', 'Edmundston', 'Iroquois',
        'St_Basile', 'St_Anne', 'St_Leonard', 'Perth', 'Simonds', 'Hartland', 'Woodstock'
    ]

    for loc in location_lst:
        for location in root.iter('{http://www.wldelft.nl/fews/PI}locationId'):
            if location.text == loc:
                ## type = element.findall('.//{http://www.wldelft.nl/fews/PI}parameterId')
                print(loc, location.text)

Thanks, Bernie.

【Question comments】:

    Tags: python-3.x xml xpath elementtree


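    【Solution 1】:

    Here is an answer that uses lxml rather than ElementTree, with simplified versions of your XML and location list, to extract a reduced version of your output. Note that the simplified XML omits the namespace declared in the real file. You can adapt it to fit the actual XML and output: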

        from lxml import etree
        events = """<?xml version="1.0" encoding="UTF-8"?>
        <root>
           <series>
              <header>
                 <type>instantaneous</type>
                 <locationId>Lakeville_Corner</locationId>
                 <parameterId>SSTG</parameterId>
                 <timeStep unit="second" multiplier="3600" />
              </header>
              <event date="2020-05-15" time="07:00:00" value="3.64" flag="0" />
              <event date="2020-05-15" time="08:00:00" value="3.64" flag="0" />
              <event date="2020-05-20" time="07:00:00" value="3.157" flag="0" />
           </series>
           <series>
              <header>
                 <type>instantaneous</type>
                 <locationId>Gagetown</locationId>
                 <parameterId>HG</parameterId>
                 <timeStep unit="second" multiplier="3600" />
              </header>
              <event date="2020-05-11" time="07:00:00" value="3.99" flag="0" />
              <event date="2020-05-11" time="08:00:00" value="3.99" flag="0" />
              <event date="2020-05-15" time="07:00:00" value="3.43" flag="0" />
           </series>
        </root>
        """
        doc = etree.XML(events.encode())
        location_lst = ["Lakeville_Corner", "Gagetown"]
        series = doc.xpath('//series')
        for loc in location_lst:
            for s in series:
                # Match only the series whose header names this location
                exp = f'./header/locationId[text()="{loc}"]'
                target = s.xpath(exp)
                if target:
                    pid = target[0].xpath('./following-sibling::parameterId/text()')[0]
                    # Attributes of the first and last <event> children of this series
                    date_first = s.xpath('./event[1]/@date')[0]
                    value_first = s.xpath('./event[1]/@value')[0]
                    date_last = s.xpath('./event[last()]/@date')[0]
                    value_last = s.xpath('./event[last()]/@value')[0]
                    # Print inside the match so unmatched locations never reuse stale values
                    print(loc, pid, date_first, value_first, date_last, value_last)
            
    

    Output:

    Lakeville_Corner SSTG 2020-05-15 3.64 2020-05-20 3.157
    Gagetown HG 2020-05-11 3.99 2020-05-15 3.43
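
Since the question asks about ElementTree and the real file declares a namespace, the same lookup can be sketched with the standard library alone. This is a minimal sketch under assumptions: the namespace URI is the one from the question's own code, while the `pi` prefix, the `summarize` helper, the `TimeSeries` root element name and the trimmed sample document are illustrative and not taken from the real file.

```python
from xml.etree import ElementTree as ET

# Namespace URI copied from the question's code; "pi" is an arbitrary prefix label.
NS = {"pi": "http://www.wldelft.nl/fews/PI"}

# Hypothetical, trimmed-down stand-in for the real file (root element name is assumed).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<TimeSeries xmlns="http://www.wldelft.nl/fews/PI">
  <series>
    <header>
      <locationId>Fredericton</locationId>
      <parameterId>HG</parameterId>
    </header>
    <event date="2020-05-11" time="07:00:00" value="4.69" flag="0"/>
    <event date="2020-05-11" time="08:00:00" value="4.66" flag="0"/>
    <event date="2020-05-15" time="07:00:00" value="-999" flag="8"/>
  </series>
</TimeSeries>
"""


def summarize(root, locations):
    """Return (location, parameterId, first date/value, last date/value) per matching series."""
    wanted = set(locations)
    rows = []
    for series in root.findall(".//pi:series", NS):
        loc = series.findtext("pi:header/pi:locationId", namespaces=NS)
        if loc not in wanted:
            continue
        pid = series.findtext("pi:header/pi:parameterId", namespaces=NS)
        events = series.findall("pi:event", NS)
        if not events:
            continue  # a series without events has no first/last record
        first, last = events[0], events[-1]
        rows.append((loc, pid,
                     first.get("date"), first.get("value"),
                     last.get("date"), last.get("value")))
    return rows


root = ET.fromstring(SAMPLE.encode())
for row in summarize(root, ["Fredericton"]):
    print(*row)  # Fredericton HG 2020-05-11 4.69 2020-05-15 -999
```

For the real file, `ET.parse(full_file).getroot()` would replace `ET.fromstring(...)`. Indexing the `findall` result with `[0]` and `[-1]` avoids relying on XPath positional predicates, although ElementTree's limited XPath subset also accepts `pi:event[1]` and `pi:event[last()]`.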
    

    【Comments】:
