【Question title】: Parse an XML file with Python 3
【Posted】: 2022-01-11 21:51:38
【Question】:

I am not familiar with XML files at all, but I am trying to parse this one:

<?xml version="1.0" encoding="ISO-8859-1"?>
<modeling>
 <generator>
  <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel </i>
  <i name="platform" type="string">LinuxIFC </i>
  <i name="date" type="string">2019 07 11 </i>
  <i name="time" type="string">11:56:12 </i>
 </generator>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
  <i type="int" name="ISPIN">     2</i>
  <i type="int" name="NELMDL">    -8</i>
  <i type="int" name="IBRION">     2</i>
  <i name="EDIFF">      0.00001000</i>
  <i name="EDIFFG">     -0.01000000</i>
  <i type="int" name="NSW">   200</i>
  <i type="int" name="ISIF">     2</i>
  <i type="int" name="ISYM">     2</i>
  <i name="ENCUT">    750.00000000</i>
  <i name="POTIM">      0.30000000</i>
</incar>

So far I have managed to write code that gets the top-level elements:

#!/usr/bin/env python
import xml.etree.ElementTree as ET

tree = ET.parse("vasprun.xml")
root = tree.getroot()
for child in root:
  print({x for x in root.findall(child.tag)})

The output is:

{<Element 'generator' at 0x7f342220ca90>}
{<Element 'incar' at 0x7f342220cd10>}

I am trying to get the following from incar:

IStart=0
Prec=accurate

Can anyone help me with this?

【Comments】:

  • [{n.get("name"): n.text.strip() for n in node} for node in root]

Tags: python xml xml-parsing elementtree


【Solution 1】:

The following works (XPath):

import xml.etree.ElementTree as ET


xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modeling>
   <generator>
      <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel</i>
      <i name="platform" type="string">LinuxIFC</i>
      <i name="date" type="string">2019 07 11</i>
      <i name="time" type="string">11:56:12</i>
   </generator>
   <incar>
      <i type="int" name="ISTART">0</i>
      <i type="string" name="PREC">accurate</i>
      <i type="int" name="ISPIN">2</i>
      <i type="int" name="NELMDL">-8</i>
      <i type="int" name="IBRION">2</i>
      <i name="EDIFF">0.00001000</i>
      <i name="EDIFFG">-0.01000000</i>
      <i type="int" name="NSW">200</i>
      <i type="int" name="ISIF">2</i>
      <i type="int" name="ISYM">2</i>
      <i name="ENCUT">750.00000000</i>
      <i name="POTIM">0.30000000</i>
   </incar>
</modeling>'''

root = ET.fromstring(xml)
names = ['ISTART','PREC']
for name in names:
  i = root.find(f'.//i[@name="{name}"]')
  print(i.text)

Output:

0
accurate

【Comments】:

  • Thanks, but that is not what I meant. I am trying to get every name=value pair in incar. Upvoted anyway.
【Solution 2】:
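
For the goal stated in the comment above (every name=value pair inside incar), a minimal sketch with the standard-library ElementTree might look like the following. It assumes the closing </modeling> tag is present and works on a shortened copy of the sample; the helper name `incar_pairs` is made up for this illustration and is not part of any answer in this thread.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question, with </modeling> added.
XML = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
  <i name="EDIFF">      0.00001000</i>
 </incar>
</modeling>"""


def incar_pairs(root):
    """Return {name: text} for every <i> child of <incar> (hypothetical helper)."""
    incar = root.find("incar")
    if incar is None:  # no <incar> section in this document
        return {}
    # (i.text or "") guards against empty elements such as <i name="X"/>
    return {i.get("name"): (i.text or "").strip() for i in incar.findall("i")}


pairs = incar_pairs(ET.fromstring(XML))
for name, value in pairs.items():
    print(f"{name}={value}")
```

The values stay strings here; converting them with int() or float() is left to the caller, since some entries carry no type attribute.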

Append the missing closing tag </modeling> to the sample XML and save it to a file.

Then:

import xml.etree.ElementTree as ET

with open('vasprun.xml') as xml:
    root = ET.fromstring(xml.read())
    for name in ['ISTART', 'PREC']:
        if (t := root.find(f'.//i[@name="{name}"]')) is not None:
            print(f'{name}:{t.text.strip()}')
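
One possible variation: the code above opens the file in text mode, so the bytes are decoded with the platform's default encoding rather than the ISO-8859-1 named in the XML declaration. That makes no difference for the pure-ASCII sample, but it could matter if the file ever held non-ASCII characters. The sketch below passes the path to ET.parse instead, which reads the raw bytes and lets the parser apply the declared encoding. The SYSTEM entry and its value are invented purely to have a non-ASCII character to test with.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative document: the SYSTEM entry and its value are made up so that
# the file contains a non-ASCII character stored as ISO-8859-1 bytes.
doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<modeling><incar><i name="SYSTEM">café</i></incar></modeling>'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vasprun.xml")
    with open(path, "wb") as fh:
        fh.write(doc.encode("iso-8859-1"))

    # ET.parse opens the file in binary mode, so the parser decodes it
    # using the encoding named in the XML declaration.
    root = ET.parse(path).getroot()

system = root.find('.//i[@name="SYSTEM"]').text
print(system)
```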

【Comments】:

【Solution 3】:

If the closing </modeling> tag is present, you can use XPath to do the job.

The XPath to get the ISTART value is: //incar/*[@name='ISTART']

The XPath to get the PREC value is: //incar/*[@name='PREC']

Then:

    
    from lxml import etree
    
    xml_doc = """
            <?xml version="1.0" encoding="ISO-8859-1"?>
                <modeling>
                    <generator>
                          <i name="subversion" type="string">(build Dec 07 2018 23:19:03) complex            parallel </i>
                          <i name="platform" type="string">LinuxIFC </i>
                          <i name="date" type="string">2019 07 11 </i>
                          <i name="time" type="string">11:56:12 </i>
                    </generator>
                         <incar>
                            <i type="int" name="ISTART">     0</i>
                            <i type="string" name="PREC">accurate</i>
                            <i type="int" name="ISPIN">     2</i>
                            <i type="int" name="NELMDL">    -8</i>
                            <i type="int" name="IBRION">     2</i>
                            <i name="EDIFF">      0.00001000</i>
                            <i name="EDIFFG">     -0.01000000</i>
                            <i type="int" name="NSW">   200</i>
                            <i type="int" name="ISIF">     2</i>
                            <i type="int" name="ISYM">     2</i>
                            <i name="ENCUT">    750.00000000</i>
                            <i name="POTIM">      0.30000000</i>
                         </incar>
                </modeling>
                """
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
    # strip() puts the XML declaration at the very start of the input, where XML requires it
    xml_tree = etree.fromstring(xml_doc.strip().encode(), parser=parser)
    istart = xml_tree.xpath('//incar/*[@name="ISTART"]')
    prec = xml_tree.xpath('//incar/*[@name="PREC"]')
    print(f'ISTART={int(istart[0].text)}')
    print(f'Prec={prec[0].text}')
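
If installing lxml is not an option, roughly the same lookup can be sketched with the standard-library ElementTree, which implements only a subset of XPath. The main difference is that ElementTree rejects absolute paths such as //incar when they are used on an element, so the path needs a leading dot. The sample below is a shortened copy of the question's XML.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample from the question, with </modeling> added.
XML = """<modeling>
 <incar>
  <i type="int" name="ISTART">     0</i>
  <i type="string" name="PREC">accurate</i>
 </incar>
</modeling>"""

root = ET.fromstring(XML)

# ElementTree paths on an element must be relative, hence the leading "."
istart = root.find(".//incar/*[@name='ISTART']")
prec = root.find(".//incar/*[@name='PREC']")

# int() tolerates the padding spaces around the number
result = f"ISTART={int(istart.text)}\nPrec={prec.text.strip()}"
print(result)
```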
    
    
    
    
    
    
    

【Comments】:
