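Parsing an XML file in Python: answers

【Question title】: Parse an XML file in Python
【Posted】: 2020-06-18 18:45:02
【Question description】:

I have an XML file that I need to read, saving each column to an Excel file. Can someone help?

There are a few lines after the declaration statement, but I only want to parse from table1 to /table1. Can someone help me?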

<?xml version="1.0" encoding="Metadata" ?>
    <DECLARE  lmsid ="asdhgh"
     ...........
    </table1 name ="employee table" name ="E1 Enterprises" refid ="201"
     <data id = "ABC" emp = "dt">
     <country id ="m1" name =dt1">
     <rank text> "data"</rank text>
     <rank textd> "direction"</rank textd>
     <reference>
     <ref id ="9900m" id1="1000" ref="URL">
     </reference>
     </country>
    <data id = "xyz" emp = "dt1">
    <country id ="m2" name =dt2">
    <rank text> "data1"</rank text>
    <rank textd> "direction1"</rank textd>
    <reference>
    <ref id ="9900m" id1="2000" ref="URL">
    </reference>
    </country>
    </data id>
    ....
    </table1>
    </table1 name ="Manager table" name ="E1 Enterprises" refid ="202"
    <data id = "ARZ" emp = "dt">
    <country id ="m1" name =dt1">
     <rank text> "data"</rank text>
     <rank textd> "direction"</rank textd>
     <reference>
     <ref id ="9900m" id1="1000" ref="URL">
     </reference>
     </country>
     <data id = "QNC" emp = "dt1">
     <country id ="m2" name =dt2">
     <rank text> "data1"</rank text>
     <rank textd> "direction1"</rank textd>
     <reference>
     <ref id ="9900m" id1="2000" ref="URL">
     </reference>
     </country>
     </data id>
      ....
     </table1>
...

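Thanks, 奥鲁什

【Question comments】:

Does this answer your question? How do I parse XML in Python?
Please show that you have done some research and made a serious attempt. What have you tried so far?
I tried ElementTree and was able to get the values, but I don't know how to save them to Excel or how to start parsing from table1.

Tags: python xml parsing

【Solution 1】:

I think you can just use BeautifulSoup to parse the XML. I found this code snippet online: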

# Import BeautifulSoup
from bs4 import BeautifulSoup

# Read the whole XML file into one string
with open("sample.xml", "r") as file:
    content = file.read()

# "xml" selects lxml's XML parser (requires the lxml package); the "lxml"
# value selects the HTML parser instead, which lowercases tag names
soup = BeautifulSoup(content, "xml")

# Do things

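BS4 makes it easy to find XML tags. Its documentation is extensive, but the call you are looking for is something like soup.find('data', id='xyz'). After that, export to CSV using pandas or the csv module.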
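The find-then-export idea can be sketched with the standard library alone. This is only an illustration: it swaps BeautifulSoup for xml.etree.ElementTree so that nothing needs installing, it assumes a hypothetical, well-formed stand-in for the question's sample (with each country nested inside its data element), and the column names are taken from the asker's comment rather than from any fixed schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the sample in the question.
XML = """<root>
  <table1 name="employee table" refid="201">
    <data id="ABC" emp="dt">
      <country id="m1" name="dt1"/>
    </data>
    <data id="xyz" emp="dt1">
      <country id="m2" name="dt2"/>
    </data>
  </table1>
</root>"""

root = ET.fromstring(XML)

# Same idea as soup.find('data', id='xyz'): the first <data> whose id is "xyz".
match = root.find(".//data[@id='xyz']")

# One row per <data> element, combining its attributes with its <country>'s.
rows = []
for data in root.iter("data"):
    country = data.find("country")
    rows.append({
        "data": data.get("id"),
        "emp": data.get("emp"),
        "countryID": country.get("id"),
        "name": country.get("name"),
    })

# Write to an in-memory buffer; use open("out.csv", "w", newline="") for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["data", "emp", "countryID", "name"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Excel opens CSV files directly, so this may be enough when a real .xlsx workbook is not required.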

【Comments】:

【Solution 2】:

I'm not sure what you mean by saving each column. XML has tag names, attributes, and text.

You can use the xml.dom.minidom module:

>>> import xml.dom.minidom
>>> s = '<t><a name="1"></a><a name="2"></a></t>'
>>> x = xml.dom.minidom.parseString(s)
>>> a = x.getElementsByTagName("a")
>>> for i in a:
...     print(i.getAttribute("name"))
...
1
2

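You can also parse a .xml file directly. Use a raw string for the Windows path so the backslash is not read as an escape sequence: x = xml.dom.minidom.parse(r"c:\xmlFile.xml")

See the documentation for more details: https://docs.python.org/2/library/xml.dom.minidom.html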
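To start from table1 and ignore whatever precedes it, one option is to ask minidom for the table1 elements and walk down from there. The sketch below rests on several assumptions: the sample has been cleaned up into well-formed XML, the second name attribute on table1 has been renamed to a hypothetical company attribute (XML does not allow two attributes with the same name), and each country is nested inside its data element.

```python
import xml.dom.minidom

# Hypothetical cleaned-up sample; "company" replaces the duplicate "name" attribute.
XML = """<root>
  <DECLARE lmsid="asdhgh"/>
  <table1 name="employee table" company="E1 Enterprises" refid="201">
    <data id="ABC" emp="dt">
      <country id="m1" name="dt1"/>
    </data>
  </table1>
  <table1 name="Manager table" company="E1 Enterprises" refid="202">
    <data id="ARZ" emp="dt">
      <country id="m1" name="dt1"/>
    </data>
  </table1>
</root>"""

doc = xml.dom.minidom.parseString(XML)

rows = []
# Only <table1> elements are visited, so <DECLARE> and anything else
# outside a <table1> never reaches the output.
for table in doc.getElementsByTagName("table1"):
    for data in table.getElementsByTagName("data"):
        for country in data.getElementsByTagName("country"):
            rows.append([
                table.getAttribute("name"),
                table.getAttribute("company"),
                table.getAttribute("refid"),
                data.getAttribute("id"),
                data.getAttribute("emp"),
                country.getAttribute("id"),
                country.getAttribute("name"),
            ])

for row in rows:
    print(row)
```

Each inner list is one spreadsheet row: the table attributes repeat on every row, followed by the attributes of one data element and its country.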

Once you have the values you want to save to Excel, you can run SQL statements with pyodbc and the Microsoft ODBC driver ({Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}):

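import pyodbc

# Raw string so the backslash in the Windows path is not treated as an escape.
# autocommit=True is an assumption: the Excel driver typically lacks transaction support.
connection = pyodbc.connect(
    r"Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; readonly=0; DBQ=C:\yourfileName.xlsx",
    autocommit=True,
)

cursor = connection.cursor()
# col1/col2 must match the header row of Sheet1; val1/val2 stand for values parsed from the XML
sql = "insert into [Sheet1$] (col1, col2) values (?, ?)"
cursor.execute(sql, val1, val2)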
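When there are many parsed rows, it is usually easier to pass the values as parameters than to paste them into the SQL text. The helpers below are a hypothetical sketch (build_insert_sql and insert_rows are names made up for illustration); they assume a DB-API cursor that uses ? placeholders, as pyodbc does, and a worksheet whose header row already contains the column names.

```python
def build_insert_sql(sheet, columns):
    """Return a parameterized INSERT for a worksheet such as "Sheet1$"."""
    placeholders = ", ".join("?" for _ in columns)
    return "insert into [{}] ({}) values ({})".format(
        sheet, ", ".join(columns), placeholders
    )


def insert_rows(cursor, sheet, columns, rows):
    """Insert every row through a cursor that uses '?' placeholders."""
    cursor.executemany(build_insert_sql(sheet, columns), rows)


sql = build_insert_sql("Sheet1$", ["col1", "col2"])
print(sql)
```

With a pyodbc cursor this would be called as insert_rows(cursor, "Sheet1$", ["col1", "col2"], rows), where rows is a list of two-item tuples or lists.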

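【Comments】:

My output in Excel should have the columns table1, name, refid, data, emp, countryID, name, ... with a row such as: employee table, E1 Enterprises, 201, ABC, dt, m1, dt1. Each tag and its value should be filled into the Excel sheet; that is what I meant.
I have edited the answer above and added an example of how to insert data into Excel through SQL.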