【Question title】: Iterate through all elements of XML file
【Posted】: 2015-10-29 22:40:25
【Question】:

I have an XML file like this:

<CustomerOrders>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-08-25</OrderDate>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-10-03</OrderDate>
    </Orders>
    <CompanyName>Alfreds Futterkiste</CompanyName>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <Orders>
      <OrderID>10308</OrderID>
      <CustomerID>ANATR</CustomerID>
      <OrderDate>1996-09-18</OrderDate>
    </Orders>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
  </Customers>
</CustomerOrders>

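I want to extract the text of each element and convert it to lowercase. I know I can walk all nodes and child nodes recursively, but I am struggling to output the actual element values.

Right now my code only prints every tag and its attributes; I can also print a single element's text by indexing manually: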

import xml.etree.ElementTree as ET
tree = ET.parse('customer.xml')
root = tree.getroot()
for descendant in root.findall(".//*"):
    print(descendant.tag, descendant.attrib)
print(root[0][1][0].text)  # prints 10643

What I want is to print out every element of the file, with all the values converted to lowercase.

Expected output:

CustomerID = alfki
OrderID = 10643
CustomerID = alfki
OrderDate = 1997-08-25
OrderID = 10692
CustomerID = alfki
OrderDate = 1997-10-03
CompanyName = alfreds futterkiste

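and so on.

【Comments】:

  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation. MCVE applies here. "I am struggling" does not describe a problem; please edit your post to include the actual output and what you expected.
  • Give it a try and report back.

Tags: python xml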


【Solution 1】:

My attempt is below:

import lxml.etree as et


s="""

<CustomerOrders>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-08-25</OrderDate>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-10-03</OrderDate>
    </Orders>
    <CompanyName>Alfreds Futterkiste</CompanyName>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <Orders>
      <OrderID>10308</OrderID>
  <CustomerID>ANATR</CustomerID>
  <OrderDate>1996-09-18</OrderDate>
    </Orders>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
  </Customers>
</CustomerOrders>

"""


tree = et.fromstring(s)

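# Select every element that is the parent of a text node and lowercase its text
for txt in tree.xpath('//text()/parent::*[1]'):
    txt.text = txt.text.lower()

# encoding="unicode" returns a str rather than bytes, so it prints cleanly
print(et.tostring(tree, pretty_print=True, encoding="unicode"))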

This prints:

<CustomerOrders>
  <Customers>
    <CustomerID>alfki</CustomerID>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>alfki</CustomerID>
      <OrderDate>1997-08-25</OrderDate>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
      <CustomerID>alfki</CustomerID>
      <OrderDate>1997-10-03</OrderDate>
    </Orders>
    <CompanyName>alfreds futterkiste</CompanyName>
  </Customers>
  <Customers>
    <CustomerID>anatr</CustomerID>
    <Orders>
      <OrderID>10308</OrderID>
  <CustomerID>anatr</CustomerID>
  <OrderDate>1996-09-18</OrderDate>
    </Orders>
    <CompanyName>ana trujillo emparedados y helados</CompanyName>
  </Customers>
</CustomerOrders>
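
The output above is re-serialized XML rather than the `tag = value` listing the question asks for. One way to get that listing is sketched below, assuming the standard-library `xml.etree.ElementTree` instead of lxml; the helper name `lowercase_leaves` and the shortened inline sample are illustrative only. Elements whose text is missing or only whitespace (the container elements) are skipped.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's data, inlined so the sketch runs on its own.
XML = """<CustomerOrders>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-08-25</OrderDate>
    </Orders>
    <CompanyName>Alfreds Futterkiste</CompanyName>
  </Customers>
</CustomerOrders>"""


def lowercase_leaves(root):
    """Yield (tag, lowercased text) for every element with non-blank text."""
    # Element.iter() walks the whole subtree in document order.
    for elem in root.iter():
        text = (elem.text or "").strip()
        if text:
            yield elem.tag, text.lower()


root = ET.fromstring(XML)
for tag, value in lowercase_leaves(root):
    print(f"{tag} = {value}")
# CustomerID = alfki
# OrderID = 10643
# CustomerID = alfki
# OrderDate = 1997-08-25
# CompanyName = alfreds futterkiste
```

To read from the file in the question instead, `ET.parse('customer.xml').getroot()` can replace `ET.fromstring(XML)`.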

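【Comments】:

    【Solution 2】:

    Consider using the XSLT translate() function. For background, XSLT is a special-purpose language for transforming, styling, reformatting, and restructuring XML documents. It lets you avoid looping recursively over all nodes and text in Python.

    XSLT script (save as .xsl or .xslt so it can be loaded from Python)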

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:strip-space elements="*"/>
    
    <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
    <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
    
     <!-- Identity Transform -->
     <xsl:template match="@*|node()">
       <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
       </xsl:copy>
     </xsl:template>
    
     <xsl:template match="text()">       
         <xsl:value-of select="translate(., $uppercase, $lowercase)"/>       
     </xsl:template>      
    </xsl:stylesheet>
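
XPath 1.0's `translate()` maps characters position by position, so the stylesheet above lowercases only the 26 ASCII letters listed in `$uppercase`. A rough Python analogue of that behavior, using `str.maketrans`, is sketched below; the function name `translate_like_xpath` is made up for this illustration and is not part of lxml or XSLT.

```python
UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWERCASE = "abcdefghijklmnopqrstuvwxyz"

# Character-for-character mapping, like translate(., $uppercase, $lowercase).
_TABLE = str.maketrans(UPPERCASE, LOWERCASE)


def translate_like_xpath(text):
    """Replace each listed uppercase letter with its lowercase counterpart."""
    return text.translate(_TABLE)


print(translate_like_xpath("Alfreds Futterkiste"))  # alfreds futterkiste
print(translate_like_xpath("ÉCOLE"))                # École (É is not in the list)
print("ÉCOLE".lower())                              # école (full Unicode lowercasing)
```

If the data can contain accented letters, they would need to be added to both variables in the stylesheet, or the lowercasing done in Python with `str.lower()`.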
    

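    Python script

    import os
    import lxml.etree as ET

    # Assumption: `cd` (undefined in the original) is this script's directory
    cd = os.path.dirname(os.path.abspath(__file__))

    # Load the source document and the stylesheet, then apply the transform
    dom = ET.parse('customer.xml')
    xslt = ET.parse('XSLTscript.xsl')
    transform = ET.XSLT(xslt)
    newdom = transform(dom)

    # tostring() returns bytes when an encoding is given, so decode for printing
    tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
    print(tree_out.decode('utf-8'))

    # Write the transformed document to Output.xml
    with open(os.path.join(cd, 'Output.xml'), 'wb') as xmlfile:
        xmlfile.write(tree_out)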
    

    Output

    <?xml version='1.0' encoding='UTF-8'?>
    <CustomerOrders>
      <Customers>
        <CustomerID>alfki</CustomerID>
        <Orders>
          <OrderID>10643</OrderID>
          <CustomerID>alfki</CustomerID>
          <OrderDate>1997-08-25</OrderDate>
        </Orders>
        <Orders>
          <OrderID>10692</OrderID>
          <CustomerID>alfki</CustomerID>
          <OrderDate>1997-10-03</OrderDate>
        </Orders>
        <CompanyName>alfreds futterkiste</CompanyName>
      </Customers>
      <Customers>
        <CustomerID>anatr</CustomerID>
        <Orders>
          <OrderID>10308</OrderID>
          <CustomerID>anatr</CustomerID>
          <OrderDate>1996-09-18</OrderDate>
        </Orders>
        <CompanyName>ana trujillo emparedados y helados</CompanyName>
      </Customers>
    </CustomerOrders>
    

    【Comments】:
