Python xml.etree.ElementTree 'findall()' method does not work with several namespaces: answers

[Question title]: Python xml.etree.ElementTree 'findall()' method does not work with several namespaces
[Posted]: 2022-01-19 10:55:47
[Question description]:

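I am trying to parse an XML file that uses several namespaces. I already have a function that produces a namespace map: a dictionary of namespace prefixes and namespace identifiers (see the example in the code). However, when I pass this dictionary to the findall() method, it works only with the first namespace; if an element on the XML path is in another namespace, nothing is returned.

(It works only for the first namespace, the one that has None as its prefix.)

Here is a code sample: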

import xml.etree.ElementTree as ET

file = r'.\folder\example_file.xml'  # path to the file (raw string, so '\f' is not read as an escape)
xml_path = './DataArea/Order/Item/Price'  # XML path to the element node

tree = ET.parse(file)
root = tree.getroot()
# Collect the (prefix, identifier) pairs from the 'start-ns' events into a dictionary, e.g.
# {'': 'http://firstnamespace.example.com/', 'foo': 'http://secondnamespace.example.com/', etc.}
nsmap = dict(ns for _, ns in ET.iterparse(file, events=['start-ns']))
for elem in root.findall(xml_path, nsmap):
    pass  # Do something

Edit: Following mzjn's suggestion, I have included a sample XML file:

<?xml version="1.0" encoding="utf-8"?>
<SampleOrder xmlns="http://firstnamespace.example.com/" xmlns:foo="http://secondnamespace.example.com/" xmlns:bar="http://thirdnamespace.example.com/" xmlns:sta="http://fourthnamespace.example.com/" languageCode="en-US" releaseID="1.0" systemEnvironmentCode="PROD" versionID="1.0">
    <ApplicationArea>
        <Sender>
            <SenderCode>4457</SenderCode>
        </Sender>
    </ApplicationArea>
    <DataArea>
        <Order>
            <foo:Item>
                <foo:Price>
                    <foo:AmountPerUnit currencyID="USD">58000.000000</foo:AmountPerUnit>
                    <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
                </foo:Price>
                <foo:Description>
                    <foo:ItemCode>259601</foo:ItemCode>
                    <foo:ItemName>PORTAL GUN 6UBC BLUE</foo:ItemName>
                </foo:Description>
            </foo:Item>
            <bar:Supplier>
                <bar:SupplierID>4474</bar:SupplierID>
                <bar:SupplierName>APERTURE SCIENCE, INC</bar:SupplierName>
            </bar:Supplier>
            <sta:DeliveryLocation>
                <sta:RecipientID>103</sta:RecipientID>
                <sta:RecipientName>WARHOUSE 664</sta:RecipientName>
            </sta:DeliveryLocation>
        </Order>
    </DataArea>
</SampleOrder>

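[Comments]:

When searching for elements that are in namespaces, one option is to use a wildcard. Examples: stackoverflow.com/a/61154644/407651, stackoverflow.com/a/62117710/407651
@mzjn The solution you proposed works very well in most cases, thank you. I edited the question to reflect that.

Tags: python xml elementtree xml-namespaces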

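[Solution 1]:

You should specify the namespaces in your xml_path, for example: ./foo:DataArea/Order/Item/bar:Price. It works with the empty namespace because that one is the default, so you do not have to specify it in your path.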
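
As a minimal sketch of this idea applied to the sample XML from the question (trimmed here for brevity), the prefixed path would be ./DataArea/Order/foo:Item/foo:Price. It assumes Python 3.8 or later, where an empty-string key in the namespace map is treated as the default namespace for unprefixed names. The SAMPLE string and the variable names are illustrative, not part of the original code:

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the sample XML from the question (illustrative only).
SAMPLE = """
<SampleOrder xmlns="http://firstnamespace.example.com/" xmlns:foo="http://secondnamespace.example.com/">
    <DataArea>
        <Order>
            <foo:Item>
                <foo:Price>
                    <foo:TotalAmount currencyID="USD">58000.000000</foo:TotalAmount>
                </foo:Price>
            </foo:Item>
        </Order>
    </DataArea>
</SampleOrder>
"""

# Build the prefix -> URI map from the 'start-ns' events, as in the question.
source = io.BytesIO(SAMPLE.encode("utf-8"))
nsmap = dict(ns for _, ns in ET.iterparse(source, events=["start-ns"]))

root = ET.fromstring(SAMPLE)

# Unprefixed steps resolve to the default namespace, so Item and Price are not found.
unprefixed = root.findall("./DataArea/Order/Item/Price", nsmap)

# Prefixing the steps that live in the 'foo' namespace finds the element.
prices = root.findall("./DataArea/Order/foo:Item/foo:Price", nsmap)

print(len(unprefixed), [elem.tag for elem in prices])
```

Only the steps that are actually in another namespace need a prefix; DataArea and Order stay unprefixed because they are in the default namespace.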

[Comments]:

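[Solution 2]:

Following Jan Jaap Meijerink's answer and mzjn's comments under the question, the solution is to insert namespace prefixes into the XML path. This can be done by inserting the wildcard {*}, as suggested in mzjn's comment and in this answer (https://stackoverflow.com/a/62117710/407651).

To document the solution, you can add this simple manipulation to your code: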

xml_path = './DataArea/Order/Item/Price/TotalAmount'
xml_path_splitted_to_list = xml_path.split('/')
xml_path_with_wildcard_prefix = '/{*}'.join(xml_path_splitted_to_list)

If two or more nodes have the same XML path but are in different namespaces, the findall() method will (quite naturally) return all of those element nodes.
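
The wildcard approach and the caveat about identical paths can be sketched as follows. This assumes Python 3.8 or later, where ElementTree accepts the {*} wildcard in paths. The small XML document, the prefixes a and b, and the helper name to_wildcard_path are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two Price elements on the same path, in different namespaces.
DOC = """
<Order xmlns:a="http://a.example.com/" xmlns:b="http://b.example.com/">
    <a:Item><a:Price>1</a:Price></a:Item>
    <b:Item><b:Price>2</b:Price></b:Item>
</Order>
"""


def to_wildcard_path(xml_path):
    """Put the {*} wildcard before every step after the first one.

    Assumes the path starts with './', so the first step is the bare '.'.
    """
    return "/{*}".join(xml_path.split("/"))


root = ET.fromstring(DOC)
path = to_wildcard_path("./Item/Price")

# No namespace map is needed: {*} matches a tag in any namespace (or in none).
matches = root.findall(path)
tags = [elem.tag for elem in matches]

print(path)
print(tags)
```

Both Price elements are returned, so if only one namespace is wanted, an explicit prefix together with a namespace map is the more selective choice.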

[Comments]: