【Question title】: Parsing xml with etree
【Posted】: 2012-02-01 21:45:46
【Question】:

I am trying to parse an XML response from the Amazon Product Advertising API. This is the XML:

<?xml version="1.0" ?>
    <ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
      <OperationRequest>
        <HTTPHeaders>
            <Header Name="UserAgent" Value="TSN (Language=Python)"></Header>
        </HTTPHeaders>
        <RequestId>96ef9bc3-68a8-4bf3-a2c7-c98b8aeae00f</RequestId>
        <Arguments>
            <Argument Name="Operation" Value="ItemLookup"></Argument>
            <Argument Name="Service" Value="AWSECommerceService"></Argument>
            <Argument Name="Signature" Value="gjc4wRNum3YT82app1d06vMIDM7v44fOmZTP8Uh3LqE="></Argument>
            <Argument Name="AssociateTag" Value="sneakick-20"></Argument>
            <Argument Name="Version" Value="2010-11-01"></Argument>
            <Argument Name="ItemId" Value="810056013349,810056013264"></Argument>
            <Argument Name="IdType" Value="UPC"></Argument>
            <Argument Name="AWSAccessKeyId" Value="AKIAIFMUMJLJOOINRVRA"></Argument>
            <Argument Name="Timestamp" Value="2012-01-03T21:26:39Z"></Argument>
            <Argument Name="ResponseGroup" Value="ItemIds"></Argument>
            <Argument Name="SearchIndex" Value="Apparel"></Argument>
        </Arguments>
       <RequestProcessingTime>0.0595830000000000</RequestProcessingTime>
      </OperationRequest>
      <Items>
          <Request>
              <IsValid>True</IsValid>
              <ItemLookupRequest>
                  <IdType>UPC</IdType>
                  <ItemId>810056013349</ItemId>
                  <ItemId>810056013264</ItemId>
                  <ResponseGroup>ItemIds</ResponseGroup>
                  <SearchIndex>Apparel</SearchIndex>
                  <VariationPage>All</VariationPage>
              </ItemLookupRequest>
          </Request>
          <Item>
              <ASIN>B000XR4K6U</ASIN>
          </Item>
          <Item>
              <ASIN>B000XR2UU8</ASIN>
          </Item>
       </Items>
    </ItemLookupResponse>

What I am interested in is the Item tags inside Items. The whole XML is a single string returned by Amazon, and I parse it like this:

from xml.etree.ElementTree import fromstring

response = "xml string returned by amazon"
parsed = fromstring(response)
items = parsed[1] # This is how i get the Items element

# These were my attempts at getting the Item element
items.find('Item')
items.findall('Item')

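items is the Items element, but so far I have had no success: find returns None and findall returns an empty list. Am I missing something, or is there another way to approach this?

【Comments】:

  • It would help if you could show the parsing part of your code!

Tags: python xml amazon-web-services elementtree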


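【Answer 1】:

It is a namespace issue.

You can put the namespace in front of every element name, as described in the first answer to this question. A possibly simpler solution is to ignore the namespace with a quick hack like this: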

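from xml.etree.ElementTree import fromstring

# raw_xml is the XML string returned by Amazon; renaming xmlns drops the default namespace
xml_hacked_namespace = raw_xml.replace(' xmlns=', ' xmlnamespace=')
doc = fromstring(xml_hacked_namespace)
item_list = doc.findall('.//Item')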
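
The prefixing approach mentioned above can also be sketched without touching the raw string, by passing a prefix-to-URI mapping to findall. This is only a minimal sketch: the `aws` prefix, the `NAMESPACES` name, and the trimmed sample XML are assumptions made for illustration, not part of the original answer.

```python
from xml.etree.ElementTree import fromstring

# Trimmed stand-in for the Amazon response (assumption: same default namespace).
raw_xml = """<?xml version="1.0" ?>
<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
  <Items>
    <Item><ASIN>B000XR4K6U</ASIN></Item>
    <Item><ASIN>B000XR2UU8</ASIN></Item>
  </Items>
</ItemLookupResponse>"""

# "aws" is an arbitrary prefix chosen here; only the URI has to match the document.
NAMESPACES = {"aws": "http://webservices.amazon.com/AWSECommerceService/2010-11-01"}

doc = fromstring(raw_xml)

# Every path step needs the prefix, because every element is in the default namespace.
item_list = doc.findall(".//aws:Item", NAMESPACES)
asins = [item.findtext("aws:ASIN", namespaces=NAMESPACES) for item in item_list]
print(asins)
```

Compared with the string replacement, this leaves the document unchanged, at the cost of writing the prefix in every path.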

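If you find yourself doing a lot of work with XML, you may also want to look at lxml. It is faster and provides some extra methods that some people find nice.

【Comments】:

    【Answer 2】:

    This is a namespace issue. This works: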

    from xml.etree import ElementTree as ET
    
    XML = """<?xml version="1.0" ?>
        <ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01"> 
          <OperationRequest>
            <HTTPHeaders>
                <Header Name="UserAgent" Value="TSN (Language=Python)"></Header>
            </HTTPHeaders>
            <RequestId>96ef9bc3-68a8-4bf3-a2c7-c98b8aeae00f</RequestId>
            <Arguments>
                <Argument Name="Operation" Value="ItemLookup"></Argument>
                <Argument Name="Service" Value="AWSECommerceService"></Argument>
                <Argument Name="Signature" Value="gjc4wRNum3YT82app1d06vMIDM7v44fOmZTP8Uh3LqE="></Argument>
                <Argument Name="AssociateTag" Value="sneakick-20"></Argument>
                <Argument Name="Version" Value="2010-11-01"></Argument>
                <Argument Name="ItemId" Value="810056013349,810056013264"></Argument>
                <Argument Name="IdType" Value="UPC"></Argument>
                <Argument Name="AWSAccessKeyId" Value="AKIAIFMUMJLJOOINRVRA"></Argument>
                <Argument Name="Timestamp" Value="2012-01-03T21:26:39Z"></Argument>
                <Argument Name="ResponseGroup" Value="ItemIds"></Argument>
                <Argument Name="SearchIndex" Value="Apparel"></Argument>
            </Arguments>
           <RequestProcessingTime>0.0595830000000000</RequestProcessingTime>
          </OperationRequest>
          <Items>
              <Request>
                  <IsValid>True</IsValid>
                  <ItemLookupRequest>
                      <IdType>UPC</IdType>
                      <ItemId>810056013349</ItemId>
                      <ItemId>810056013264</ItemId>
                      <ResponseGroup>ItemIds</ResponseGroup>
                      <SearchIndex>Apparel</SearchIndex>
                      <VariationPage>All</VariationPage>
                  </ItemLookupRequest>
              </Request>
              <Item>
                  <ASIN>B000XR4K6U</ASIN>
              </Item>
              <Item>
                  <ASIN>B000XR2UU8</ASIN>
              </Item>
           </Items>
        </ItemLookupResponse>"""
    
    NS = "{http://webservices.amazon.com/AWSECommerceService/2010-11-01}"
    
    doc = ET.fromstring(XML)
    Item_elems = doc.findall(".//" + NS + "Item")  # All Item elements in document
    
    print(Item_elems)
    

    Output:

    [<Element '{http://webservices.amazon.com/AWSECommerceService/2010-11-01}Item' at 0xbf0c50>, 
    <Element '{http://webservices.amazon.com/AWSECommerceService/2010-11-01}Item' at 0xbf0cd0>]
    

    A variant that is closer to your own code:

    NS = "{http://webservices.amazon.com/AWSECommerceService/2010-11-01}"
    doc = ET.fromstring(XML)
    items = doc[1]                           # Items element
    
    first_item = items.find(NS + 'Item')     # First direct Item child
    all_items =  items.findall(NS + 'Item')  # List of all direct Item children
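
To see why the unqualified lookups in the question come back empty, it may help to print the tag that ElementTree actually stores, and then read the ASIN values with the same NS prefix. The following is a sketch on a shortened document; the empty OperationRequest element and the variable names `unqualified` and `asins` are assumptions for illustration only.

```python
from xml.etree import ElementTree as ET

# Shortened stand-in for the response; OperationRequest is left empty on purpose
# so that doc[1] is still the Items element.
XML = """<ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2010-11-01">
  <OperationRequest/>
  <Items>
    <Item><ASIN>B000XR4K6U</ASIN></Item>
    <Item><ASIN>B000XR2UU8</ASIN></Item>
  </Items>
</ItemLookupResponse>"""

NS = "{http://webservices.amazon.com/AWSECommerceService/2010-11-01}"

doc = ET.fromstring(XML)
items = doc[1]

# The stored tag carries the namespace URI in braces, not the bare name.
print(items.tag)

# A bare name therefore matches nothing.
unqualified = items.find("Item")

# Qualified names match, for both the Item children and the ASIN inside each.
asins = [item.findtext(NS + "ASIN") for item in items.findall(NS + "Item")]
print(unqualified, asins)
```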
    

    【Comments】:

    • I wish I could upvote this +10, just for the last code example. Good examples are somehow hard to find.