【Question Title】: XML Parsing help: Python lxml, etree, or dom
【Posted】: 2017-07-20 02:44:42
【Question】:

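I have been trying to parse an XML response from the library's documentation, but I can't find a simple way to get at the values I want. I'm happy to use any common library.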

Sample XML response, as a string:

<entry
       xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest"
       xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>search index</title>
  <id>https://localhost:8089/services/search/jobs/mysearch_02151949</id>
  <updated>2011-07-07T20:49:58.000-07:00</updated>
  <link href="/services/search/jobs/mysearch_02151949" rel="alternate"/>
  <published>2011-07-07T20:49:57.000-07:00</published>
  <link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/>
  <link href="/services/search/jobs/mysearch_02151949/events" rel="events"/>
  <link href="/services/search/jobs/mysearch_02151949/results" rel="results"/>
  <link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/>
  <link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/>
  <link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/>
  <link href="/services/search/jobs/mysearch_02151949/control" rel="control"/>
  <author>
    <name>admin</name>
  </author>
  <content type="text/xml">
    <s:dict>
      <s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key>
      <s:key name="delegate"></s:key>
      <s:key name="diskUsage">2174976</s:key>
      <s:key name="dispatchState">DONE</s:key>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="dropCount">0</s:key>
      <s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key>
      <s:key name="eventAvailableCount">287</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="eventFieldCount">6</s:key>
      <s:key name="eventIsStreaming">1</s:key>
      <s:key name="eventIsTruncated">0</s:key>
      <s:key name="eventSearch">search index</s:key>
      <s:key name="eventSorting">desc</s:key>
      <s:key name="isDone">1</s:key>

I've truncated the output. The three values I want are the text contents of these keys:

  • name="isDone" (1)
  • name="doneProgress" (1.00000)
  • name="eventCount" (287)

How can I extract these values?
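If a standard-library-only approach is acceptable, xml.etree.ElementTree can do this lookup with its limited XPath support and a prefix-to-namespace mapping. This is a minimal sketch. The response above is truncated, so it uses a short, hypothetical well-formed excerpt. The helper name get_keys is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed excerpt of the Splunk response (the real one is longer).
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

# Map the 's' prefix used in the search path to the Splunk REST namespace URI.
NS = {"s": "http://dev.splunk.com/ns/rest"}


def get_keys(xml_text, names):
    """Return {name: text} for every <s:key> whose name attribute is in `names`."""
    root = ET.fromstring(xml_text)
    wanted = set(names)
    result = {}
    # ".//s:key" matches <s:key> elements at any depth below the root.
    for elem in root.iterfind(".//s:key", NS):
        name = elem.get("name")
        if name in wanted:
            result[name] = elem.text
    return result


values = get_keys(SAMPLE, ["isDone", "doneProgress", "eventCount"])
print(values)  # {'doneProgress': '1.00000', 'eventCount': '287', 'isDone': '1'}
```

All values come back as strings. Converting them with int() or float() is left to the caller.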

【Question Comments】:

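  • Have you looked at beautifulsoup4? I've had good luck with it. For example: stackoverflow.com/questions/4071696/…
  • I'm a big fan of BS4. I just wanted a real XML library for this job, because it integrates with Splunk, which is XML-native.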

Tags: python xml lxml elementtree
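The title also mentions DOM. Here is a minimal sketch using the standard library's xml.dom.minidom, which is a real namespace-aware XML parser, so it does not need BS4. It assumes a short, hypothetical excerpt of the response. Elements are matched by namespace URI and local name, so the "s:" prefix itself does not matter.

```python
from xml.dom import minidom

# Hypothetical, well-formed excerpt of the Splunk response.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="delegate"></s:key>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

SPLUNK_NS = "http://dev.splunk.com/ns/rest"

doc = minidom.parseString(SAMPLE)
keys = {}
# Match <key> elements by namespace URI + local name, regardless of prefix.
for el in doc.getElementsByTagNameNS(SPLUNK_NS, "key"):
    # Empty elements such as <s:key name="delegate"></s:key> have no text child.
    text = el.firstChild.nodeValue if el.firstChild else ""
    keys[el.getAttribute("name")] = text

wanted = {k: keys[k] for k in ("isDone", "doneProgress", "eventCount")}
print(wanted)  # {'isDone': '1', 'doneProgress': '1.00000', 'eventCount': '287'}
```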


【Solution 1】:

You can use lxml's xpath:

ns = {'s': "http://dev.splunk.com/ns/rest"}  # bind the 's' prefix to the Splunk namespace
print(xml.xpath("//s:key[@name='isDone']/text()", namespaces=ns))

This prints ['1'], a list of the matching text nodes. Full example:

xml = '''
<entry
       xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest"
       xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>search index</title>
  <id>https://localhost:8089/services/search/jobs/mysearch_02151949</id>
  <updated>2011-07-07T20:49:58.000-07:00</updated>
  <link href="/services/search/jobs/mysearch_02151949" rel="alternate"/>
  <published>2011-07-07T20:49:57.000-07:00</published>
  <link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/>
  <link href="/services/search/jobs/mysearch_02151949/events" rel="events"/>
  <link href="/services/search/jobs/mysearch_02151949/results" rel="results"/>
  <link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/>
  <link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/>
  <link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/>
  <link href="/services/search/jobs/mysearch_02151949/control" rel="control"/>
  <author>
    <name>admin</name>
  </author>
  <content type="text/xml">
    <s:dict>
      <s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key>
      <s:key name="delegate"></s:key>
      <s:key name="diskUsage">2174976</s:key>
      <s:key name="dispatchState">DONE</s:key>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="dropCount">0</s:key>
      <s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key>
      <s:key name="eventAvailableCount">287</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="eventFieldCount">6</s:key>
      <s:key name="eventIsStreaming">1</s:key>
      <s:key name="eventIsTruncated">0</s:key>
      <s:key name="eventSearch">search index</s:key>
      <s:key name="eventSorting">desc</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>
'''

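from lxml import etree
from io import StringIO  # Python 3 replacement for cStringIO

tree = etree.parse(StringIO(xml))  # parse the string into an ElementTree
ns = {'s': "http://dev.splunk.com/ns/rest"}  # map the 's' prefix used in the XPath
print(tree.xpath("//s:key[@name='isDone']/text()", namespaces=ns))  # ['1']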
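The same XPath approach extends to all three keys. This minimal sketch is self-contained and uses a hypothetical trimmed excerpt of the response. The helper name key_value is hypothetical. Converting the results to bool, float, and int is an assumption about how the values will be used. The key name is passed as an lxml XPath variable ($name) rather than pasted into the query string.

```python
from lxml import etree

# Hypothetical, trimmed excerpt of the Splunk job entry.
SAMPLE = b"""<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

NS = {"s": "http://dev.splunk.com/ns/rest"}


def key_value(root, name):
    """Return the text of <s:key name=...>, or None if that key is absent."""
    # Extra keyword arguments to xpath() become XPath variables, e.g. $name.
    hits = root.xpath("//s:key[@name=$name]/text()", namespaces=NS, name=name)
    return str(hits[0]) if hits else None


root = etree.fromstring(SAMPLE)
is_done = key_value(root, "isDone") == "1"
progress = float(key_value(root, "doneProgress"))
event_count = int(key_value(root, "eventCount"))
print(is_done, progress, event_count)  # True 1.0 287
```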

【Comments】:

  • My source is already a string, so I can drop from cStringIO import StringIO and xml = StringIO(xml). I tried this: xml = etree.fromstring(xml) ns = {'s':"http://dev.splunk.com/ns/rest"} print xml.xpath("//s:key[@name='isDone']/text()", namespaces=ns) but now I get AttributeError: 'Element' object has no attribute xpath
  • Actually, it worked. My AttributeError problem was unrelated.
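When the source is already a string, etree.fromstring() returns an element directly, and in lxml that element has an .xpath() method. The standard library's xml.etree.ElementTree.Element has no .xpath() method. Importing etree from the wrong module is therefore one way to hit that AttributeError. This is only an illustration, not a diagnosis of the commenter's case. The sketch below uses a hypothetical minimal document.

```python
import xml.etree.ElementTree as StdET

from lxml import etree

# Hypothetical minimal document with one namespaced key.
xml_text = """<entry xmlns:s="http://dev.splunk.com/ns/rest">
  <s:key name="isDone">1</s:key>
</entry>"""
ns = {"s": "http://dev.splunk.com/ns/rest"}

# lxml: fromstring() returns an lxml Element that supports .xpath(),
# so no StringIO wrapper is needed.
lxml_root = etree.fromstring(xml_text)
print(lxml_root.xpath("//s:key[@name='isDone']/text()", namespaces=ns))  # ['1']

# Standard library: Element has no .xpath(); use find()/findtext() instead.
std_root = StdET.fromstring(xml_text)
print(hasattr(std_root, "xpath"))  # False
print(std_root.findtext(".//s:key[@name='isDone']", namespaces=ns))  # 1
```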