Response header Content-Type: application/xop+xml and lxml.etree.fromstring parsing (answers)

[Question title]: Response header Content-Type: application/xop+xml and lxml.etree.fromstring parsing
[Posted]: 2019-05-16 02:10:02
[Question]:

I have a response from a SOAP API whose content type is application/xop+xml. I am not sure how I can best use Response.text so that lxml.etree.fromstring can work with the XML.

Here is Response.text:

 --uuid:051145c9-9210-4e26-a390-d7cdd06b9f94
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><listResponse xmlns="http://www.strongmail.com/services/v2/schema"><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>101</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>102</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>103</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>107</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>108</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>109</id></objectId></listResponse></soap:Body></soap:Envelope>
--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--

Taking .text and having etree.fromstring parse it

from lxml import etree
resXML = etree.fromstring(theResponse.text)

gives the following:

    resXML = etree.fromstring(theResponse.text)
  File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

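I believe this is because it expects the text to start with '<', as the error message says, but Response.text starts with the multipart boundary and the part headers.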
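That diagnosis can be checked with a small sketch. The sample text below is a shortened copy of the response from the question, and it assumes plain "\n" line endings, which the real response may not use. It also shows why simply cutting the text at the first '<' would not be enough: that character first appears in the Content-ID header.

```python
from lxml import etree

# Shortened copy of the text from the question (line endings are an assumption).
text = (
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94\n"
    'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"\n'
    "Content-Transfer-Encoding: binary\n"
    "Content-ID: <root.message@cxf.apache.org>\n"
    "\n"
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/></soap:Envelope>\n"
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--\n"
)

# The text starts with the MIME boundary, so the XML parser rejects it.
try:
    etree.fromstring(text)
    parse_failed = False
except etree.XMLSyntaxError:
    parse_failed = True

# Cutting at the first "<" does not help: that character first appears
# in the Content-ID header, not at the start of the envelope.
first_lt_fragment = text[text.index("<"):].split("\n", 1)[0]

print(parse_failed)
print(first_lt_fragment)
```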

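I looked at the lxml.etree documentation at https://lxml.de/tutorial.html#parsing-from-strings-and-files and found .parse, but that only applies to files. Looking at the methods of Response, I can see that I can get information about the headers, such as the content type, although the documentation then goes on to use json.

Is there some method in Response that extracts only the XML part, without the headers, or is there a method for this in lxml.etree?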

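[Comments]:

The XML in Response.text is not valid to begin with. Can you share your whole code so that we can understand what is going on?
@DeveshKumarSingh I am using zeep as the SOAP client, which sends the SOAP request to the SOAP service. It has a setting that lets zeep skip processing the response and return a regular requests.Response object instead. See the zeep settings docs for how to set it and for an example of using it in code. Is the code still needed?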

Tags: python-3.x python-requests lxml

[Solution 1]:

You can handle it like this:

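from io import StringIO

from lxml import etree

# theResponse is assumed to hold the text shown in the question (Response.text).
# HTMLParser is lenient, so it recovers from the MIME boundary and part headers.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(theResponse), parser)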

From this point on, lxml can work with it. As a random example, if you are after the links in the response, you could try:

for i in tree.iter():
    # Print the first attribute value of every element that has attributes
    if len(i.values()) > 0:
        print(i.values()[0])

The output will be:

http://schemas.xmlsoap.org/soap/envelope/
http://www.strongmail.com/services/v2/schema
http://www.w3.org/2001/XMLSchema-instance

and so on.
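If the goal is to keep using etree.fromstring with a strict XML parser, one possible alternative is to split the multipart body first and parse only the root part. The sketch below uses the standard-library email package for that. The helper name extract_root_xml, the content_type value, and the sm prefix are assumptions made for illustration: the HTTP-level Content-Type header is not shown in the question, and the sample body is a shortened copy of the original.

```python
from email import message_from_bytes

from lxml import etree


def extract_root_xml(content_type: str, body: bytes) -> bytes:
    """Return the payload of the first MIME part, or the body itself if not multipart."""
    # The boundary lives in the HTTP Content-Type header, so re-attach it
    # in front of the body to let the email parser split the parts.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    if not message.is_multipart():
        return body
    root_part = message.get_payload()[0]
    return root_part.get_payload(decode=True)


# Hypothetical header value; the HTTP-level header is not shown in the question.
content_type = 'multipart/related; boundary="uuid:051145c9-9210-4e26-a390-d7cdd06b9f94"'

# Shortened copy of the body from the question (two ids, xsi attributes dropped).
body = (
    b"--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94\r\n"
    b'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"\r\n'
    b"Content-Transfer-Encoding: binary\r\n"
    b"Content-ID: <root.message@cxf.apache.org>\r\n"
    b"\r\n"
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<soap:Body>"
    b'<listResponse xmlns="http://www.strongmail.com/services/v2/schema">'
    b"<objectId><id>101</id></objectId>"
    b"<objectId><id>102</id></objectId>"
    b"</listResponse></soap:Body></soap:Envelope>\r\n"
    b"--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--\r\n"
)

xml_bytes = extract_root_xml(content_type, body)
root = etree.fromstring(xml_bytes)

# The id elements inherit the default namespace declared on listResponse.
ns = {"sm": "http://www.strongmail.com/services/v2/schema"}
ids = [el.text for el in root.findall(".//sm:id", namespaces=ns)]
print(ids)
```

With a real requests.Response this would presumably be called as extract_root_xml(theResponse.headers["Content-Type"], theResponse.content), so that the boundary comes from the actual header rather than a hard-coded string.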

[Comments]:

Thank you for your help. I am now digging deeper into lxml, and I can see how your answer gets into the tree.
@dbjock Glad it helped!