【Question Title】: Can this XML be parsed?
【Posted】: 2020-10-29 08:40:23
【Question】:

I'm trying to parse an XML response, without success.

I'm using the Python requests library to connect to an API that returns XML.

From response.content I get:

{"GetQuestions":"<Questions><Question><QuestionId>393938<\/QuestionId><QuestionText>Please respond to the following statement:\"The assigned task was easy to complete\"<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393939<\/QuestionId><QuestionText>Did you save your  datafor later? Why\/why not?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393940<\/QuestionId><QuestionText>Did you notice how much it cost to find the item? How much was it?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393941<\/QuestionId><QuestionText>Did you select ‘signature on form’? Why\/why not?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393942<\/QuestionId><QuestionText>Was it easy to find thethe new page? Why\/why not?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393943<\/QuestionId><QuestionText>Please enter your email. 
So that we can track your responses, we need you to provide this for each task.<\/QuestionText><QuestionShortCode>email<\/QuestionShortCode><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393944<\/QuestionId><QuestionText>Why didn't you save your  datafor later?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393945<\/QuestionId><QuestionText>Why did you save your  datafor later?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393946<\/QuestionId><QuestionText>Did you save your  datafor later?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393947<\/QuestionId><QuestionText>Why didn't you select 'signature on form'?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393948<\/QuestionId><QuestionText>Why did you select 'signature on form'?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>4444449<\/QuestionId><QuestionText>Did you select ‘signature on form’?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393950<\/QuestionId><QuestionText>Why wasn't it easy to find thethe new page?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393951<\/QuestionId><QuestionText>Was it easy to find thethe new 
page?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393952<\/QuestionId><QuestionText>Please enter your email addressSo that we can track your responses, we need you to provide this for each task<\/QuestionText><QuestionShortCode>email<\/QuestionShortCode><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><\/Questions>"}

If I pass it directly to ElementTree:

ElementTree.fromstring(response.content)

I get:

xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 0

I then strip {"GetQuestions": from the beginning

and "} from the end,

but it still raises xml.etree.ElementTree.ParseError.

Is something wrong with my approach, or with the XML?

Any suggestions would be appreciated.

【Comments】:

  • The problem is the escape character "\". You could try: etree.fromstring(d["GetQuestions"].replace("<\\", "<"))
  • "I'm using the python requests library to connect to an API that returns XML." That isn't true: the API returns a JSON string, and the XML is inside the JSON.

Tags: python xml python-requests xml-parsing
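Both comments can be checked with a small, self-contained sketch. The body below is a shortened, hypothetical sample in the same shape as the response in the question (it is not the real API output). It shows that stripping the JSON wrapper by hand still leaves the opening quote of the JSON string value and the JSON \/ escapes behind, while decoding the body with json.loads first yields well-formed XML.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, shortened body in the same shape as the API response:
# a JSON object whose "GetQuestions" value is an XML document as a string.
body = r'{"GetQuestions":"<Questions><Question><QuestionId>393938<\/QuestionId><QuestionText>Why\/why not?<\/QuestionText><\/Question><\/Questions>"}'

# Manual stripping, as in the question: the leading '"' of the JSON string
# value and the JSON "\/" escapes are still there, so this is not XML.
stripped = body[len('{"GetQuestions":'):-len('"}')]
try:
    ET.fromstring(stripped)
except ET.ParseError as exc:
    print("naive strip fails:", exc)

# Decoding the JSON first turns every "\/" back into "/" and returns a str.
xml_text = json.loads(body)["GetQuestions"]
root = ET.fromstring(xml_text)
print(root.tag, root.find("Question/QuestionId").text)
```

In other words, treat the response as JSON (json.loads or response.json()) and hand only the decoded "GetQuestions" string to ElementTree.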


【Solution 1】:

You can parse the XML, but you need to do a little cleanup first.

See below:

import xml.etree.ElementTree as ET
import requests 
# response is a dict with 1 entry
response = requests.get('api_url_goes_here').json()
# TODO - remove next line when you actually call the API
response = {"GetQuestions": (
    "<Questions><Question><QuestionId>393938<\/QuestionId><QuestionText>Please respond to the following statement:\"The assigned task was easy to complete\"<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393939<\/QuestionId><QuestionText>Did you save your  datafor later? Why\/why not?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393940<\/QuestionId><QuestionText>Did you notice how much it cost to find the item? How much was it?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393941<\/QuestionId><QuestionText>Did you select ‘signature on form’? Why\/why not?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393942<\/QuestionId><QuestionText>Was it easy to find thethe new page? Why\/why not?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393943<\/QuestionId><QuestionText>Please enter your email. \n"
    "So that we can track your responses, we need you to provide this for each task.<\/QuestionText><QuestionShortCode>email<\/QuestionShortCode><QuestionType>text<\/QuestionType><QuestionStatus>1<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393944<\/QuestionId><QuestionText>Why didn't you save your  datafor later?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393945<\/QuestionId><QuestionText>Why did you save your  datafor later?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393946<\/QuestionId><QuestionText>Did you save your  datafor later?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393947<\/QuestionId><QuestionText>Why didn't you select 'signature on form'?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393948<\/QuestionId><QuestionText>Why did you select 'signature on form'?<\/QuestionText><QuestionType>text<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>4444449<\/QuestionId><QuestionText>Did you select ‘signature on form’?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393950<\/QuestionId><QuestionText>Why wasn't it easy to find thethe new page?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><Question><QuestionId>393951<\/QuestionId><QuestionText>Was it easy to find thethe new \n"
    "page?<\/QuestionText><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>0<\/ExtendedType><\/Question><Question><QuestionId>393952<\/QuestionId><QuestionText>Please enter your email addressSo that we can track your responses, we need you to provide this for each task<\/QuestionText><QuestionShortCode>email<\/QuestionShortCode><QuestionType>single<\/QuestionType><QuestionStatus>0<\/QuestionStatus><ExtendedType>4<\/ExtendedType><\/Question><\/Questions>"
)}

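# fetch the XML string and undo the JSON "\/" escapes left in the hardcoded literal above
# (after a real .json() call they are already decoded, so this replace is then a harmless no-op)
xml = response['GetQuestions'].replace('<\\/', '</')
root = ET.fromstring(xml)
print(root)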

Output:

<Element 'Questions' at 0x7f35c68919f0>
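Once ET.fromstring returns the root, each question can be read from it. A minimal sketch, assuming you want the id, text, type and optional short code of every question; the shortened sample XML (taken from two entries above) and the dictionary keys are illustrative choices, not something the API defines.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the decoded "GetQuestions" value.
# <QuestionShortCode> appears only on some questions in the real data.
xml_text = (
    "<Questions>"
    "<Question><QuestionId>393940</QuestionId>"
    "<QuestionText>Did you notice how much it cost to find the item? How much was it?</QuestionText>"
    "<QuestionType>text</QuestionType></Question>"
    "<Question><QuestionId>393943</QuestionId>"
    "<QuestionText>Please enter your email.</QuestionText>"
    "<QuestionShortCode>email</QuestionShortCode><QuestionType>text</QuestionType></Question>"
    "</Questions>"
)

root = ET.fromstring(xml_text)

questions = []
for q in root.iter("Question"):
    questions.append({
        "id": int(q.findtext("QuestionId")),
        "text": q.findtext("QuestionText"),
        "type": q.findtext("QuestionType"),
        # findtext returns the default when the child element is missing
        "short_code": q.findtext("QuestionShortCode", default=None),
    })

for item in questions:
    print(item)
```

Using findtext with a default keeps optional elements such as QuestionShortCode from raising an AttributeError on questions that lack them.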

【Comments】:

  • Hi @balderman, that returned the root. I had been passing response.text instead of response, which is what you suggested. Thanks for your help.
【Solution 2】:

Use the JSON string in the response:

import xml.etree.ElementTree
xml.etree.ElementTree.fromstring(response.json()['GetQuestions'])

【Comments】:

  • Hi @Joe, thanks for the suggestion. It raises a KeyError: KeyError: 'GetQuestions'
  • Then your response doesn't look like the one in your question.
【Solution 3】:

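You can use BeautifulSoup to parse the XML content. When you create the object, write it like this: your_variable = BeautifulSoup(response.text, features="xml"). That should work for you. Also try running the XML through a validator: there must be an error somewhere, but because the whole document is on a single line, it's hard to see where. You can use a Validator website.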

【Comments】:

  • Hi @IvanCompSci2003, I tried your suggestion. It returns: bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: xml. Do you need to install a parser library?
  • What about installing lxml (pip install lxml)?
  • It may also be that the "\" character is the problem, so try .replace("\\", "")
  • from bs4 import BeautifulSoup; response = your_data; response = response["GetQuestions"].replace("/", ""); data = BeautifulSoup(response, features="xml"); print(data) outputs: <?xml version="1.0" encoding="utf-8"?> <Questions><Question><QuestionId>393938</QuestionId></Question></Questions>
  • Hi @balderman, thanks for the suggestion. I tried it, and it returns xml =response.text['GetQuestions'].replace('
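The two replace suggestions in these comments behave very differently. Removing every "/" also deletes the slash of each closing tag, which is likely why the BeautifulSoup output above keeps only the first question. Removing only the backslashes restores the closing tags. A minimal sketch with ElementTree, assuming the string still carries the JSON \/ escapes (that is, it was not decoded with json.loads or response.json()); is_well_formed is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

# Shortened sample of the "GetQuestions" value that still contains the JSON "\/" escapes.
escaped = r"<Questions><Question><QuestionId>393938<\/QuestionId><\/Question><\/Questions>"

def is_well_formed(text):
    """Hypothetical helper: True if ElementTree can parse text as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Removing every "/" turns "<\/QuestionId>" into "<\QuestionId>", which is invalid.
print(is_well_formed(escaped.replace("/", "")))   # False
# Removing only the backslashes turns "<\/" back into "</".
print(is_well_formed(escaped.replace("\\", "")))  # True
```

Stripping backslashes is only safe when the question texts themselves contain no literal backslashes; decoding the body as JSON first avoids that caveat entirely.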