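[Posted]: 2020-06-28 21:50:42
[Question]:
I'm trying to parse the primary outcomes from the ClinicalTrials.gov website. I'm fairly new to reading XML files, and I'm sure I've messed something up.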
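import requests
import xml.etree.ElementTree as etree

# Fetch the full-study records matching "heart attack" and parse the XML response
r = requests.get('https://clinicaltrials.gov/api/query/full_studies?expr=heart+attack')
root = etree.fromstring(r.content)

# Print the tag and attribute dict of every <Field> element
for child in root.iter('Field'):
    print(child.tag, child.attrib)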
This is what I get:
Field {'Name': 'StartDate'}
Field {'Name': 'StartDateType'}
Field {'Name': 'PrimaryCompletionDate'}
Field {'Name': 'PrimaryCompletionDateType'}
Field {'Name': 'PrimaryOutcomeMeasure'}
Field {'Name': 'PrimaryOutcomeDescription'}
Field {'Name': 'PrimaryOutcomeTimeFrame'}
Field {'Name': 'CompletionDate'}...
So when I go back and try:
for child in root.iter('Field'):
    print(child.tag['Name'], child.attrib['PrimaryOutcomeMeasure'])
I get the following error:
KeyError Traceback (most recent call last)
<ipython-input-62-f201e3c2a2b1> in <module>
9
10 for child in root.iter('Field'):
---> 11 print(child.attrib['PrimaryOutcomeMeasure'])
KeyError: 'PrimaryOutcomeMeasure'
What is going on?
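[Comments]:
- Did you look at the file you got back? "Name" is indeed an attribute of the Field element, and "PrimaryOutcomeMeasure" is the value of that attribute.
- What exactly are you trying to achieve here?
- What do you understand from that error message? Have you done any research?
- But that's exactly why I'm confused: how do I get the value of "PrimaryOutcomeMeasure"? Every study in this XML should have a PrimaryOutcomeMeasure: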
<Struct Name="OutcomesModule">
  <List Name="PrimaryOutcomeList">
    <Struct Name="PrimaryOutcome">
      <Field Name="PrimaryOutcomeMeasure">In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China</Field>
    </Struct>
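For what it's worth, here is a minimal sketch of one way to pull that value out. In `child.attrib`, the only key is `'Name'`; `'PrimaryOutcomeMeasure'` is the value stored under that key (hence the KeyError), and the outcome itself is the element's text. The sample XML is a hypothetical, trimmed-down document modelled on the fragment above, with closing tags and a placeholder second field added so it parses on its own. The helper name `primary_outcome_measures` is made up for illustration.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample modelled on the fragment quoted above. The closing tags
# and the second <Field> are assumptions added so the snippet is well-formed;
# the real API response has many more elements around it.
SAMPLE_XML = """
<Struct Name="OutcomesModule">
  <List Name="PrimaryOutcomeList">
    <Struct Name="PrimaryOutcome">
      <Field Name="PrimaryOutcomeMeasure">In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China</Field>
      <Field Name="PrimaryOutcomeTimeFrame">placeholder time frame</Field>
    </Struct>
  </List>
</Struct>
""".strip()


def primary_outcome_measures(root):
    """Return the text of every <Field Name="PrimaryOutcomeMeasure"> under root."""
    # field.get('Name') reads the Name attribute (None if it is missing);
    # field.text is the content between the opening and closing tags.
    return [
        field.text
        for field in root.iter('Field')
        if field.get('Name') == 'PrimaryOutcomeMeasure'
    ]


root = etree.fromstring(SAMPLE_XML)
measures = primary_outcome_measures(root)
for measure in measures:
    print(measure)
```

With the real response you would presumably pass the `root` parsed from `r.content` instead of the sample. ElementTree's limited XPath support should also accept an attribute predicate, e.g. `root.findall(".//Field[@Name='PrimaryOutcomeMeasure']")`, if you prefer that to filtering in a loop.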
Tags: python xml-parsing