【Question Title】: Parse XML from Clinicaltrials.gov
【Posted】: 2020-06-28 21:50:42
【Question】:

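I'm trying to parse the primary endpoints from the ClinicalTrials.gov website. I'm fairly new to reading XML files, and I'm sure I'm doing something wrong.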

import requests
import xml.etree.ElementTree as etree

r = requests.get('https://clinicaltrials.gov/api/query/full_studies?expr=heart+attack')
root = etree.fromstring(r.content)

for child in root.iter('Field'):
    print(child.tag, child.attrib)

I get:

Field {'Name': 'StartDate'}
Field {'Name': 'StartDateType'}
Field {'Name': 'PrimaryCompletionDate'}
Field {'Name': 'PrimaryCompletionDateType'}
Field {'Name': 'PrimaryOutcomeMeasure'}
Field {'Name': 'PrimaryOutcomeDescription'}
Field {'Name': 'PrimaryOutcomeTimeFrame'}
Field {'Name': 'CompletionDate'}...

So when I go back and try:

for child in root.iter('Field'):
    print(child.attrib['PrimaryOutcomeMeasure'])

I get the following error:


KeyError                                  Traceback (most recent call last)
<ipython-input-62-f201e3c2a2b1> in <module>
      9 
     10 for child in root.iter('Field'):
---> 11     print(child.attrib['PrimaryOutcomeMeasure'])

KeyError: 'PrimaryOutcomeMeasure'

What's going on?

【Question Discussion】:

  • Have you looked at the file you got? "Name" is indeed an attribute of the Field element, and "PrimaryOutcomeMeasure" is the value of that attribute.
  • What exactly are you trying to achieve here?
  • What do you understand from that error message? Have you done any research?
  • But that's exactly what confuses me: how do I get the value of "PrimaryOutcomeMeasure"? Every study in this XML should have a PrimaryOutcomeMeasure: <Struct Name="OutcomesModule"> <List Name="PrimaryOutcomeList"> <Struct Name="PrimaryOutcome"> <Field Name="PrimaryOutcomeMeasure">In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China</Field> </Struct>
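A minimal standard-library sketch of what the comments point at. As the printed output shows, that element's `attrib` is the dict `{'Name': 'PrimaryOutcomeMeasure'}`, so "PrimaryOutcomeMeasure" is a value, not a key, and `attrib['PrimaryOutcomeMeasure']` raises KeyError. The measure itself is the element's `.text`. The sketch filters on the attribute with an ElementTree XPath predicate. The helper name `primary_outcome_measures` is illustrative, and `SAMPLE` is a hypothetical, trimmed document built from the fragment quoted in the last comment; its `<Root>` wrapper is a placeholder, not the real response root.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample built from the fragment quoted in the comment.
# <Root> is a placeholder wrapper, not the real API response root.
SAMPLE = """<Root>
  <Struct Name="OutcomesModule">
    <List Name="PrimaryOutcomeList">
      <Struct Name="PrimaryOutcome">
        <Field Name="PrimaryOutcomeMeasure">In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China</Field>
      </Struct>
    </List>
  </Struct>
</Root>"""


def primary_outcome_measures(xml_text):
    """Return the text of every <Field Name="PrimaryOutcomeMeasure"> element."""
    root = ET.fromstring(xml_text)
    # "Name" is the attribute key and "PrimaryOutcomeMeasure" is its value,
    # so match on the attribute with an XPath predicate...
    fields = root.iterfind(".//Field[@Name='PrimaryOutcomeMeasure']")
    # ...and read each element's text, which holds the measure itself.
    return [field.text for field in fields]


print(primary_outcome_measures(SAMPLE))
```

With the response from the question, the call would be `primary_outcome_measures(r.content)`, assuming the endpoint still returns this XML layout.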

Tags: python xml-parsing


【Solution 1】:

I suggest using a different library; the code is simpler and easier to read.

from simplified_scrapy import SimplifiedDoc, req

# Fetch the API response
xml = req.get(
    'https://clinicaltrials.gov/api/query/full_studies?expr=heart+attack'
)
doc = SimplifiedDoc(xml)

# Option 1: select the Field element whose Name attribute is "PrimaryOutcomeMeasure"
PrimaryOutcomeMeasure = doc.select('Field@Name="PrimaryOutcomeMeasure"')
print(PrimaryOutcomeMeasure.text)

# Option 2: select the PrimaryOutcome Struct first, then the Field inside it
PrimaryOutcome = doc.select('Struct@Name="PrimaryOutcome"')
print(PrimaryOutcome.select('Field@Name="PrimaryOutcomeMeasure"').text)

Result:

In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China
In-hospital mortality of the patients with acute myocardial infarction in different-level hospitals across China

More examples here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
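For comparison, a standard-library sketch of the answer's second approach (scope to the `PrimaryOutcome` struct first, then read its fields), assuming the `Struct`/`List`/`Field` nesting quoted in the question's comment. In the answer, `select` yields a single match; this version collects every `PrimaryOutcome` struct, so trials with several primary outcomes are kept, along with the `PrimaryOutcomeDescription` and `PrimaryOutcomeTimeFrame` fields printed in the question. The helper `primary_outcomes` and the sample document are hypothetical, and the sample's text is placeholder, not real trial data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with placeholder text; the Field names are the ones
# printed in the question. <Root> is a placeholder wrapper element.
SAMPLE = """<Root>
  <List Name="PrimaryOutcomeList">
    <Struct Name="PrimaryOutcome">
      <Field Name="PrimaryOutcomeMeasure">example measure 1</Field>
      <Field Name="PrimaryOutcomeDescription">example description 1</Field>
      <Field Name="PrimaryOutcomeTimeFrame">example time frame 1</Field>
    </Struct>
    <Struct Name="PrimaryOutcome">
      <Field Name="PrimaryOutcomeMeasure">example measure 2</Field>
    </Struct>
  </List>
</Root>"""


def primary_outcomes(xml_text):
    """Return one {Name: text} dict per <Struct Name="PrimaryOutcome">."""
    root = ET.fromstring(xml_text)
    outcomes = []
    # Scope the search to each PrimaryOutcome struct, like the
    # 'Struct@Name="PrimaryOutcome"' selector in the answer.
    for struct in root.iterfind(".//Struct[@Name='PrimaryOutcome']"):
        # Key each direct <Field> child by its Name attribute.
        outcomes.append({f.get("Name"): f.text for f in struct.findall("Field")})
    return outcomes


for outcome in primary_outcomes(SAMPLE):
    print(outcome.get("PrimaryOutcomeMeasure"), "|", outcome.get("PrimaryOutcomeTimeFrame"))
```

Reading the fields with `dict.get` means an outcome that lacks a description or time frame prints `None` instead of raising the same KeyError the question ran into.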

【Discussion】:
