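[Posted]: 2019-06-18 06:04:27
[Question]:
I'm trying to build a script to read an XML file. This is my first time parsing XML, and I'm doing it in Python with xml.etree.ElementTree. The part of the file I want to process looks like this: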
<component>
<section>
<id root="42CB916B-BB58-44A0-B8D2-89B4B27F04DF" />
<code code="34089-3" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="DESCRIPTION SECTION" />
<title mediaType="text/x-hl7-title+xml">DESCRIPTION</title>
<text>
<paragraph>Renese<sup>®</sup> is designated generically as polythiazide, and chemically as 2<content styleCode="italics">H</content>-1,2,4-Benzothiadiazine-7-sulfonamide, 6-chloro-3,4-dihydro-2-methyl-3-[[(2,2,2-trifluoroethyl)thio]methyl]-, 1,1-dioxide. It is a white crystalline substance, insoluble in water but readily soluble in alkaline solution.</paragraph>
<paragraph>Inert Ingredients: dibasic calcium phosphate; lactose; magnesium stearate; polyethylene glycol; sodium lauryl sulfate; starch; vanillin. The 2 mg tablets also contain: Yellow 6; Yellow 10.</paragraph>
</text>
<effectiveTime value="20051214" />
</section>
</component>
<component>
<section>
<id root="CF5D392D-F637-417C-810A-7F0B3773264F" />
<code code="42229-5" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="SPL UNCLASSIFIED SECTION" />
<title mediaType="text/x-hl7-title+xml">ACTION</title>
<text>
<paragraph>The mechanism of action results in an interference with the renal tubular mechanism of electrolyte reabsorption. At maximal therapeutic dosage all thiazides are approximately equal in their diuretic potency. The mechanism whereby thiazides function in the control of hypertension is unknown.</paragraph>
</text>
<effectiveTime value="20051214" />
</section>
</component>
The full file can be downloaded from the following URL:
Here is my code:
import xml.etree.ElementTree as ElementTree
import re

with open("ABD6ECF0-DC8E-41DE-89F2-1E36ED9D6535.xml") as f:
    xmlstring = f.read()

# Remove the default namespace definition (xmlns="http://some/namespace")
xmlstring = re.sub(r'\sxmlns="[^"]+"', '', xmlstring, count=1)
tree = ElementTree.fromstring(xmlstring)

for title in tree.iter('title'):
    print(title.text)
So far I can print the titles, but I also want to print the corresponding text captured in the tags.
I tried this:
for title in tree.iter('title'):
    print(title.text)
    for paragraph in title.iter('paragraph'):
        print(paragraph.text)
But I get no output from paragraph.text.
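A likely reason, judging from the sample above: <paragraph> is not nested inside <title>. It sits under <text>, which is a sibling of <title> within the same <section>, so iterating below the title finds nothing. The snippet below is a minimal sketch of that behaviour; the `sample` string is a trimmed-down stand-in invented for illustration, and it assumes the namespace has already been removed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for one <section> of the file
# (assumption: the default namespace has already been stripped).
sample = """
<section>
  <title>DESCRIPTION</title>
  <text>
    <paragraph>First paragraph.</paragraph>
  </text>
</section>
"""

section = ET.fromstring(sample)
title = section.find("title")

# <title> has no child elements, so searching below it finds no <paragraph>.
paragraphs_under_title = list(title.iter("paragraph"))

# <paragraph> lives under the sibling <text>, so search from the parent <section>.
paragraphs_under_section = list(section.iter("paragraph"))

print(len(paragraphs_under_title))
print(len(paragraphs_under_section))
```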
When I do this instead:
for title in tree.iter('title'):
    print(title.text)
    for paragraph in tree.iter('paragraph'):
        print(paragraph.text)
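This does print the paragraph text, but (obviously) every paragraph in the XML structure is printed again for each title found.
I'd like to find a way to 1) identify each title and 2) print its corresponding paragraphs. How can I do that?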
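One possible approach, sketched under some assumptions: loop over each <section> rather than each <title>, then read that section's own title and its own text/paragraph children. The `sample` document, its <document> root, and the helper names `full_text` and `titles_with_paragraphs` are made up for illustration. The sketch also assumes the namespace has been stripped as in the code above. It uses itertext() because paragraph.text alone stops at the first nested tag such as <sup> or <content>.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the file; the <document> root is an assumption.
sample = """
<document>
  <component>
    <section>
      <title>DESCRIPTION</title>
      <text>
        <paragraph>Renese<sup>R</sup> is a <content>white</content> substance.</paragraph>
        <paragraph>Inert Ingredients: lactose; starch.</paragraph>
      </text>
    </section>
  </component>
  <component>
    <section>
      <title>ACTION</title>
      <text>
        <paragraph>The mechanism is unknown.</paragraph>
      </text>
    </section>
  </component>
</document>
"""


def full_text(element):
    """Join all text inside an element, including text in nested tags."""
    return "".join(element.itertext()).strip()


def titles_with_paragraphs(root):
    """Return (title, [paragraph, ...]) pairs, one per <section> that has a title."""
    pairs = []
    for section in root.iter("section"):
        title = section.find("title")  # direct child only
        if title is None:
            continue
        # Only paragraphs in this section's own <text>, not in nested sections.
        paragraphs = [full_text(p) for p in section.findall("text/paragraph")]
        pairs.append((full_text(title), paragraphs))
    return pairs


root = ET.fromstring(sample)
pairs = titles_with_paragraphs(root)
for title, paragraphs in pairs:
    print(title)
    for paragraph in paragraphs:
        print("  " + paragraph)
```

Grouping by <section> keeps each title tied to the paragraphs that share its parent, which the two nested loops over the whole tree cannot do.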
[Discussion]:
Tags: python xml xml-parsing elementtree xml.etree