How can I write this in a simpler way? - Answers

【Question title】: How can I write this in a simpler way?
【Posted】: 2020-02-17 14:11:07
【Question】:

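I have some XML from which I need to parse and extract the species names in the reactant and product lists. So far I have tried the lines below, but I wonder whether there is a clearer way to do it.

XML: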

<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="data" level="2" version="1">
  <model id="E" name="core_model">
    <notes/>
    <listOfUnitDefinitions/>
    <listOfCompartments/>
    <listOfSpecies/>
    <listOfReactions>
      <reaction id="ID_1" name="name_1">
        <notes/>
        <listOfReactants>
          <speciesReference species="react_1_1"/>
          <speciesReference species="react_2_1"/>
          <speciesReference species="react_3_1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_1"/>
          <speciesReference species="produ_2_1"/>
          <speciesReference species="produ_3_1"/>
        </listOfProducts>
        <kineticLaw/>
      </reaction>
      <reaction id="ID_2" name="name_2">
        <notes/>
        <listOfReactants>
          <speciesReference species="react_1_2"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_2"/>
        </listOfProducts>
        <kineticLaw/>
      </reaction>
      <reaction id="ID_3" name="name_3">
        <notes/>
        <listOfReactants>
          <speciesReference species="react_1_3"/>
          <speciesReference species="react_2_3"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_3"/>
          <speciesReference species="produ_2_3"/>
        </listOfProducts>
        <kineticLaw/>
      </reaction>
    </listOfReactions>
  </model>
</sbml>

Python：

import xml.etree.ElementTree as et
tree = et.parse('example.xml')
root = tree.getroot()
child = root[0]  # the <model> element

for x in child[4]:  # child[4] is <listOfReactions>: print each reaction's id and name
    print(x.get('id'), ':', x.get('name'))

for h in range(2):  # only the first two reactions (hard-coded)
    for i in range(2):  # index 1 = <listOfReactants>, index 2 = <listOfProducts>
        for x in child[4][h][i+1]:
            print(x.get('species'))

Prints:

react_1_1
react_2_1
react_3_1
produ_1_1
produ_2_1
produ_3_1
react_1_2
produ_1_2

Desired output:

ID_1
Reactants
react_1_1
react_2_1
react_3_1
Products
produ_1_1
produ_2_1
produ_3_1

ID_2
Reactants
react_1_2
Products
produ_1_2
.
.
.

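With this Python code I can parse and extract the species names, but the output is one flat list that does not distinguish reactants from products. I also tried element.iter(), without success.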
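One likely reason element.iter() appeared not to work is the default namespace on the root element (xmlns="data"). ElementTree stores every tag name in {namespace}tag form, so iter('speciesReference') matches nothing. The sketch below demonstrates this on a trimmed, well-formed copy of the question's XML. It assumes the collapsed <notes>-style elements can simply be left out.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (assumption: one reaction is enough here).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="data" level="2" version="1">
  <model id="E" name="core_model">
    <listOfReactions>
      <reaction id="ID_1" name="name_1">
        <listOfReactants>
          <speciesReference species="react_1_1"/>
          <speciesReference species="react_2_1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_1"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

root = ET.fromstring(XML)
print(root.tag)  # {data}sbml : the namespace is part of the tag name

# The bare tag name finds nothing; the namespace-qualified name finds all three.
plain = list(root.iter("speciesReference"))
qualified = list(root.iter("{data}speciesReference"))
print(len(plain), len(qualified))  # 0 3
```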

【Discussion】:

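Use XPath: stackoverflow.com/questions/8692/how-to-use-xpath-in-python
I'm pretty sure "efficient" means "simpler" in this case. However, if the XML file is very large and you care more about CPU/RAM usage, SAX parsing (docs.python.org/3/library/xml.sax.reader.html) may help.
Your XML is invalid; can you edit your question to fix it?
I've edited the XML. Thanks a lot for your help.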
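Following the XPath suggestion: ElementTree's findall() supports a limited subset of XPath, and a namespaces mapping lets a short prefix stand in for the document's namespace. Here is a minimal sketch that prints the question's desired output. The prefix s, the helper name format_reactions and the trimmed XML are all illustrative choices.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's XML (two reactions).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="data" level="2" version="1">
  <model id="E" name="core_model">
    <listOfReactions>
      <reaction id="ID_1" name="name_1">
        <listOfReactants>
          <speciesReference species="react_1_1"/>
          <speciesReference species="react_2_1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_1"/>
        </listOfProducts>
      </reaction>
      <reaction id="ID_2" name="name_2">
        <listOfReactants>
          <speciesReference species="react_1_2"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_2"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

# Map an arbitrary prefix ("s") to the document's default namespace.
NS = {"s": "data"}


def format_reactions(root):
    """Return the lines of the desired output, grouped per reaction."""
    lines = []
    for reaction in root.findall("s:model/s:listOfReactions/s:reaction", NS):
        lines.append(reaction.get("id"))
        for label, path in (("Reactants", "s:listOfReactants/s:speciesReference"),
                            ("Products", "s:listOfProducts/s:speciesReference")):
            lines.append(label)
            lines.extend(ref.get("species") for ref in reaction.findall(path, NS))
        lines.append("")  # blank line between reactions
    return lines


print("\n".join(format_reactions(ET.fromstring(XML))))
```

Because elements are looked up by name rather than by position, this version does not depend on how many reactions there are or where <notes> and <kineticLaw> sit among the children.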
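For the large-file case, a SAX handler can collect the species while streaming, without building a tree. The sketch below uses xml.sax.ContentHandler. The class name ReactionHandler is hypothetical, and the handler assumes the element names from the question. Namespace processing is off by default in xml.sax, so elements in the unprefixed default namespace arrive under their plain names (for example "reaction").

```python
import xml.sax

# Trimmed, well-formed copy of the question's XML (two reactions).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="data" level="2" version="1">
  <model id="E" name="core_model">
    <listOfReactions>
      <reaction id="ID_1" name="name_1">
        <listOfReactants>
          <speciesReference species="react_1_1"/>
          <speciesReference species="react_2_1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_1"/>
        </listOfProducts>
      </reaction>
      <reaction id="ID_2" name="name_2">
        <listOfReactants>
          <speciesReference species="react_1_2"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_2"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


class ReactionHandler(xml.sax.ContentHandler):
    """Collect (reaction id, reactants, products) tuples while parsing."""

    def __init__(self):
        super().__init__()
        self.reactions = []   # list of (id, reactants, products)
        self._current = None  # species list currently being filled, or None

    def startElement(self, name, attrs):
        if name == "reaction":
            self.reactions.append((attrs.get("id"), [], []))
        elif name == "listOfReactants":
            self._current = self.reactions[-1][1]
        elif name == "listOfProducts":
            self._current = self.reactions[-1][2]
        elif name == "speciesReference" and self._current is not None:
            self._current.append(attrs.get("species"))

    def endElement(self, name):
        if name in ("listOfReactants", "listOfProducts"):
            self._current = None


handler = ReactionHandler()
xml.sax.parseString(XML, handler)
for rid, reactants, products in handler.reactions:
    print(rid)
    print("Reactants", *reactants, sep="\n")
    print("Products", *products, sep="\n")
```

For a file on disk, xml.sax.parse('example.xml', handler) takes the place of parseString. This handler still keeps every result in self.reactions. For a truly huge file, you could instead print or write out each reaction when endElement sees "reaction".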

Tags: python xml xml-parsing nested-loops

【Solution 1】:

Another approach, using the third-party simplified_scrapy package:

from simplified_scrapy import SimplifiedDoc,utils
html = utils.getFileContent('example.xml')
doc = SimplifiedDoc(html)
reactions = doc.listOfReactions.reactions
for reaction in reactions:
  print (reaction['id'],reaction['name']) # to get the list of REACTIONS ids and names
  # gives back the list of species for reactants and products
  print ('Reactants')
  print (reaction.selects('listOfReactants>speciesReference>species()'))
  print ('Products')
  print (reaction.selects('listOfProducts>speciesReference>species()'))

Result:

ID_1 name_1
Reactants
['react_1_1', 'react_2_1', 'react_3_1']
Products
['produ_1_1', 'produ_2_1', 'produ_3_1']
ID_2 name_2
Reactants
['react_1_2']
Products
['produ_1_2']
ID_3 name_3
Reactants
['react_1_3', 'react_2_3']
Products
['produ_1_3', 'produ_2_3']
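If a third-party package is not an option, the standard library's xml.dom.minidom offers similar tag-name navigation. Its getElementsByTagName() compares against the tag name as written in the file (reaction), so the default namespace does not get in the way. The sketch below mirrors the result above. The function name reaction_species is hypothetical, and the XML is a trimmed copy of the question's.

```python
from xml.dom import minidom

# Trimmed, well-formed copy of the question's XML (two reactions).
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="data" level="2" version="1">
  <model id="E" name="core_model">
    <listOfReactions>
      <reaction id="ID_1" name="name_1">
        <listOfReactants>
          <speciesReference species="react_1_1"/>
          <speciesReference species="react_2_1"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_1"/>
        </listOfProducts>
      </reaction>
      <reaction id="ID_2" name="name_2">
        <listOfReactants>
          <speciesReference species="react_1_2"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="produ_1_2"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def reaction_species(xml_bytes):
    """Return [(id, name, reactants, products), ...], one tuple per <reaction>."""
    doc = minidom.parseString(xml_bytes)
    result = []
    for reaction in doc.getElementsByTagName("reaction"):
        species_lists = []
        for list_tag in ("listOfReactants", "listOfProducts"):
            species_lists.append([
                ref.getAttribute("species")
                for lst in reaction.getElementsByTagName(list_tag)
                for ref in lst.getElementsByTagName("speciesReference")
            ])
        result.append((reaction.getAttribute("id"),
                       reaction.getAttribute("name"), *species_lists))
    return result


for rid, name, reactants, products in reaction_species(XML):
    print(rid, name)
    print("Reactants")
    print(reactants)
    print("Products")
    print(products)
```

minidom loads the whole document into memory, so it suits files of the size shown here rather than very large ones.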

【Discussion】: