[Question title]: Add Text to an XML element in python3
[Posted]: 2021-10-14 13:34:09
[Question]:

I have an XML file:

<listOfSpecies>
  <species metaid="MAM00001c" sboTerm="SBO:0000247" id="MAM00001c" name="(-)-trans-carveol" compartment="c" initialConcentration="0" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="0" fbc:chemicalFormula="C10H16O">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
        <rdf:Description rdf:about="#MAM00001c">
          <bqbiol:is>
            <rdf:Bag>
              
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </species>
 ...
</listOfSpecies>

And a txt file:

name="(-)-trans-carveol" fbc:charge="0" fbc:chemicalFormula="C10H16O"
<rdf:li rdf:resource="https://identifiers.org/kegg.compound/C11409"/>
<rdf:li rdf:resource="https://identifiers.org/pubchem.compound/94221"/>
<rdf:li rdf:resource="https://identifiers.org/lipidmaps/LMPR0102090005"/>
<rdf:li rdf:resource="https://identifiers.org/inchi/InChI=1S/C10H16O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9-11H,1,5-6H2,2-3H3/t9-,10+/m0/s1"/>
<rdf:li rdf:resource="https://identifiers.org/inchikey/BAVONGHXFVOKBV-VHSXEESVSA-N"/>
<rdf:li rdf:resource="https://identifiers.org/metanetx.chemical/MNXM45735"/>

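For each species/name, I want to insert all of the 'rdf:li rdf:resource' elements from the txt file between the rdf:Bag tags in the XML file. So far I have tried minidom, BeautifulSoup, ElementTree, and treating the XML file as a regular text file, but I haven't found anything that works. Can anyone point me in the right direction?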
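Assuming the txt file repeats this pattern for every species (a `name="..."` line followed by that species' `rdf:li` lines), the first step is to group the resource URLs by species name. A minimal sketch using only the standard library `re` module; the `TXT` string is a trimmed copy of the sample above, and `parse_resources` is a hypothetical helper name:

```python
import re
from collections import defaultdict

# Trimmed copy of the txt sample: a name="..." line followed by its rdf:li lines.
TXT = '''name="(-)-trans-carveol" fbc:charge="0" fbc:chemicalFormula="C10H16O"
<rdf:li rdf:resource="https://identifiers.org/kegg.compound/C11409"/>
<rdf:li rdf:resource="https://identifiers.org/pubchem.compound/94221"/>
'''

NAME_RE = re.compile(r'^name="([^"]*)"')          # start of a new species block
RESOURCE_RE = re.compile(r'rdf:resource="([^"]*)"')  # URL inside an rdf:li line


def parse_resources(text):
    """Group rdf:resource URLs under the most recent name="..." line."""
    groups = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)
            continue
        res_match = RESOURCE_RE.search(line)
        if res_match and current is not None:
            groups[current].append(res_match.group(1))
    return dict(groups)


print(parse_resources(TXT))
```

For the real file, pass the contents of `open(path, encoding="utf-8").read()` instead of `TXT`.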

[Comments]:

    Tags: python-3.x xml beautifulsoup elementtree minidom


    [Answer 1]:

    To add the rdf:li rdf:resource entries from the txt file under the <rdf:Bag> tags in the XML file, you can use zip() to iterate over both documents together and Tag.insert() to add the content to each bag.

    Here is an example; you will have to modify it a bit to read the tags from files instead of from the triple-quoted strings:

    from bs4 import BeautifulSoup
    
    
    xml = """
    <listOfSpecies>
      <species metaid="MAM00001c" sboTerm="SBO:0000247" id="MAM00001c" name="(-)-trans-carveol" compartment="c" initialConcentration="0" hasOnlySubstanceUnits="false" boundaryCondition="false" constant="false" fbc:charge="0" fbc:chemicalFormula="C10H16O">
        <annotation>
          <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:vCard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vCard4="http://www.w3.org/2006/vcard/ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
            <rdf:Description rdf:about="#MAM00001c">
              <bqbiol:is>
                <rdf:Bag>
                  
                </rdf:Bag>
              </bqbiol:is>
            </rdf:Description>
          </rdf:RDF>
          <rdf:Bag>
            
          </rdf:Bag>
        </annotation>
      </species>
    </listOfSpecies>
    """
    
    txt = """
    name="(-)-trans-carveol" fbc:charge="0" fbc:chemicalFormula="C10H16O"
    <rdf:li rdf:resource="https://identifiers.org/kegg.compound/C11409"/>
    <rdf:li rdf:resource="https://identifiers.org/pubchem.compound/94221"/>
    <rdf:li rdf:resource="https://identifiers.org/lipidmaps/LMPR0102090005"/>
    <rdf:li rdf:resource="https://identifiers.org/inchi/InChI=1S/C10H16O/c1-7(2)9-5-4-8(3)10(11)6-9/h4,9-11H,1,5-6H2,2-3H3/t9-,10+/m0/s1"/>
    <rdf:li rdf:resource="https://identifiers.org/inchikey/BAVONGHXFVOKBV-VHSXEESVSA-N"/>
    <rdf:li rdf:resource="https://identifiers.org/metanetx.chemical/MNXM45735"/>
    """
    
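    # The "lxml" (HTML) parser lowercases tag names, so <rdf:Bag> is matched as "rdf:bag"
    xml_soup = BeautifulSoup(xml, "lxml")
    txt_soup = BeautifulSoup(txt, "lxml")
    
    # zip() pairs the n-th rdf:li with the n-th rdf:bag and stops at the shorter list
    for resource, bag in zip(txt_soup.find_all("rdf:li"), xml_soup.find_all("rdf:bag")):
        # Insert the URL string (not the <rdf:li> tag itself) as the bag's first child
        bag.insert(0, resource["rdf:resource"])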
    
    print(xml_soup.prettify())
    

    Output:

    <listofspecies>
     <species boundarycondition="false" compartment="c" constant="false" fbc:charge="0" fbc:chemicalformula="C10H16O" hasonlysubstanceunits="false" id="MAM00001c" initialconcentration="0" metaid="MAM00001c" name="(-)-trans-carveol" sboterm="SBO:0000247">
      <annotation>
       <rdf:rdf xmlns:bqbiol="http://biomodels.net/biology-qualifiers/" xmlns:bqmodel="http://biomodels.net/model-qualifiers/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#" xmlns:vcard4="http://www.w3.org/2006/vcard/ns#">
        <rdf:description rdf:about="#MAM00001c">
         <bqbiol:is>
          <rdf:bag>
           https://identifiers.org/kegg.compound/C11409
          </rdf:bag>
         </bqbiol:is>
        </rdf:description>
       </rdf:rdf>
       <rdf:bag>
        https://identifiers.org/pubchem.compound/94221
       </rdf:bag>
      </annotation>
     </species>
    </listofspecies>
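    The zip() approach pairs the n-th rdf:li with the n-th rdf:Bag, so each bag receives a single URL as text (as the output shows). If the goal is, as in the question, to put all rdf:li elements for a species into that species' bag, one alternative is the standard library's xml.etree.ElementTree: find each species by its name attribute and append real rdf:li child elements. A minimal sketch, assuming the txt file has already been grouped into a name → URL-list mapping (RESOURCES below is hypothetical sample data taken from the question, and fill_bags/local_name are hypothetical helper names):

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Keep the rdf/bqbiol prefixes on output (ElementTree would otherwise write ns0, ns1, ...).
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("bqbiol", "http://biomodels.net/biology-qualifiers/")

# Trimmed copy of the question's XML; the fbc:* attributes are left out because the
# fbc prefix is declared on the SBML root element, which the snippet does not show.
XML = '''<listOfSpecies>
  <species metaid="MAM00001c" id="MAM00001c" name="(-)-trans-carveol" compartment="c">
    <annotation>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bqbiol="http://biomodels.net/biology-qualifiers/">
        <rdf:Description rdf:about="#MAM00001c">
          <bqbiol:is>
            <rdf:Bag>
            </rdf:Bag>
          </bqbiol:is>
        </rdf:Description>
      </rdf:RDF>
    </annotation>
  </species>
</listOfSpecies>'''

# Hypothetical mapping: species name -> resource URLs read from the txt file.
RESOURCES = {
    "(-)-trans-carveol": [
        "https://identifiers.org/kegg.compound/C11409",
        "https://identifiers.org/pubchem.compound/94221",
    ],
}


def local_name(tag):
    """Drop a '{namespace}' prefix so matching works with or without a default namespace."""
    return tag.rsplit("}", 1)[-1]


def fill_bags(root, resources):
    """Append one <rdf:li rdf:resource="..."/> per URL to every rdf:Bag of a matching species."""
    # list() so the tree is not modified while it is being iterated
    for species in list(root.iter()):
        if local_name(species.tag) != "species":
            continue
        urls = resources.get(species.get("name"), [])
        for bag in list(species.iter(f"{{{RDF_NS}}}Bag")):
            for url in urls:
                li = ET.SubElement(bag, f"{{{RDF_NS}}}li")
                li.set(f"{{{RDF_NS}}}resource", url)


root = ET.fromstring(XML)
fill_bags(root, RESOURCES)
print(ET.tostring(root, encoding="unicode"))
```

    With the real files, use tree = ET.parse(path), call fill_bags(tree.getroot(), ...), then tree.write(out_path, encoding="utf-8", xml_declaration=True). Note that ElementTree writes the namespace declarations it uses on the root element and drops unused ones; in the full SBML file, register the default SBML namespace and the fbc prefix as well, or they will come out with generated ns0-style prefixes.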
    

    [Comments]: