Printing what is between two XML tags in Python? Answers

【Question title】: Printing what is between two XML tags in Python?
【Posted】: 2016-04-06 06:05:18
【Question】:

I'm using ElementTree. For example, I have this XML:

<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>

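When a phrase has one en tag with x='ORG' and text "Alpha" and another en tag with x='PERS' and text "John", I want to print the whole phrase. The output I want is "Alpha Amazingly created by John".

I know how to search for Alpha and John, but my problem is printing what is between them: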

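# root is the parsed <TEXT> element, e.g. root = ET.fromstring(xml_string)
for phrase in root.findall('./PHRASE'):
    # map each en tag's x attribute (ORG, PERS, LOC) to its text
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"] == "Alpha" and ens["PERS"] == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"], ens["PERS"]))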

But how do I print the text of the rest of the tags in that phrase?

【Discussion】:

This is probably relevant. Or try looking at BeautifulSoup.

Tags: python xml python-2.7 python-3.x

【Solution 1】:

import xml.etree.ElementTree as ET

xml = '''
<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>
'''

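def section(seq, start, end):
    """Yield the items of seq from start through end, inclusive."""
    returning = False
    for item in seq:
        returning |= item == start   # switch on when we reach start
        if returning:
            yield item
        returning &= item != end     # switch off right after yielding end

root = ET.fromstring(xml)
for phrase in root.findall('./PHRASE'):
    # keep the Element objects (not just their text) so section() can locate them
    ens = {en.get('x'): en for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"].text == "Alpha" and ens["PERS"].text == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"].text, ens["PERS"].text))
            # iterating over phrase yields its child elements in document order
            print(' '.join(el.text for el in section(phrase, ens["ORG"], ens["PERS"])))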
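A variation on the `section()` generator, offered as a sketch rather than as part of the original answer: iterating a `PHRASE` element gives its children in document order, so you can turn them into a list, look up the positions of the two `en` elements with `list.index()`, and slice. The helper name `text_between` is hypothetical, and taking the `min`/`max` of the two positions assumes you also want a match when PERS comes before ORG.

```python
import xml.etree.ElementTree as ET

XML = """<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en>
</PHRASE>
</TEXT>"""


def text_between(phrase, start, end):
    """Hypothetical helper: join the text of start, end and every child between them."""
    children = list(phrase)          # direct children, in document order
    i = children.index(start)        # index() locates these exact Element objects
    j = children.index(end)
    lo, hi = min(i, j), max(i, j)    # assumption: accept PERS before ORG as well
    return " ".join(el.text for el in children[lo:hi + 1] if el.text)


root = ET.fromstring(XML)
results = []
for phrase in root.findall("./PHRASE"):
    ens = {en.get("x"): en for en in phrase.findall("en")}
    org, pers = ens.get("ORG"), ens.get("PERS")
    # compare with None explicitly: an Element with no children is falsy
    if org is not None and pers is not None and org.text == "Alpha" and pers.text == "John":
        results.append(text_between(phrase, org, pers))

for line in results:
    print(line)  # Alpha Amazingly created by John
```

Slicing a list instead of using a stateful generator makes the start/end positions explicit, at the cost of building a list of the phrase's children first.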

【Discussion】:

【Solution 2】:

It's simple:

import xml.etree.ElementTree as ET

data = """<TEXT>
    <PHRASE>
        <CONJ>and</CONJ>
        <V>came</V>
        <en x='PERS'>Adam</en>
        <PREP>from</PREP>
        <en x='LOC'>Atlanta</en>
    </PHRASE>
    <PHRASE>
        <en x='ORG'>Alpha</en>
        <ADJ y='1'>Amazingly</ADJ>
        <N>created by</N>
        <en x='PERS'>John</en>
    </PHRASE>
</TEXT>"""

root = ET.fromstring(data)

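for node in root.findall('./PHRASE'):
    # find() returns None when the phrase has no matching en tag
    ens = [node.find('en[@x="ORG"]'), node.find('en[@x="PERS"]')]

    if all(i is not None for i in ens):
        if 'Alpha' in ens[0].text and 'John' in ens[1].text:
            # itertext() also yields the whitespace between tags, so strip each
            # piece and skip the ones that end up empty
            print(" ".join(t.strip() for t in node.itertext() if t.strip()))
            # Without stripping, the line breaks and indentation stay in the output:
            # print(" ".join(node.itertext()))
            break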
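On Python 3.7 and later, ElementTree's limited XPath support also accepts `[.='text']` predicates, so both the attribute check and the text check can go into the path itself. The sketch below is not part of the original answer: `phrase_text` is a hypothetical helper name, and it assumes exact text matches rather than the substring test (`'Alpha' in ...`) used above.

```python
import xml.etree.ElementTree as ET

data = """<TEXT>
    <PHRASE>
        <CONJ>and</CONJ>
        <V>came</V>
        <en x='PERS'>Adam</en>
        <PREP>from</PREP>
        <en x='LOC'>Atlanta</en>
    </PHRASE>
    <PHRASE>
        <en x='ORG'>Alpha</en>
        <ADJ y='1'>Amazingly</ADJ>
        <N>created by</N>
        <en x='PERS'>John</en>
    </PHRASE>
</TEXT>"""


def phrase_text(node):
    """Hypothetical helper: all text in node, with indentation whitespace dropped."""
    return " ".join(t.strip() for t in node.itertext() if t.strip())


root = ET.fromstring(data)
matches = [
    phrase_text(node)
    for node in root.findall("./PHRASE")
    # [@x='ORG'] filters on the attribute; [.='Alpha'] (Python 3.7+) on the full text
    if node.find("en[@x='ORG'][.='Alpha']") is not None
    and node.find("en[@x='PERS'][.='John']") is not None
]
print(matches)  # ['Alpha Amazingly created by John']
```

Because `find()` already returns `None` for a phrase without a matching `en` tag, there is no separate existence check before reading `.text`.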

【Discussion】: