【Question Title】: Printing what is between two XML tags in Python?
【Posted】: 2016-04-06 06:05:18
【Question Description】:

I am using ElementTree. For example, I have this XML:

<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>

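When one en tag in a phrase has x='ORG' with the text "Alpha" and another en tag has x='PERS' with the text "John", I want to print the whole phrase. The output I expect is "Alpha Amazingly created by John".

I know how to search for Alpha and John, but my problem is printing what is between the two.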

for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"] == "Alpha" and ens["PERS"] == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"], ens["PERS"]))

But how can I print the text of the remaining tags in that phrase?

【Comments】:

Tags: python xml python-2.7 python-3.x


【Solution 1】:
import xml.etree.ElementTree as ET

xml = '''
<TEXT>
<PHRASE>
<CONJ>and</CONJ>
<V>came</V>
<en x='PERS'>Adam</en>
<PREP>from</PREP>
<en x='LOC'>Atlanta</en>
</PHRASE>
<PHRASE>
<en x='ORG'>Alpha</en>
<ADJ y='1'>Amazingly</ADJ>
<N>created by</N>
<en x='PERS'>John</en> 
</PHRASE> 
</TEXT>
'''

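def section(seq, start, end):
    """Yield the items of seq from start through end, inclusive."""
    returning = False
    for item in seq:
        returning |= item == start   # switch on when start is reached
        if returning:
            yield item
        returning &= item != end     # switch off after end has been yielded

root = ET.fromstring(xml)
for phrase in root.findall('./PHRASE'):
    # Map each x attribute to its <en> element (not just its text),
    # so the elements themselves can mark the start and end of the section.
    ens = {en.get('x'): en for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ens["ORG"].text == "Alpha" and ens["PERS"].text == "John":
            print("ORG is: {}, PERS is: {} /".format(ens["ORG"].text, ens["PERS"].text))
            # Iterating over phrase yields its direct children in document order.
            print(' '.join(el.text for el in section(phrase, ens["ORG"], ens["PERS"])))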
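As a possible alternative to the generator, the same span can be taken by slicing the list of children by index. This is only a sketch: `text_between` is a hypothetical helper name, the sample phrase (including the extra `said` and `today` elements) is made up for illustration, and it assumes both elements are direct children of the same parent with the start element coming first.

```python
import xml.etree.ElementTree as ET

def text_between(parent, start, end):
    """Join the text of parent's children from start through end, inclusive.

    Hypothetical helper: assumes start and end are direct children of parent
    and that start comes before end.
    """
    children = list(parent)
    i, j = children.index(start), children.index(end)
    # Skip children that have no text so join() never sees None.
    return ' '.join(el.text for el in children[i:j + 1] if el.text)

# Made-up sample phrase with extra elements before and after the span.
phrase = ET.fromstring(
    "<PHRASE><V>said</V><en x='ORG'>Alpha</en><ADJ y='1'>Amazingly</ADJ>"
    "<N>created by</N><en x='PERS'>John</en><PREP>today</PREP></PHRASE>"
)
org = phrase.find("en[@x='ORG']")
pers = phrase.find("en[@x='PERS']")
result = text_between(phrase, org, pers)
print(result)
```

Slicing reads the children into a list once, which is fine for short phrases like these. The generator version avoids building that list and does not raise if the start element is missing.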

【Comments】:

    【Solution 2】:

    It's simple:

    import xml.etree.ElementTree as ET
    
    data = """<TEXT>
        <PHRASE>
            <CONJ>and</CONJ>
            <V>came</V>
            <en x='PERS'>Adam</en>
            <PREP>from</PREP>
            <en x='LOC'>Atlanta</en>
        </PHRASE>
        <PHRASE>
            <en x='ORG'>Alpha</en>
            <ADJ y='1'>Amazingly</ADJ>
            <N>created by</N>
            <en x='PERS'>John</en>
        </PHRASE>
    </TEXT>"""
    
    root = ET.fromstring(data)
    
    for node in root.findall('./PHRASE'):
        ens = [node.find('en[@x="ORG"]'), node.find('en[@x="PERS"]')]
    
        # Only continue if both the ORG and the PERS tag were found.
        if all(i is not None for i in ens):
            if 'Alpha' in ens[0].text and 'John' in ens[1].text:
                print(" ".join(node.itertext()))
                # itertext() also yields the whitespace between tags. To drop it:
                # " ".join(t.strip() for t in node.itertext() if t.strip())
                break
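Because `itertext()` also returns the newlines and indentation between the tags, the printed phrase can contain stray whitespace. One way to tidy it, sketched here with a hypothetical helper called `phrase_text`, is to concatenate everything and let `str.split()` collapse the whitespace. Note that this also normalizes whitespace inside an element's own text.

```python
import xml.etree.ElementTree as ET

data = """<PHRASE>
    <en x='ORG'>Alpha</en>
    <ADJ y='1'>Amazingly</ADJ>
    <N>created by</N>
    <en x='PERS'>John</en>
</PHRASE>"""

def phrase_text(node):
    """Return all text under node with runs of whitespace collapsed.

    Hypothetical helper, not part of ElementTree.
    """
    return " ".join("".join(node.itertext()).split())

node = ET.fromstring(data)
raw = " ".join(node.itertext())   # still contains newlines and indentation
clean = phrase_text(node)
print(repr(raw))
print(clean)
```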
    

    【Comments】:
