【Question title】: XML flattening using Python
【Posted】: 2018-12-12 09:20:10
【Question description】:

I am trying to flatten an XML file and write it to a CSV so that it can be consumed by an ETL process.

<Answers>
      <AnswersList>
        <Entry key="qs_location_name" type="System.String">
          <value>Location Name</value>
        </Entry>
        <Entry key="qs_location_riskAddress1" type="System.String">
          <value>Risk Address 1</value>
        </Entry>
        <Entry key="qs_location_riskAddress2" type="System.String">
          <value>Risk Address 2</value>
        </Entry>
</AnswersList>
</Answers>

My code is as follows:

from io import StringIO

from lxml import etree

# xml_file is assumed to be a str holding the XML document
tree = etree.parse(StringIO(xml_file))

root = tree.getroot().tag
for node in tree.iter():
    for child in node:  # direct children; getchildren() is deprecated
        text = (child.text or "").strip()
        if text:
            print("{}.{} = {}".format(root, tree.getelementpath(child).replace("/", "."), text))

The code above produces the following output:

AustraliaBizPackProposal.Answers.AnswersList.Entry[1].value = Location Name
AustraliaBizPackProposal.Answers.AnswersList.Entry[2].value = Risk Address 1
AustraliaBizPackProposal.Answers.AnswersList.Entry[3].value = Risk Address 2

My expected output is shown below, with each Entry's key attribute in place of Entry[n]. Please advise.

AustraliaBizPackProposal.Answers.AnswersList.qs_location_name.value = Location Name
AustraliaBizPackProposal.Answers.AnswersList.qs_location_riskAddress1.value = Risk Address 1
AustraliaBizPackProposal.Answers.AnswersList.qs_location_riskAddress2.value = Risk Address 2

【Question discussion】:

    Tags: python xml-parsing flatten


    【Solution 1】:

    This code works for that particular file:

    root = tree.getroot().tag
    for node in tree.iter():
        for child in node:  # iterating an element yields its direct children
            if child.tag == 'Entry':
                # Dotted path of the Entry's ancestors, e.g. "Answers.AnswersList"
                path = ".".join(tree.getelementpath(child).split("/")[:-1])
                key = child.attrib['key']
                for val in child:
                    try:
                        print("{}.{}.{}.{} = {}".format(root, path, key, val.tag, val.text.strip()))
                    except AttributeError:
                        # val.text is None: the text is stored in a "Text" attribute instead
                        print("{}.{}.{}.{} = {}".format(root, path, key, val.tag, val.attrib['Text']))
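
The same idea can also be written without `getelementpath`, by building the dotted path while walking the tree. What follows is a minimal sketch rather than the answerer's code: it swaps in the standard-library `xml.etree.ElementTree` for lxml, and the names `flatten` and `SAMPLE` are assumptions made for illustration. The `AustraliaBizPackProposal` wrapper is also an assumption, taken from the output shown in the question, because the posted XML excerpt starts at `<Answers>`.

```python
import xml.etree.ElementTree as ET

# Sample document: part of the question's XML, wrapped in the root element
# that appears in the question's output (the wrapping is an assumption).
SAMPLE = """
<AustraliaBizPackProposal>
  <Answers>
    <AnswersList>
      <Entry key="qs_location_name" type="System.String">
        <value>Location Name</value>
      </Entry>
      <Entry key="qs_location_riskAddress1" type="System.String">
        <value>Risk Address 1</value>
      </Entry>
    </AnswersList>
  </Answers>
</AustraliaBizPackProposal>
"""


def flatten(elem, prefix=""):
    """Yield (dotted_path, text) pairs for every element with non-blank text.

    An element that has a "key" attribute contributes that key to the path
    instead of its tag name.
    """
    name = elem.get("key", elem.tag)
    path = f"{prefix}.{name}" if prefix else name
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from flatten(child, path)


if __name__ == "__main__":
    for path, text in flatten(ET.fromstring(SAMPLE)):
        print(f"{path} = {text}")
```

Unlike `getelementpath`, this sketch does not add `[n]` indexes, so repeated sibling tags without a `key` attribute would share the same path.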
    

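    【Discussion】:

    • Great! Please accept the answer so that your question is marked as answered and others can benefit from it.
    • The following child tag is not being parsed. Any ideas?
    • In this case the child tag carries its text as an attribute rather than as element text; the code has been updated to handle this case.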
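
The attribute fallback described in the last comment, combined with the CSV output the question asks for, might look like the sketch below. It assumes the standard-library `xml.etree.ElementTree` and `csv` modules rather than lxml. The helper names `node_text` and `to_csv`, the column names, and the sample `<value Text="..."/>` element are hypothetical; only the `Text` attribute name comes from the answer's code, since the tag mentioned in the comment is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <value> keeps its text in a "Text" attribute.
SAMPLE = """
<Answers>
  <AnswersList>
    <Entry key="qs_location_name" type="System.String">
      <value>Location Name</value>
    </Entry>
    <Entry key="qs_location_riskAddress1" type="System.String">
      <value Text="Risk Address 1"/>
    </Entry>
  </AnswersList>
</Answers>
"""


def node_text(elem):
    """Return the stripped element text, or the "Text" attribute if the text is blank."""
    text = (elem.text or "").strip()
    return text or elem.get("Text", "")


def to_csv(root, out):
    """Write one row per child of every <Entry>: its key, the child tag, the value."""
    writer = csv.writer(out)
    writer.writerow(["key", "field", "value"])
    for entry in root.iter("Entry"):
        for val in entry:
            writer.writerow([entry.get("key"), val.tag, node_text(val)])


if __name__ == "__main__":
    buffer = io.StringIO()
    to_csv(ET.fromstring(SAMPLE), buffer)
    print(buffer.getvalue())
```

To write to a real file instead of a `StringIO` buffer, open it with `newline=""`, as the `csv` module documentation recommends.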