【Question title】: Parse xml file with python
【Posted】: 2021-11-15 18:08:36
【Question】:

I have this simple XML file:

<BSB>
    <APPLSUMMARY>
        <MAIN W="S1" X="{ND}"/>
        <COUNTS Z="0" AB="0" BB="0" CB="0" DB="0" EB="0" FB="0" GB="{ND}"/>
        <SCOTDEBT OQB="{ND}"/>
        <NOTICES HB="0" IB="3"/>
        <SUB_BLOCKS C="3" D="3" E="1" F="0"/>
        <ALIAS_NO UPB="0" VPB="{ND}" WPB="0"/>
        <ASSOC_NO DD="0" ED="0" AC="0"/>
        <ALERTSUMM PB="0" QB="0" RB="{ND}" SB="{ND}" TB="{ND}" UB="{ND}"/>
        <HHOSUMM BC="{ND}" RGB="{ND}"/>
        <TPD INB="{ND}" JNB="{ND}" KNB="{ND}" LNB="{ND}"/>
        <OCCUPANCY AD="1"/>
        <DECEASED LQB="1" FCC="{ND}" GCC="{ND}" HCC="{ND}" ICC="{ND}"/>
        <IMPAIRED MQB="0"/>
        <ACTIVITY JCC="{ND}" KCC="{ND}" LCC="{ND}"/>
        <ADVERSE MCC="{ND}" HHC="{ND}"/>
    </APPLSUMMARY>
</BSB>

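I want to create a CSV file in Python that contains only the DECEASED content, arranged in columns like this:

So I am trying to get the values of the DECEASED element and line them up as columns.

I tried this: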

import xml.etree.ElementTree as ET
import io
parsed = objectify.parse(open(path))  # path is where the xml file is saved
root = parsed.getroot()
data = []



for elt in root.BSB.DECEASED:

    el_data = {}
    for child in elt.getchildren():
        el_data[child.tag] = child.text
        data.append(el_data)
    perf =pd.DataFrame(data).drop_duplicates(subset=None, keep='first', inplace=False)
    
    print(perf)
    perf.to_csv('DECESEAD.csv')

I get an empty dataset:

Empty DataFrame
Columns: []
Index: []

Can anyone help me get the values in the DECEASED tag?

【Comments】:

  • This may be a typo: root.BSB.DECEASED? Just looking at the XML, it seems it should be root.BSB.APPLSUMMARY.DECEASED

Tags: python xml parsing
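
To illustrate the nesting the comment points at, here is a minimal sketch. It uses the standard-library xml.etree.ElementTree rather than the objectify call from the question (a swapped-in technique) and a shortened copy of the XML; the variable names are only for illustration. It suggests two things: the parsed root is already the BSB element, so DECEASED is reached through APPLSUMMARY, and the DECEASED values are attributes rather than child elements, which is probably why a loop over children collects nothing.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: same nesting).
xml_text = """<BSB>
    <APPLSUMMARY>
        <OCCUPANCY AD="1"/>
        <DECEASED LQB="1" FCC="{ND}" GCC="{ND}" HCC="{ND}" ICC="{ND}"/>
    </APPLSUMMARY>
</BSB>"""

root = ET.fromstring(xml_text)

# The root returned by the parser is the <BSB> element itself.
print(root.tag)

# DECEASED is nested under APPLSUMMARY, so the path is relative to <BSB>.
deceased = root.find("APPLSUMMARY/DECEASED")

# The element has no child elements; its data lives in attributes.
print(len(deceased))
print(dict(deceased.attrib))
```

Reading `attrib` instead of iterating over children is the key difference from the attempt in the question.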


【Solution 1】:

The code below collects the data you are looking for:

import xml.etree.ElementTree as ET
from typing import Dict

xml = '''<BSB>
    <APPLSUMMARY>
        <MAIN W="S1" X="{ND}"/>
        <COUNTS Z="0" AB="0" BB="0" CB="0" DB="0" EB="0" FB="0" GB="{ND}"/>
        <SCOTDEBT OQB="{ND}"/>
        <NOTICES HB="0" IB="3"/>
        <SUB_BLOCKS C="3" D="3" E="1" F="0"/>
        <ALIAS_NO UPB="0" VPB="{ND}" WPB="0"/>
        <ASSOC_NO DD="0" ED="0" AC="0"/>
        <ALERTSUMM PB="0" QB="0" RB="{ND}" SB="{ND}" TB="{ND}" UB="{ND}"/>
        <HHOSUMM BC="{ND}" RGB="{ND}"/>
        <TPD INB="{ND}" JNB="{ND}" KNB="{ND}" LNB="{ND}"/>
        <OCCUPANCY AD="1"/>
        <DECEASED LQB="1" FCC="{ND}" GCC="{ND}" HCC="{ND}" ICC="{ND}"/>
        <IMPAIRED MQB="0"/>
        <ACTIVITY JCC="{ND}" KCC="{ND}" LCC="{ND}"/>
        <ADVERSE MCC="{ND}" HHC="{ND}"/>
    </APPLSUMMARY>
</BSB>'''


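def _clean_dict(attributes: Dict) -> Dict:
    # Strip the surrounding braces from placeholder values such as "{ND}".
    result = {}
    for k, v in attributes.items():
        # startswith/endswith also work for empty values, where v[0] would raise IndexError.
        if v.startswith('{') and v.endswith('}'):
            val = v[1:-1]
        else:
            val = v
        result[k] = val
    return result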


data = []
root = ET.fromstring(xml)
for d in root.findall('.//DECEASED'):
    data.append(_clean_dict(d.attrib))
print(data)

Output (a list of dictionaries):

[{'LQB': '1', 'FCC': 'ND', 'GCC': 'ND', 'HCC': 'ND', 'ICC': 'ND'}]
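
The question asked for a CSV file, while the output above stops at a list of dictionaries. One possible way to finish the job is sketched below with the standard-library csv module. The helper name `rows_to_csv` and the file name mentioned in the comment are hypothetical, and the sketch assumes every DECEASED element carries the same attributes, since the column names are taken from the first row.

```python
import csv
import io

# Result from the answer above: one dict per DECEASED element.
data = [{'LQB': '1', 'FCC': 'ND', 'GCC': 'ND', 'HCC': 'ND', 'ICC': 'ND'}]


def rows_to_csv(rows, out):
    """Write a list of dicts as CSV, using the first row's keys as columns."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; for a real file, pass
# open('deceased.csv', 'w', newline='') instead (hypothetical file name).
buffer = io.StringIO()
rows_to_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

If pandas is already in use, as in the question, `pd.DataFrame(data).to_csv(...)` should give a similar result; the csv module just avoids the extra dependency.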
