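【Posted】: 2021-01-27 02:15:12
【Question】:
I'd like to know whether there is a way to parse XML and pull essentially all of the tags (or as many as possible) into columns without hard-coding them.
Take the eventType tag in my XML as an example. I'd like it to create a column named "eventType" the first time it is seen and put the value under that column. Every "eventType" tag parsed after that should go into the same column.
Here is a sample of the XML: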
<?xml version="1.0" encoding="UTF-8"?>
<faults version="1" xmlns="urn:nortel:namespaces:mcp:faults" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:faults NortelFaultSchema.xsd ">
<family longName="1OffMsgr" shortName="OOM"/>
<family longName="ACTAGENT" shortName="ACAT">
<logs>
<log>
<eventType>RES</eventType>
<number>1</number>
<severity>INFO</severity>
<descTemplate>
<msg>Accounting is enabled upon this NE.</msg>
</descTemplate>
<note>This log is generated when setting a Session Manager's AM from &lt;none&gt; to a valid AM.</note>
<om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group will appear and start counting the recording units sent to the configured AM.
On the configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will appear and start counting the recording units received from this Session Manager's instances.
</om>
</log>
<log>
<eventType>RES</eventType>
<number>2</number>
<severity>ALERT</severity>
<descTemplate>
<msg>Accounting is disabled upon this NE.</msg>
</descTemplate>
<note>This log is generated when setting a Session Manager's AM from a valid AM to &lt;none&gt;.</note>
<action>If you do not intend for the Session Manager to produce accounting records, then no action is required. If you do intend for the Session Manager to produce accounting records, then you should set the Session Manager's AM to a valid AM.</action>
<om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group that matched the previous datafilled AM will disappear.
On the previously configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will disappear.
</om>
</log>
</logs>
</family>
<family longName="ACODE" shortName="AC">
<alarms>
<alarm>
<eventType>ADMIN</eventType>
<number>1</number>
<probableCause>INFORMATION_MODIFICATION_DETECTED</probableCause>
<descTemplate>
<msg>Configured data for audiocode server updated: $1</msg>
<param>
<num>1</num>
<description>AudioCode configuration data got updated</description>
<exampleValue>acgwy1</exampleValue>
</param>
</descTemplate>
<manualClearable></manualClearable>
<correctiveAction>None. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
<alarmName>Audiocode Server Updated</alarmName>
<severities>
<severity>MINOR</severity>
</severities>
</alarm>
<alarm>
<eventType>ADMIN</eventType>
<number>2</number>
<probableCause>CONFIG_OR_CUSTOMIZATION_ERROR</probableCause>
<descTemplate>
<msg>Deployment for audiocode server failed: $1. Reason: $2.</msg>
<param>
<num>1</num>
<description>AudioCode Name</description>
<exampleValue>audcod</exampleValue>
</param>
<param>
<num>2</num>
<description>AudioCode Deployment failed reason</description>
<exampleValue>Failed to parse audiocode configuration data</exampleValue>
</param>
</descTemplate>
<manualClearable></manualClearable>
<correctiveAction>Check the configuration of audiocode server. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
<alarmName>Audiocode Server Deploy Failed</alarmName>
<severities>
<severity>MINOR</severity>
<severity>MAJOR</severity>
</severities>
</alarm>
<alarm>
<eventType>COMM</eventType>
<number>2</number>
<probableCause>LOSS_OF_FRAME</probableCause>
<descTemplate>
<msg>Far end LOF (a.k.a., Yellow Alarm). Trunk (DS1 Number): $1.</msg>
<param>
<num>1</num>
<description>Trunk Number of Trunk with configuration problem</description>
<exampleValue>2</exampleValue>
</param>
</descTemplate>
<clearCondition>Far end is correctly configured for proper framing.</clearCondition>
<correctiveAction>Check that the far end is configured for the proper framing.</correctiveAction>
<alarmName>Far end LOF</alarmName>
<severities>
<severity>CRITICAL</severity>
</severities>
<note>This alarm indicates the Trunk Framing settings on the connected PSTN switch do not match those provisioned on the Audiocodes Mediant 2k.</note>
</alarm>
<alarm>
<eventType>COMM</eventType>
<number>3</number>
<probableCause>LOSS_OF_FRAME</probableCause>
<descTemplate>
<msg>Near end sending LOF Indication. Trunk (DS1 Number): $1.</msg>
<param>
<num>1</num>
<description>Trunk Number of Trunk with configuration problem</description>
<exampleValue>2</exampleValue>
</param>
</descTemplate>
<clearCondition>Gateway is correctly configured for proper framing.</clearCondition>
<correctiveAction>Check that the Audiocodes gateway is configured for the proper framing.</correctiveAction>
<alarmName>Near end sending LOF Indication</alarmName>
<severities>
<severity>CRITICAL</severity>
</severities>
</alarm>
</alarms>
</family>
</faults>
Here is the code; as you can see, my tag names are hard-coded:
from xml.etree import ElementTree
import csv
from copy import copy

tree = ElementTree.parse('FaultFamilies.xml')
sitescope_data = open('Out.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)

# Create all needed columns here, in order, and write them to the CSV file
col_names = ['longName', 'shortName', 'eventType', 'ProbableCause', 'Severity', 'alarmName', 'clearCondition',
             'correctiveAction', 'note', 'action', 'om']
csvwriter.writerow(col_names)


def recurse(root, props):
    # Visit every element in the XML file
    for child in root:
        if child.tag == '{urn:nortel:namespaces:mcp:faults}family':
            # Work on a copy of the dictionary so sibling families do not share state
            p2 = copy(props)
            # Add the family's longName and shortName to the dictionary
            p2['longName'] = child.attrib.get('longName', '')
            p2['shortName'] = child.attrib.get('shortName', '')
            recurse(child, p2)
        else:
            recurse(child, props)

    # FIND ALL NEEDED ALARMS INFORMATION
    for event in root.findall('{urn:nortel:namespaces:mcp:faults}alarm'):
        event_data = [props.get('longName', ''), props.get('shortName', '')]
        # Find eventType and append it
        event_id = event.find('{urn:nortel:namespaces:mcp:faults}eventType')
        if event_id is not None:
            event_id = event_id.text
        # Append to the row (csv.writer adds the commas)
        event_data.append(event_id)
        # Find probableCause and append it
        probableCause = event.find('{urn:nortel:namespaces:mcp:faults}probableCause')
        if probableCause is not None:
            probableCause = probableCause.text
        event_data.append(probableCause)
        # Find severities and append them as one comma-joined value.
        # Compare with None: the truth value of an Element depends on its children.
        severities = event.find('{urn:nortel:namespaces:mcp:faults}severities')
        if severities is not None:
            severity_data = ','.join(
                [sv.text for sv in severities.findall('{urn:nortel:namespaces:mcp:faults}severity')])
            event_data.append(severity_data)
        else:
            event_data.append("")
        # Find alarmName and append it
        alarmName = event.find('{urn:nortel:namespaces:mcp:faults}alarmName')
        if alarmName is not None:
            alarmName = alarmName.text
        event_data.append(alarmName)
        clearCondition = event.find('{urn:nortel:namespaces:mcp:faults}clearCondition')
        if clearCondition is not None:
            clearCondition = clearCondition.text
        event_data.append(clearCondition)
        correctiveAction = event.find('{urn:nortel:namespaces:mcp:faults}correctiveAction')
        if correctiveAction is not None:
            correctiveAction = correctiveAction.text
        event_data.append(correctiveAction)
        note = event.find('{urn:nortel:namespaces:mcp:faults}note')
        if note is not None:
            note = note.text
        event_data.append(note)
        action = event.find('{urn:nortel:namespaces:mcp:faults}action')
        if action is not None:
            action = action.text
        event_data.append(action)
        csvwriter.writerow(event_data)

    # FIND ALL LOGS INFORMATION
    for event in root.findall('{urn:nortel:namespaces:mcp:faults}log'):
        event_data = [props.get('longName', ''), props.get('shortName', '')]
        event_id = event.find('{urn:nortel:namespaces:mcp:faults}eventType')
        if event_id is not None:
            event_id = event_id.text
        event_data.append(event_id)
        probableCause = event.find('{urn:nortel:namespaces:mcp:faults}probableCause')
        if probableCause is not None:
            probableCause = probableCause.text
        event_data.append(probableCause)
        # A log carries a single <severity> directly, not a <severities> wrapper
        severities = event.find('{urn:nortel:namespaces:mcp:faults}severity')
        if severities is not None:
            severities = severities.text
        event_data.append(severities)
        alarmName = event.find('{urn:nortel:namespaces:mcp:faults}alarmName')
        if alarmName is not None:
            alarmName = alarmName.text
        event_data.append(alarmName)
        # Find clearCondition and append it
        clearCondition = event.find('{urn:nortel:namespaces:mcp:faults}clearCondition')
        if clearCondition is not None:
            clearCondition = clearCondition.text
        event_data.append(clearCondition)
        correctiveAction = event.find('{urn:nortel:namespaces:mcp:faults}correctiveAction')
        if correctiveAction is not None:
            correctiveAction = correctiveAction.text
        event_data.append(correctiveAction)
        note = event.find('{urn:nortel:namespaces:mcp:faults}note')
        if note is not None:
            note = note.text
        event_data.append(note)
        action = event.find('{urn:nortel:namespaces:mcp:faults}action')
        if action is not None:
            action = action.text
        event_data.append(action)
        csvwriter.writerow(event_data)


root = tree.getroot()
recurse(root, {})  # root + empty dictionary
print("File successfully converted to CSV")
sitescope_data.close()
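As an intermediate step, the repeated find/append blocks in the code above could probably be collapsed into a loop over tag names. The following is only a sketch: `NS`, `FIELDS`, `text_of`, and `event_row` are hypothetical names introduced for illustration, and it assumes that joining repeated tags (such as several `severity` elements) with commas is acceptable.

```python
from xml.etree import ElementTree

# Namespace prefix in ElementTree's '{uri}tag' notation (taken from the sample XML).
NS = '{urn:nortel:namespaces:mcp:faults}'

# Hypothetical field list; still hard-coded, but in one place only.
FIELDS = ['eventType', 'probableCause', 'severity', 'alarmName',
          'clearCondition', 'correctiveAction', 'note', 'action', 'om']


def text_of(event, name):
    """Return the comma-joined text of every descendant called `name`, or ''."""
    return ','.join((el.text or '').strip() for el in event.iter(NS + name))


def event_row(event, props):
    """Build one CSV row for an <alarm> or <log> element."""
    row = [props.get('longName', ''), props.get('shortName', '')]
    row.extend(text_of(event, name) for name in FIELDS)
    return row


# Small demo on an inline fragment instead of FaultFamilies.xml.
sample = ElementTree.fromstring(
    '<alarm xmlns="urn:nortel:namespaces:mcp:faults">'
    '<eventType>ADMIN</eventType>'
    '<severities><severity>MINOR</severity><severity>MAJOR</severity></severities>'
    '</alarm>')
row = event_row(sample, {'longName': 'ACODE', 'shortName': 'AC'})
```

Because `iter()` searches descendants at any depth, the same helper handles both the `<severities>` wrapper used by alarms and the bare `<severity>` used by logs. The list of names is still fixed, though, so this does not yet cover tags added to the XML later.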
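【Comments】:
-
Why did you copy-paste all of these blocks for alarmName and the rest? You could just loop over the names it has to look up, right?
-
Yes, this was just a test. If hard-coding is the only way to parse it, I'll settle for that. I'm trying to find a way to get all the tags into columns without hard-coding, because this XML will change over time as new tags are added.
-
Sadly, I'm not familiar with the libraries you're using here, but I see you're recursing over the nodes. I think that would be enough to keep a set() of all the unique values you encounter, right?
-
Ah, you've completely changed the XML since I looked at it last night and started trying to write a solution.
-
Hey @barny, this is an old question. Please see this one: stackoverflow.com/questions/64407201/…
Tags: python parsing beautifulsoup lxml elementtree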
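The idea from the comments of remembering every unique tag seen during the traversal might look roughly like the sketch below. It is an illustration under several assumptions, not a confirmed solution: `local`, `collect_rows`, and `write_csv` are hypothetical helper names, only `alarm` and `log` elements are treated as records, repeated leaf tags inside one record are joined with commas, and empty leaves are skipped. A list is used instead of `set()` so the columns keep first-seen order.

```python
import csv
from xml.etree import ElementTree


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit('}', 1)[-1]


def collect_rows(root, record_tags=('alarm', 'log')):
    """Return (rows, columns): one dict per record plus every leaf tag seen."""
    rows, columns = [], ['longName', 'shortName']
    for family in root.iter():
        if local(family.tag) != 'family':
            continue
        for event in family.iter():
            if local(event.tag) not in record_tags:
                continue
            row = {'longName': family.get('longName', ''),
                   'shortName': family.get('shortName', '')}
            for el in event.iter():
                text = (el.text or '').strip()
                # Skip the record itself, container elements, and empty leaves.
                if el is event or len(el) or not text:
                    continue
                name = local(el.tag)
                # Join repeated tags (e.g. several <severity>) with commas.
                row[name] = f"{row[name]},{text}" if name in row else text
                if name not in columns:
                    columns.append(name)
            rows.append(row)
    return rows, columns


def write_csv(path, rows, columns):
    """Write the rows; columns missing from a record are left empty."""
    with open(path, 'w', newline='', encoding='utf-8') as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval='')
        writer.writeheader()
        writer.writerows(rows)
```

With the file from the question, usage would be along the lines of `rows, columns = collect_rows(ElementTree.parse('FaultFamilies.xml').getroot())` followed by `write_csv('Out.csv', rows, columns)`. Nested leaves such as `num`, `description`, and `exampleValue` under `param` would also become columns with this approach, which may or may not be desirable.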