【Posted】: 2020-03-22 06:00:37
【Question】:
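I am trying to parse an XML file into a txt file (mainly to extract the body text), but the for loop never runs, so no results are appended to the file. I know I am getting something wrong with the XML. I tried adding an outer for loop that finds all MAEC_Bundle elements before looking for the behaviors (I thought that would be needed because it is the root?).
Here is the XML file: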
<MAEC_Bundle xmlns:ns1="http://xml/metadataSharing.xsd" xmlns="http://maec.mitre.org/XMLSchema/maec-core-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maec.mitre.org/XMLSchema/maec-core-1 file:MAEC_v1.1.xsd" id="maec:thug:bnd:1" schema_version="1.100000">
<Analyses>
<Analysis start_datetime="2019-11-25 21:41:59.491211" id="maec:thug:ana:2" analysis_method="Dynamic">
<Tools_Used>
<Tool id="maec:thug:tol:1">
<Name>Thug</Name>
<Version>0.9.40</Version>
<Organization>The Honeynet Project</Organization>
</Tool>
</Tools_Used>
</Analysis>
</Analyses>
<Behaviors>
<Behavior id="maec:thug:bhv:4">
<Description>
<Text>[window open redirection] about:blank -> http://desbloquear.celularmovel.com/</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
<Behavior id="maec:thug:bhv:5">
<Description>
<Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
<Behavior id="maec:thug:bhv:6">
<Description>
<Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
<Behavior id="maec:thug:bhv:7">
<Description>
<Text>[meta redirection] http://desbloquear.celularmovel.com/ -> http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
<Behavior id="maec:thug:bhv:8">
<Description>
<Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
<Behavior id="maec:thug:bhv:9">
<Description>
<Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)</Text>
</Description>
<Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
</Behavior>
</Behaviors>
<Pools/>
</MAEC_Bundle>
Here is the parsing code in Python. It only writes the header line to the file and never enters the loop:
import xml.etree.ElementTree as ET

def logsParsing():
    tree = ET.parse(
        'analysis.xml')
    root = tree.getroot()
    with open('sample1.txt', 'w') as f:
        f.write('Operation\n')
    with open('sample1.txt', 'a') as f:
        for behavior in root.findall('Behaviors'):
            operation = behavior.find('Behavior').find('Description').find('Text').text
            line_to_write = operation + '\n'
            f.write(line_to_write)
        f.close()

logsParsing()
【Comments】:
-
Why open the file twice? When you write, the write pointer advances, so the next write starts where the previous one ended.
-
You need to call f.close() before entering write mode, so that the changes are saved.
-
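Your file handling is certainly odd, but the main bug is probably that findall does not work well with the namespace declared on the root. See stackoverflow.com/questions/14853243/…. I am tempted to close this as a duplicate.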
-
You need to take the http://maec.mitre.org/XMLSchema/maec-core-1 namespace into account. See docs.python.org/3/library/….
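Putting the comments above together, here is a minimal sketch of a namespace-aware version. It assumes the default namespace is the one declared in the xmlns attribute of MAEC_Bundle. The "maec" prefix, the helper names extract_behavior_texts and write_operations, and the inline SAMPLE document are made up for illustration and do not come from the question. The output file is opened only once.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns attribute of <MAEC_Bundle>.
# The "maec" prefix is an arbitrary local alias chosen for this sketch.
NS = {"maec": "http://maec.mitre.org/XMLSchema/maec-core-1"}


def extract_behavior_texts(source):
    """Return the text of every Behavior/Description/Text element."""
    root = ET.parse(source).getroot()
    texts = []
    # ".//" searches at any depth, so no outer loop over Behaviors is needed.
    path = ".//maec:Behavior/maec:Description/maec:Text"
    for text_el in root.iterfind(path, NS):
        if text_el.text:
            texts.append(text_el.text.strip())
    return texts


def write_operations(source, out_path):
    """Write a header plus one behavior text per line, opening the file once."""
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("Operation\n")
        for text in extract_behavior_texts(source):
            f.write(text + "\n")


# Small inline sample (hypothetical) using the same default namespace.
SAMPLE = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
<Behaviors>
<Behavior id="maec:thug:bhv:4"><Description><Text>first</Text></Description></Behavior>
<Behavior id="maec:thug:bhv:5"><Description><Text>second</Text></Description></Behavior>
</Behaviors>
</MAEC_Bundle>"""

print(extract_behavior_texts(io.StringIO(SAMPLE)))
```

For the file in the question, the call would presumably be write_operations('analysis.xml', 'sample1.txt'). The with block closes the file on exit, so no explicit f.close() is required.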