【Question title】: Parsing some XML fields to a text file in Python
【Posted】: 2020-03-22 06:00:37
【Question】:

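I am trying to parse an XML file into a txt file (mainly to get the body text), but the for loop never runs, so nothing gets appended to the file. I know I am getting something wrong with the XML. I tried creating an outer for loop that finds all MAEC_Bundle elements before looking for the behaviors (I think because it is the root?).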

This is the XML file:

<MAEC_Bundle xmlns:ns1="http://xml/metadataSharing.xsd" xmlns="http://maec.mitre.org/XMLSchema/maec-core-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maec.mitre.org/XMLSchema/maec-core-1 file:MAEC_v1.1.xsd" id="maec:thug:bnd:1" schema_version="1.100000">
    <Analyses>
        <Analysis start_datetime="2019-11-25 21:41:59.491211" id="maec:thug:ana:2" analysis_method="Dynamic">
            <Tools_Used>
                <Tool id="maec:thug:tol:1">
                    <Name>Thug</Name>
                    <Version>0.9.40</Version>
                    <Organization>The Honeynet Project</Organization>
                </Tool>
            </Tools_Used>
        </Analysis>
    </Analyses>
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description>
                <Text>[window open redirection] about:blank -&gt; http://desbloquear.celularmovel.com/</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:6">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:7">
            <Description>
                <Text>[meta redirection] http://desbloquear.celularmovel.com/ -&gt; http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:8">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:9">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
    </Behaviors>
    <Pools/>
</MAEC_Bundle>

This is the parsing code in Python. The code below only performs the first write to the file and never enters the loop:

import xml.etree.ElementTree as ET


def logsParsing():
    tree = ET.parse(
        'analysis.xml')
    root = tree.getroot()
    with open('sample1.txt', 'w') as f:
        f.write('Operation\n')
        with open('sample1.txt', 'a') as f:
            for behavior in root.findall('Behaviors'):
                operation = behavior.find('Behavior').find('Description').find('Text').text
                line_to_write = operation + '\n'
                f.write(line_to_write)
    f.close()


logsParsing()

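【Comments】:

  • Why do you open the file twice? When you write, the write pointer advances, and the next write starts where the previous one ended.
  • You need to call f.close() before opening the file again, so that the changes are saved.
  • Your file handling is certainly odd, but the main error is probably that findall does not work well with the namespace declared on the root. See stackoverflow.com/questions/14853243/…. I am tempted to close this as a duplicate.
  • You need to take the http://maec.mitre.org/XMLSchema/maec-core-1 namespace into account. See docs.python.org/3/library/…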
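
The points raised in these comments can be combined into a minimal sketch: qualify every tag with the default namespace URI from the XML (using ElementTree's `{uri}tag` form) and open the output file only once. The names `NS`, `SAMPLE_XML`, and `write_behavior_texts` are hypothetical and exist only for illustration; the sample is a trimmed stand-in for the file in the question, and the demo writes to a temporary directory rather than to the real `sample1.txt`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Default namespace declared on the root element of the question's XML.
NS = "{http://maec.mitre.org/XMLSchema/maec-core-1}"

# Trimmed stand-in for analysis.xml (assumption: same structure as the real file).
SAMPLE_XML = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description><Text>first behavior</Text></Description>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description><Text>second behavior</Text></Description>
        </Behavior>
    </Behaviors>
</MAEC_Bundle>"""


def write_behavior_texts(root, out_path):
    """Write the Text of every Behavior to out_path, one per line; return the count."""
    # Each step of the path carries the namespace, otherwise nothing matches.
    path = "{0}Behaviors/{0}Behavior/{0}Description/{0}Text".format(NS)
    texts = [node.text or "" for node in root.findall(path)]
    # One open in "w" mode is enough: each write continues where the last one ended,
    # and the with-block closes the file, so no explicit f.close() is needed.
    with open(out_path, "w") as f:
        f.write("Operation\n")
        for text in texts:
            f.write(text + "\n")
    return len(texts)


root = ET.fromstring(SAMPLE_XML)
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "sample1.txt")
    count = write_behavior_texts(root, out_path)
    with open(out_path) as f:
        written = f.read()

print(count)
print(written, end="")
```

With the real file, `ET.parse('analysis.xml').getroot()` would take the place of `ET.fromstring(SAMPLE_XML)`.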

Tags: python xml file parsing


【Solution 1】:

See [Python 3.Docs]: xml.etree.ElementTree - The ElementTree XML API. You may want to focus on the following sections:

  • Parsing XML with Namespaces
  • XPath support
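
As a small illustration of why those two sections matter here, the sketch below shows how ElementTree stores namespaced tags and why an unqualified lookup comes back empty. The inline `SAMPLE_XML` is a trimmed stand-in for the question's file, and the prefix `m` is an arbitrary, hypothetical choice; any prefix works as long as it maps to the right URI.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the question's XML (assumption: same root and namespace).
SAMPLE_XML = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior id="maec:thug:bhv:4"/>
        <Behavior id="maec:thug:bhv:5"/>
    </Behaviors>
</MAEC_Bundle>"""

root = ET.fromstring(SAMPLE_XML)

# The parser stores every tag in its expanded "{uri}local" form.
print(root.tag)

# An unqualified name matches nothing, so a for loop over this result never runs.
unqualified = root.findall("Behaviors")

# Mapping a prefix of our choosing to the URI lets the path match.
namespaces = {"m": "http://maec.mitre.org/XMLSchema/maec-core-1"}
qualified = root.findall("m:Behaviors/m:Behavior", namespaces)

print(len(unqualified), len(qualified))
```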

Here is one way to handle it.

code00.py

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET


def main():
    tree = ET.parse("analysis.xml")
    root_node = tree.getroot()
    namespaces = {
        "xmlns": "http://maec.mitre.org/XMLSchema/maec-core-1",  # Namespace (default) from XML file (this is the only one we need, as tags that matter to us are not prefixed)
    }
    xpath = "./{0:s}:Behaviors/{0:s}:Behavior/{0:s}:Description/{0:s}:Text".format("xmlns")  # Compute each "Text" node full path
    print("Nodes to search: {0:s}".format(xpath))
    text_nodes = root_node.findall(xpath, namespaces)
    with open("sample1.txt", "w") as fout:  # Only open the out file once
        node_count = 0
        fout.write("Operation:\n")
        for text_node in text_nodes:
            fout.write(text_node.text + "\n")
            node_count += 1
        print("Wrote {0:d} nodes info.".format(node_count))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

Nodes to search: ./xmlns:Behaviors/xmlns:Behavior/xmlns:Description/xmlns:Text
Wrote 6 nodes info.

Done.

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> type sample1.txt
Operation:
[window open redirection] about:blank -> http://desbloquear.celularmovel.com/
[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)
[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)
[meta redirection] http://desbloquear.celularmovel.com/ -> http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)
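
As a possible alternative, not part of the answer above: from Python 3.8 onward, ElementTree paths accept the `{*}` wildcard, which matches a tag in any namespace, so no prefix map is needed. It would not work on the Python 3.7.3 interpreter shown in the output above. The sketch below uses a trimmed, hypothetical `SAMPLE_XML` in place of `analysis.xml`.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for analysis.xml (assumption: same structure as the real file).
SAMPLE_XML = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description>
                <Text>[window open redirection] about:blank -&gt; http://desbloquear.celularmovel.com/</Text>
            </Description>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)</Text>
            </Description>
        </Behavior>
    </Behaviors>
</MAEC_Bundle>"""

root = ET.fromstring(SAMPLE_XML)

# "{*}Name" matches an element called Name in any namespace (requires Python 3.8+).
xpath = "./{*}Behaviors/{*}Behavior/{*}Description/{*}Text"
texts = [node.text for node in root.findall(xpath)]

for text in texts:
    print(text)
```

The trade-off is that the wildcard also matches same-named elements from other namespaces, so the explicit prefix map is the stricter choice when a document mixes vocabularies.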
