【Question Title】: How do I delete part of an XML file?
【Posted】: 2022-01-23 07:08:13
【Question Description】:

I need to delete certain parts of an XML file, for example from this file:

<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>ImageIndex</key>
            <integer>0</integer>
            <key>NumberOfROIs</key>
            <integer>42</integer>
            <key>ROIs</key>
            <array>
                <dict>
                    <key>Area</key>
                    <real>0.0</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>0.0</real>
                    <key>IndexInImage</key>
                    <integer>0</integer>
                    <key>Max</key>
                    <real>1358</real>
                    <key>Mean</key>
                    <real>1358</real>
                    <key>Min</key>
                    <real>1358</real>
                    <key>Name</key>
                    <string>Calcification</string>
                    <key>NumberOfPoints</key>
                    <integer>1</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(2964.620117, 3427.979980)</string>
                    </array>
                    <key>Total</key>
                    <real>1358</real>
                    <key>Type</key>
                    <integer>19</integer>
                </dict>
                <dict>
                    <key>Area</key>
                    <real>0.0</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>0.0</real>
                    <key>IndexInImage</key>
                    <integer>1</integer>
                    <key>Max</key>
                    <real>1401</real>
                    <key>Mean</key>
                    <real>1401</real>
                    <key>Min</key>
                    <real>1401</real>
                    <key>Name</key>
                    <string>Calcification</string>
                    <key>NumberOfPoints</key>
                    <integer>1</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(2993.159912, 3403.550049)</string>
                    </array>
                    <key>Total</key>
                    <real>1401</real>
                    <key>Type</key>
                    <integer>19</integer>
                </dict>
                <dict>
                    <key>Area</key>
                    <real>1.3665732145309448</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>66.487342834472656</real>
                    <key>IndexInImage</key>
                    <integer>36</integer>
                    <key>Max</key>
                    <real>1836</real>
                    <key>Mean</key>
                    <real>1583.29638671875</real>
                    <key>Min</key>
                    <real>1313</real>
                    <key>Name</key>
                    <string>Mass</string>
                    <key>NumberOfPoints</key>
                    <integer>89</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(3196.290039, 1048.599976)</string>
                        <string>(3203.560059, 1046.170044)</string>
                        <string>(3211.330078, 1042.780029)</string>
                        <string>(3189.500000, 1050.540039)</string>
                    </array>
                    <key>Total</key>
                    <real>44457380</real>
                    <key>Type</key>
                    <integer>15</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>  

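I want to delete every <dict> ... </dict> block (including the tags themselves) that contains <string>Calcification</string>. In other words, I only want to keep the parts without a Calcification. The result I want for this file is: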

<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>ImageIndex</key>
            <integer>0</integer>
            <key>NumberOfROIs</key>
            <integer>42</integer>
            <key>ROIs</key>
            <array>
                <dict>
                    <key>Area</key>
                    <real>1.3665732145309448</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>66.487342834472656</real>
                    <key>IndexInImage</key>
                    <integer>36</integer>
                    <key>Max</key>
                    <real>1836</real>
                    <key>Mean</key>
                    <real>1583.29638671875</real>
                    <key>Min</key>
                    <real>1313</real>
                    <key>Name</key>
                    <string>Mass</string>
                    <key>NumberOfPoints</key>
                    <integer>89</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(3196.290039, 1048.599976)</string>
                        <string>(3203.560059, 1046.170044)</string>
                        <string>(3211.330078, 1042.780029)</string>
                        <string>(3189.500000, 1050.540039)</string>
                    </array>
                    <key>Total</key>
                    <real>44457380</real>
                    <key>Type</key>
                    <integer>15</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist> 

This is what I tried:

data = r"C:\Users\vinc\Desktop\ExemploXML.xml"    
    
import xml.etree.ElementTree as ET
tree = ET.parse(data)
root = tree.getroot()
for e in root.findall(".//string"):
    if e.text == 'Calcification':
        
        print(e)
        root.remove(e)
    else:
        pass
tree.write(r'C:\Users\vinc\Desktop\out.xml')

Result =======================================

<Element 'string' at 0x000002B085002EA0>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-d417d00038ed> in <module>
      8 
      9         print(e)
---> 10         root.remove(e)
     11     else:
     12         pass

ValueError: list.remove(x): x not in list

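For context, these XML files hold semantic segmentation information, and I want to delete the annotations of the Calcification class.

【Question Comments】:

  • This is easy to do with XSLT. Are you open to that?
  • Do you want to delete only the dict items that have <key>Area</key>?
  • You are very close; see the Python/ETree solution below.
  • Another very simple approach is to use lxml instead of ElementTree.

Tags: python xml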


【Solution 1】:

Here is an XSLT-based solution.

The XSLT below follows the so-called Identity Transform pattern.

A one-line template removes the unwanted <dict> elements:

<xsl:template match="dict[string='Calcification']"/>

How to transform an XML file using XSLT in Python?

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dict[string='Calcification']"/>
</xsl:stylesheet>
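
For comparison, the same predicate can also be written with ElementTree's limited XPath support, without XSLT. This is only a sketch: SAMPLE is a shortened, made-up stand-in for the question's file, and drop_dicts is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with the same nesting as the question's file.
SAMPLE = """<dict>
  <key>Images</key>
  <array>
    <dict>
      <key>ROIs</key>
      <array>
        <dict><key>Name</key><string>Calcification</string></dict>
        <dict><key>Name</key><string>Mass</string></dict>
        <dict><key>Name</key><string>Calcification</string></dict>
      </array>
    </dict>
  </array>
</dict>"""


def drop_dicts(root, label):
    """Remove every <dict> that has a direct <string> child equal to label."""
    predicate = "dict[string='{}']".format(label)
    # The trailing "/.." steps up to the parent of each matching <dict>.
    for parent in root.findall(".//" + predicate + "/.."):
        # findall() returns a list, so removing while looping over it is safe.
        for child in parent.findall(predicate):
            parent.remove(child)
    return root


root = drop_dicts(ET.fromstring(SAMPLE), "Calcification")
remaining = [s.text for s in root.iter("string")]
print(remaining)
```

The sketch assumes the label contains no single quote, since it is pasted straight into the path expression.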

【Comments】:

  • And in XSLT 3.0 it is even simpler: you can replace the first template rule with <xsl:mode on-no-match="shallow-copy"/>
【Solution 2】:

Listing [Python.Docs]: xml.etree.ElementTree - The ElementTree XML API

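I always prefer to search for nodes via XPath, specifying the node path as completely as possible. The downside, of course, is that if the XML structure changes, the node paths in the code have to be adjusted accordingly.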
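
To illustrate the trade-off, here is a small sketch comparing a full path with a "search anywhere" path. The XML string is a made-up, shortened version of the question's structure, used only for this comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample: one image with one ROI.
XML = """<dict>
  <key>Images</key>
  <array>
    <dict>
      <key>ROIs</key>
      <array>
        <dict>
          <key>Name</key>
          <string>Mass</string>
          <key>Point_px</key>
          <array>
            <string>(1.0, 2.0)</string>
          </array>
        </dict>
      </array>
    </dict>
  </array>
</dict>"""

root = ET.fromstring(XML)

# Full path: matches only the ROIs array (root dict -> array -> dict -> array).
full_path = root.findall("./array/dict/array")

# ".//array" matches every <array> at any depth, including Point_px.
anywhere = root.findall(".//array")

print(len(full_path), len(anywhere))
```

The full path returns a single node here, while the looser path also picks up the Images and Point_px arrays, which would then need extra filtering.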

Also, as a general pattern (I don't know whether it applies here), never remove elements from a container while you are iterating over it.
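
A minimal sketch of that pattern with ElementTree, assuming a tiny made-up tree: iterate over a snapshot of the children and remove from the live element.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy tree: three <item> children, two of them marked "drop".
parent = ET.fromstring(
    "<items><item>drop</item><item>drop</item><item>keep</item></items>"
)

# list(parent) is a snapshot of the children, so removing from the live
# element cannot shift the children that are still waiting to be visited.
for child in list(parent):
    if child.text == "drop":
        parent.remove(child)

kept = [child.text for child in parent]
print(kept)
```

Collecting the nodes in a separate list first and removing them afterwards, as the code below does, achieves the same thing.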

I saved your source XML in file00.xml (and removed the last, unmatched tag, </plist>).

code00.py

#!/usr/bin/env python

import xml.etree.ElementTree as ET
import sys


def main(*argv):
    xml_file_name = "./file00.xml"
    tree = ET.parse(xml_file_name)
    root = tree.getroot()
    inner_array_nodes = root.findall("./array/dict/array")  # XPATH
    to_remove = []
    for parent_node in inner_array_nodes:
        for dict_node in parent_node:
            string_nodes = dict_node.findall("string")
            for string_node in string_nodes:
                if string_node.text == "Calcification":
                    to_remove.append((parent_node, dict_node))

    for parent, child in to_remove:
        parent.remove(child)

    print(b"".join(ET.tostringlist(root)).decode())


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Output

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q070442605]> "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\Scripts\python.exe" code00.py
Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:59:51) [MSC v.1928 64 bit (AMD64)] 064bit on win32

<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>ImageIndex</key>
            <integer>0</integer>
            <key>NumberOfROIs</key>
            <integer>42</integer>
            <key>ROIs</key>
            <array>
                <dict>
                    <key>Area</key>
                    <real>1.3665732145309448</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>66.487342834472656</real>
                    <key>IndexInImage</key>
                    <integer>36</integer>
                    <key>Max</key>
                    <real>1836</real>
                    <key>Mean</key>
                    <real>1583.29638671875</real>
                    <key>Min</key>
                    <real>1313</real>
                    <key>Name</key>
                    <string>Mass</string>
                    <key>NumberOfPoints</key>
                    <integer>89</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(3196.290039, 1048.599976)</string>
                        <string>(3203.560059, 1046.170044)</string>
                        <string>(3211.330078, 1042.780029)</string>
                        <string>(3189.500000, 1050.540039)</string>
                    </array>
                    <key>Total</key>
                    <real>44457380</real>
                    <key>Type</key>
                    <integer>15</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>

Done.

【Comments】:

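    【Solution 3】:
    1. Your XML has an extra plist tag.

    2. Even if your code did run, it only tries to remove the string tag containing the text "Calcification", not the enclosing dict that you actually want to delete.

    3. Here is a working solution. It may not be the most optimized code, but it does work; I tried it against your input.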

    import xml.etree.ElementTree as ET
    
    tree = ET.parse("sample.xml")
    root = tree.getroot()
    dict_list = []
    
    array = root.find("./array/dict/array")
    
    for each_dict in array.iter('dict'):
        for each_string in each_dict.iter('string'):
            if each_string.text == "Calcification":
                dict_list.append(each_dict)
    
    for each_dict in dict_list:
        array.remove(each_dict)
    
    tree.write('sample3.xml')
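
As a side note on point 2, the ValueError in the question comes from Element.remove() only looking at direct children. A rough sketch of one way around that, assuming a shortened made-up sample, is to build a child-to-parent map first (parent_of is just an illustrative name):

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with the same nesting as the question's file.
SAMPLE = """<dict><array><dict><array>
<dict><key>Name</key><string>Calcification</string></dict>
<dict><key>Name</key><string>Mass</string></dict>
</array></dict></array></dict>"""

root = ET.fromstring(SAMPLE)
target = root.find(".//string")  # the nested <string>Calcification</string>

# remove() only searches direct children, so it fails for a nested node.
try:
    root.remove(target)
    error = None
except ValueError as exc:
    error = exc

# Elements keep no reference to their parent; build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}

roi_dict = parent_of[target]          # the <dict> that holds the string
parent_of[roi_dict].remove(roi_dict)  # its enclosing <array> can remove it

names = [s.text for s in root.iter("string")]
print(type(error).__name__, names)
```

This avoids hard-coding the path to the array, at the cost of walking the whole tree once to build the map.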
    

    【Comments】:
