[Posted]: 2017-08-07 17:43:03
[Question]:
This is my txt file:
In File Name: C:\Users\naqushab\desktop\files\File 1.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size: Low: 22636 High: 0
Total Process time: 1.859000
Out File Size: Low: 77619 High: 0
In File Name: C:\Users\naqushab\desktop\files\File 2.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 2.m2
In File Size: Low: 20673 High: 0
Total Process time: 3.094000
Out File Size: Low: 94485 High: 0
In File Name: C:\Users\naqushab\desktop\files\File 3.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 3.m2
In File Size: Low: 66859 High: 0
Total Process time: 3.516000
Out File Size: Low: 217268 High: 0
I am trying to parse it into XML like this:
<?xml version='1.0' encoding='utf-8'?>
<root>
<filedata>
<InFileName>File 1.m1</InFileName>
<OutFileName>File 1.m2</OutFileName>
<InFileSize>22636</InFileSize>
<OutFileSize>77619</OutFileSize>
<ProcessTime>1.859000</ProcessTime>
</filedata>
<filedata>
<InFileName>File 2.m1</InFileName>
<OutFileName>File 2.m2</OutFileName>
<InFileSize>20673</InFileSize>
<OutFileSize>94485</OutFileSize>
<ProcessTime>3.094000</ProcessTime>
</filedata>
<filedata>
<InFileName>File 3.m1</InFileName>
<OutFileName>File 3.m2</OutFileName>
<InFileSize>66859</InFileSize>
<OutFileSize>217268</OutFileSize>
<ProcessTime>3.516000</ProcessTime>
</filedata>
</root>
Here is the code I tried to do this with (I am using Python 2):
import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>In File Name:
                       |Out File Name:
                       |In File Size: Low:
                       |Total Process time:
                       |Out File Size: Low:
                     )
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('Performance.txt') as f:
    celldata = ET.SubElement(root, 'filedata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, ugly)
        if line.isspace():
            celldata = ET.SubElement(root, 'filedata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('Performance.xml', encoding='utf-8', xml_declaration=True)
But I get empty values. Is it possible to parse this txt file into XML?
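One likely reason for the empty output, offered as a sketch and not a confirmed diagnosis of the full script: under `re.VERBOSE`, unescaped whitespace inside the pattern is ignored, so `In File Name:` is compiled as if it were `InFileName:` and never matches the lines in the file. The snippet below compares that pattern with one whose spaces are escaped. The names `verbose_rex` and `escaped_rex` are illustrative only.

```python
import re

line = r"In File Name: C:\Users\naqushab\desktop\files\File 1.m1"

# With re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so this pattern actually searches for the text "InFileName:".
verbose_rex = re.compile(r'''(?P<title>In File Name:)
                             (?P<value>.*)''', re.VERBOSE)

# Escaping each space keeps it significant under re.VERBOSE.
escaped_rex = re.compile(r'''(?P<title>In\ File\ Name:)
                             \s*(?P<value>.*)''', re.VERBOSE)

verbose_match = verbose_rex.search(line)
escaped_match = escaped_rex.search(line)

print(verbose_match)                 # no match object is produced here
print(escaped_match.group('value'))  # the path that follows the label
```

Dropping `re.VERBOSE` and writing the alternation on a single line would avoid the problem as well.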
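[Comments]:
-
Where exactly do you get empty values? Could you be more specific?
-
If a complete program does not give the expected result, split it into smaller parts and try each one separately. Here, you should first just parse the input and print the pieces you can find. Only then try to build the XML file.
-
Also, your regex and the sub-element names don't match! Is that intentional?
-
I tried this program and I do get an XML structure, but it contains only the filedata tag. I based it on an answer to an SO question and changed the regex to fit my structure.
-
@KeerthanaPrabhakaran Sorry, I was editing the text file before uploading it to SO. I will update the regex I used. I don't think it is correct, though.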
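The advice in the comments to parse first and build the XML second could look roughly like the sketch below. It rests on several assumptions that the question does not confirm: every record begins with an `In File Name:` line (the sample has no blank lines between records), only the `Low:` part of each size is wanted, and the tag names are the ones in the desired output. `parse_records`, `build_xml`, `PATTERNS` and `TAG_ORDER` are hypothetical names introduced for illustration.

```python
import ntpath
import re
import xml.etree.ElementTree as ET

# Assumed mapping from each line prefix to the tag name in the desired XML.
PATTERNS = [
    (re.compile(r'^In File Name:\s*(.*)$'), 'InFileName'),
    (re.compile(r'^Out File Name:\s*(.*)$'), 'OutFileName'),
    (re.compile(r'^In File Size: Low:\s*(\d+)'), 'InFileSize'),
    (re.compile(r'^Out File Size: Low:\s*(\d+)'), 'OutFileSize'),
    (re.compile(r'^Total Process time:\s*(\S+)'), 'ProcessTime'),
]
TAG_ORDER = ['InFileName', 'OutFileName', 'InFileSize',
             'OutFileSize', 'ProcessTime']


def parse_records(lines):
    """Step 1: turn the text lines into a list of dicts, one per file."""
    records = []
    for line in lines:
        line = line.strip()
        for rex, tag in PATTERNS:
            m = rex.match(line)
            if not m:
                continue
            value = m.group(1).strip()
            if tag == 'InFileName':
                records.append({})  # assumed: this line starts a new record
            if not records:
                break  # field seen before any "In File Name:" line; skip it
            if tag.endswith('FileName'):
                # ntpath handles Windows paths on any OS; keep "File 1.m1".
                value = ntpath.basename(value)
            records[-1][tag] = value
            break
    return records


def build_xml(records):
    """Step 2: build the <root>/<filedata> tree from the parsed records."""
    root = ET.Element('root')
    for record in records:
        filedata = ET.SubElement(root, 'filedata')
        for tag in TAG_ORDER:
            if tag in record:
                ET.SubElement(filedata, tag).text = record[tag]
    return root


sample = r"""In File Name: C:\Users\naqushab\desktop\files\File 1.m1
Out File Name: C:\Users\naqushab\desktop\files\Output\File 1.m2
In File Size: Low: 22636 High: 0
Total Process time: 1.859000
Out File Size: Low: 77619 High: 0
"""

records = parse_records(sample.splitlines())
print(records)  # inspect the parsed pieces before building any XML
xml_text = ET.tostring(build_xml(records), encoding='utf-8').decode('utf-8')
print(xml_text)
```

To produce the file, the lines of `Performance.txt` can be passed to `parse_records` in place of `sample.splitlines()`, and the resulting root can be written with `ET.ElementTree(root).write('Performance.xml', encoding='utf-8', xml_declaration=True)`, as in the question's code.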
Tags: python xml python-2.7 parsing elementtree