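[Posted]: 2015-12-14 17:30:46
[Question]:
I want to convert an XML file into a data frame. I found some functions that let me read the XML data, but I cannot get a data frame with the same structure as the original XML file (that is, the structure you get when you open the XML file in Excel).
Here is my original XML code: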
<Data>
<Frame timestamp='17/09/2014 20:54:59.902' timecode='75299902' >
<Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
<Object type='Taxi' DISTANCE='3605' VOLUME='931' id='15603' code='4' />
<Object type='Bus' DISTANCE='3563' VOLUME='488' id='15604' code='9' />
<Object type='Taxi' DISTANCE='4942' VOLUME='57' id='15624' code='1' />
<Object type='Taxi' DISTANCE='784' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3301' VOLUME='2041' id='15626' code='42' />
<Object type='Bus' DISTANCE='2040' VOLUME='2945' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
<Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771' >
<Object type='Taxi' DISTANCE='4941' VOLUME='51' id='15624' code='1' />
<Object type='Taxi' DISTANCE='789' VOLUME='47' id='15625' code='10' />
<Object type='Taxi' DISTANCE='3300' VOLUME='2069' id='15626' code='42' />
<Object type='Bus' DISTANCE='2027' VOLUME='2947' id='15630' code='27' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
</Data>
This already gives me a list of the data:
library(XML)
# Convert xml data to R
data <- xmlTreeParse(file="c:/R/CL/filename.xml", useInternalNodes=TRUE)
# Create a list of the data
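xl <- xmlToList(data)
Ideally, I would like a data frame built from this XML data that looks the same as when the XML is loaded into Excel. However, when I look at the output of xl, I see that it is organized by object and by time. When I open the XML file in Excel, this information is linked (each object row also has columns holding the time information).
This is xl: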
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3037" "1668" "15593" "0"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3605" "931" "15603" "4"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "3563" "488" "15604" "9"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "2161" "1592" "15615" "21"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "4942" "57" "15624" "1"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "784" "47" "15625" "10"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3301" "2041" "15626" "42"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "2040" "2945" "15630" "27"
$Frame$Object
type DISTANCE VOLUME Z
"Airplane" "2865" "2722" "0"
$Frame$Time
timestamp timecode
"17/09/2014 20:54:59.902" "75299902"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "4941" "51" "15624" "1"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "789" "47" "15625" "10"
$Frame$Object
type DISTANCE VOLUME id code
"Taxi" "3300" "2069" "15626" "42"
$Frame$Object
type DISTANCE VOLUME id code
"Bus" "2027" "2947" "15630" "27"
$Frame$Object
type DISTANCE VOLUME Z
"Airplane" "2865" "2722" "0"
$Frame$Time
timestamp timecode
"17/09/2014 20:54:59.771" "75299771"
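This list contains two table-like structures for each frame: Frame$Object and Frame$Time. I want to combine them into a single table by repeating the timestamp and timecode columns for every object.
The desired output is shown below (the same structure as when the XML file is loaded into Excel):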
type      DISTANCE  VOLUME  id     code  Z  timestamp                timecode
Taxi      3037      1668    15593  0        17/09/2014 20:54:59.902  75299902
Taxi      3605      931     15603  4        17/09/2014 20:54:59.902  75299902
Bus       3563      488     15604  9        17/09/2014 20:54:59.902  75299902
Taxi      4942      57      15624  1        17/09/2014 20:54:59.902  75299902
Taxi      784       47      15625  10       17/09/2014 20:54:59.902  75299902
Taxi      3301      2041    15626  42       17/09/2014 20:54:59.902  75299902
Bus       2040      2945    15630  27       17/09/2014 20:54:59.902  75299902
Airplane  2865      2722                 0  17/09/2014 20:54:59.902  75299902
Taxi      4941      51      15624  1        17/09/2014 20:54:59.771  75299771
Taxi      789       47      15625  10       17/09/2014 20:54:59.771  75299771
Taxi      3300      2069    15626  42       17/09/2014 20:54:59.771  75299771
Bus       2027      2947    15630  27       17/09/2014 20:54:59.771  75299771
Airplane  2865      2722                 0  17/09/2014 20:54:59.771  75299771
Which functions can produce this result? Thanks in advance for your help!
[Comments]:
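One way to sketch the combination described in the question is to walk the list returned by xmlToList and attach each frame's time attributes to every one of its objects. This is only an illustration under some assumptions: the XML is a shortened inline copy of the sample, `frame_to_df` and `time_name` are hypothetical names, and the frame attributes are assumed to be stored under `.attrs` (the listing in the question labels that element `Time`, so the name may need adjusting).

```r
library(XML)

# Shortened inline copy of the sample XML (assumption: same layout as the real file)
xml_text <- "<Data>
<Frame timestamp='17/09/2014 20:54:59.902' timecode='75299902'>
<Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
<Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771'>
<Object type='Bus' DISTANCE='2027' VOLUME='2947' id='15630' code='27' />
</Frame>
</Data>"

xl <- xmlToList(xmlParse(xml_text, asText = TRUE))

# Object attributes to keep; attributes an object lacks become NA
cols <- c("type", "DISTANCE", "VOLUME", "id", "code", "Z")

# Hypothetical helper: one data frame per frame, time info repeated on each row
frame_to_df <- function(frame, time_name = ".attrs") {
  time_info <- frame[[time_name]]
  objects <- unname(frame[names(frame) == "Object"])
  rows <- lapply(objects, function(obj) {
    values <- c(setNames(unname(obj[cols]), cols), time_info)
    as.data.frame(as.list(values), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

combined <- do.call(rbind, lapply(unname(xl), frame_to_df))
rownames(combined) <- NULL
```

All columns come back as character vectors; DISTANCE, VOLUME, and the other numeric fields can be converted afterwards with as.numeric if needed.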
-
Have you tried xmlParse together with getNodeSet/xpathApply? Once you understand how it works, you can use apply to combine all the objects into a single data frame.
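A minimal sketch of what this comment suggests might look like the following. It assumes a shortened inline copy of the sample XML rather than the real file, and the XPath expression and the `cols` vector are illustrative choices, not something the commenter specified. Each Object node is selected with getNodeSet, and its own attributes are joined with those of its parent Frame.

```r
library(XML)

# Shortened inline copy of the sample XML (assumption: same layout as the real file)
xml_text <- "<Data>
<Frame timestamp='17/09/2014 20:54:59.902' timecode='75299902'>
<Object type='Taxi' DISTANCE='3037' VOLUME='1668' id='15593' code='0' />
<Object type='Airplane' DISTANCE='2865' VOLUME='2722' Z='0' />
</Frame>
<Frame timestamp='17/09/2014 20:54:59.771' timecode='75299771'>
<Object type='Bus' DISTANCE='2027' VOLUME='2947' id='15630' code='27' />
</Frame>
</Data>"

doc <- xmlParse(xml_text, asText = TRUE)

# Every Object element that sits directly under a Frame
objects <- getNodeSet(doc, "//Frame/Object")

# Columns of the final data frame; attributes a node lacks become NA
cols <- c("type", "DISTANCE", "VOLUME", "id", "code", "Z", "timestamp", "timecode")

rows <- lapply(objects, function(node) {
  # Attributes of the object itself plus those of its parent Frame
  attrs <- c(xmlAttrs(node), xmlAttrs(xmlParent(node)))
  row <- setNames(as.list(unname(attrs[cols])), cols)
  as.data.frame(row, stringsAsFactors = FALSE)
})

df <- do.call(rbind, rows)
```

For the real file, the document would be read with xmlParse on the file path instead of the inline string; the rest of the sketch should carry over unchanged as long as every Object is a direct child of a Frame.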