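【Question title】: Irregular list of lists to dataframe
【Posted】: 2014-01-13 23:27:37
【Question】:

I have a problem. I need to convert an irregular list of lists into a wide-format data.frame (i.e. every element must end up with the same number of entries), but I can't work out how to do it. The list looks like this: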

[[1]]
[1] 14

[[2]]
[1] 26

[[3]]
[1] 20 21 22 23

[[4]]
[1] 21 22

[[5]]
[1] 25

[[6]]
[1] 17 21 23

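I have tried various approaches using for loops and/or sapply, but nothing works. The differing lengths of the list elements break every attempt I make. It seems to me there must be a fairly simple way to do this. Shouldn't there be? Can anyone offer some advice?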

【Question comments】:

How do you want the missing values filled?
NA is fine, or "."

Tags: r list dataframe

【Solution 1】:

Here is an sapply / mapply example...

#  Data
set.seed(1)
ll <- replicate( 4 , runif( sample(4,1) ) )
str(ll)
#List of 4
# $ : num [1:2] 0.372 0.573
# $ : num [1:4] 0.202 0.898 0.945 0.661
# $ : num [1:3] 0.0618 0.206 0.1766
# $ : num [1:3] 0.384 0.77 0.498

#  Find length of each list element
len <- sapply(ll,length)

#  Longest gives number of rows
n <- max( len )

#  Number of NAs to fill for column shorter than longest
len <- n - len

#  Output
mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len )
#          [,1]      [,2]       [,3]      [,4]
#[1,] 0.3721239 0.2016819 0.06178627 0.3841037
#[2,] 0.5728534 0.8983897 0.20597457 0.7698414
#[3,]        NA 0.9446753 0.17655675 0.4976992
#[4,]        NA 0.6607978         NA        NA

Note that the output is a matrix, so you need to wrap it in data.frame() if you want a data.frame.

To fill by row instead and return a data.frame:

data.frame( t( mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len ) ) )
#          X1        X2        X3        X4
#1 0.37212390 0.5728534        NA        NA
#2 0.20168193 0.8983897 0.9446753 0.6607978
#3 0.06178627 0.2059746 0.1765568        NA
#4 0.38410372 0.7698414 0.4976992        NA
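
The same padding idea can be applied to the list from the question. This is only a sketch: the object names `lst`, `padded` and `wide` are made up for illustration, and the values are retyped by hand from the printed list in the question.

```r
# The list from the question, rebuilt by hand (values copied from the post)
lst <- list(14, 26, c(20, 21, 22, 23), c(21, 22), 25, c(17, 21, 23))

# The longest element determines the number of columns in wide form
n <- max(sapply(lst, length))

# Pad each element with NA up to length n, then stack the elements as rows
padded <- lapply(lst, function(x) c(x, rep(NA, n - length(x))))
wide <- as.data.frame(do.call(rbind, padded))

wide
#   V1 V2 V3 V4
# 1 14 NA NA NA
# 2 26 NA NA NA
# 3 20 21 22 23
# 4 21 22 NA NA
# 5 25 NA NA NA
# 6 17 21 23 NA
```

Each list element becomes one row, so all six rows share the same four columns.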

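【Comments】:

Thanks Simon - looks great! But how can I make the matrix fill by row rather than by column?
data.frame( t( mapply( function(x,y) c( x , rep( NA , y ) ) , ll , len ) ) )
Many thanks Simon. Works a treat.... I have no idea how the magic in that line works! Why is there a rep function in the middle?!
Also, once you have n, you can do: as.data.frame(lapply(ll, `[`, 1:n))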
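
As a possible answer to the question about rep in the comment above, here is a small illustration of what that piece of the one-liner does for a single element. The vector `x` is a made-up example, not data from the thread.

```r
x <- c(20, 21)   # a short element (hypothetical values)
n <- 4           # target length, i.e. the length of the longest element

# rep(NA, k) builds a vector of k missing values
rep(NA, n - length(x))
# [1] NA NA

# c() glues the original values and the padding together
c(x, rep(NA, n - length(x)))
# [1] 20 21 NA NA

# Indexing past the end of a vector gives the same result,
# which is why subsetting with `[` and 1:n also works
x[1:n]
# [1] 20 21 NA NA
```

mapply simply repeats this step for each list element, pairing every element with its own number of NAs to append.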

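【Solution 2】:

One straightforward approach is to first get the data into a "long" form (for example, using melt), add a "time" variable, and then use dcast or reshape to get the data back into a new "wide" form.

These examples use ll from @Simon's answer:

Here is a "reshape2" approach: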

library(reshape2)
ll2 <- melt(ll)
ll2$time <- ave(ll2$L1, ll2$L1, FUN = seq_along)
dcast(ll2, L1 ~ time, value.var="value")
#   L1          1         2         3         4
# 1  1 0.37212390 0.5728534        NA        NA
# 2  2 0.20168193 0.8983897 0.9446753 0.6607978
# 3  3 0.06178627 0.2059746 0.1765568        NA
# 4  4 0.38410372 0.7698414 0.4976992        NA

## Or, for the other orientation:
dcast(ll2, time ~ L1, value.var="value")

If you are using at least version 1.8.11 of the package, you can also use the "data.table" package for this:

library(data.table)
library(reshape2)
packageVersion("data.table") ## Need at least V 1.8.11
# [1] ‘1.8.11’

DT <- data.table(ll)
DTL <- DT[, unlist(ll), by = 1:nrow(DT)]
DTL[, time := sequence(.N), by = nrow]
dcast.data.table(DTL, nrow ~ time, value.var="V1")
#    nrow          1         2         3         4
# 1:    1 0.37212390 0.5728534        NA        NA
# 2:    2 0.20168193 0.8983897 0.9446753 0.6607978
# 3:    3 0.06178627 0.2059746 0.1765568        NA
# 4:    4 0.38410372 0.7698414 0.4976992        NA

## Or, for the other orientation
dcast.data.table(DTL, time ~ nrow, value.var="V1")

Both of these have the added advantage of making it easy to replace NA with whatever else you would like to use.

【Comments】:

【Solution 3】:

Another approach:

### set all lengths to the maximum length (here n = 4)
### this will 'fill in' with NAs where needed
n <- max(sapply(ll, length))
for (i in seq_along(ll)) {
       length(ll[[i]]) <- n
}
### each padded element becomes one column of the matrix
matrix(unlist(ll), nrow = n)

### @Arun's approach is similar to the above;
### it uses the fact that subsetting by indices
### which do not exist results in NAs, e.g.
ll[[1]][1:n]
# [1] 0.3721239 0.5728534        NA        NA
### (using the original `ll`)
d1 <- as.data.frame(lapply(ll, "[", 1:n))
colnames(d1) <- seq_along(ll)
d1

### this is more roundabout
library(plyr)
### `ldply` takes a list and returns a data.frame
### default function applied is `rbind.fill`, which works here
### however `t` coerces this back to a matrix
### (using `ll` as modified by the `for` loop above)
t(ldply(ll))
#        [,1]      [,2]       [,3]      [,4]
# 1 0.3721239 0.2016819 0.06178627 0.3841037
# 2 0.5728534 0.8983897 0.20597457 0.7698414
# 3        NA 0.9446753 0.17655675 0.4976992
# 4        NA 0.6607978         NA        NA
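
The length-assignment trick in the for loop can also be written without a loop, by passing the replacement function `length<-` to sapply. This is a sketch on a made-up list (`ll_demo` is hypothetical), not part of the original answer.

```r
ll_demo <- list(c(1, 2), c(3, 4, 5, 6), 7)   # hypothetical data
n <- max(sapply(ll_demo, length))

# "length<-"(x, n) returns x extended to length n, padded with NA;
# sapply then binds the equal-length results into a matrix, one column per element
m <- sapply(ll_demo, "length<-", n)
m
#      [,1] [,2] [,3]
# [1,]    1    3    7
# [2,]    2    4   NA
# [3,]   NA    5   NA
# [4,]   NA    6   NA

# Transpose first if each list element should be a row
as.data.frame(t(m))
```

Unlike the for loop, this leaves the original list unmodified.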

【Comments】: