Splitting and manipulating nested lists: answers

【Question title】: Splitting and manipulating nested lists
【Posted】: 2017-07-17 22:11:41
【Question】:

I am trying to split a nested list by a grouping variable. Consider the following structure:

> str(L1)
List of 2
 $ names:List of 2
  ..$ first: chr [1:5] "john" "lisa" "anna" "mike" ...
  ..$ last : chr [1:5] "johnsson" "larsson" "johnsson" "catell" ...
 $ stats:List of 2
  ..$ physical:List of 2
  .. ..$ age   : num [1:5] 14 22 53 23 31
  .. ..$ height: num [1:5] 165 176 179 182 191
  ..$ mental  :List of 1
  .. ..$ iq: num [1:5] 102 104 99 87 121

Now I need to produce two lists, L2 and L3, both obtained by splitting on L1$names$last, as shown below:

L2: grouped by L1$names$last

> str(L2) 
List of 3
 $ johnsson:List of 2
  ..$ names:List of 1
  .. ..$ first: chr [1:2] "john" "anna"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 14 53
  .. .. ..$ height: num [1:2] 165 179
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 102 99
 $ larsson :List of 2
  ..$ names:List of 1
  .. ..$ first: chr [1:2] "lisa" "steven"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 22 31
  .. .. ..$ height: num [1:2] 176 191
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 104 121
 $ catell  :List of 2
  ..$ names:List of 1
  .. ..$ first: chr "mike"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num 23
  .. .. ..$ height: num 182
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num 87

L3: each value of L1$names$last may appear at most once per group

List of 2
 $ 1:List of 2
  ..$ names:List of 2
  .. ..$ first: chr [1:3] "john" "lisa" "mike"
  .. ..$ last : chr [1:3] "johnsson" "larsson" "catell"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:3] 14 22 23
  .. .. ..$ height: num [1:3] 165 176 182
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:3] 102 104 87
 $ 2:List of 2
  ..$ names:List of 2
  .. ..$ first: chr [1:2] "anna" "steven"
  .. ..$ last : chr [1:2] "johnsson" "larsson"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 53 31
  .. .. ..$ height: num [1:2] 179 191
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 99 121

I have tried to apply this solution, but it does not seem to work for nested lists.

Reproducible code:

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87))))

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121)))))

Edit: note that the actual dataset is larger and more deeply nested than the example provided.

【Question comments】:

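Your data looks very structured, i.e. rectangular. Why don't you use a data frame?
I didn't think about that when I created the example data. The real data I'm working with changes dynamically and is not necessarily rectangular.
Could you provide an example in which the non-list vectors do not all have the same length, together with the desired end result?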

Tags: r split

【Solution 1】:

Lists are usually modified with recursion. For example, consider this function:

foo <- function(x, idx) {

    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

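It takes a list x and an index idx. It checks whether x is a list; if so, it applies itself to every element of that list. Once x is no longer a list (i.e. it is a leaf vector), it returns the elements selected by idx. Throughout the process, the structure of the original list is preserved.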
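For what it's worth, base R's rapply() with how = "list" performs the same leaf-wise recursion as foo, so the sketch below should behave the same way for lists whose leaves are atomic vectors. The helper name subset_leaves and the small nested example are hypothetical, not part of the original answer.

```r
# Hypothetical helper: apply the same index to every atomic leaf of a nested list,
# keeping the list structure and names (the same idea as foo() above, via rapply()).
subset_leaves <- function(x, idx) {
  rapply(x, function(leaf) leaf[idx], how = "list")
}

# Small self-contained example
nested <- list(a = c(10, 20, 30),
               b = list(c = c("x", "y", "z")))
subset_leaves(nested, c(TRUE, FALSE, TRUE))
# returns list(a = c(10, 30), b = list(c = c("x", "z")))
```

Writing foo by hand, as the answer does, has the advantage that the recursion is explicit and easy to extend (for example to skip certain branches).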

Here is a complete example. Note that this code assumes every vector in the list has 5 elements (one per observation).

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87))))

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121)))))

# make L2
foo <- function(x, idx) {

    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

levels <- unique(L1$names$last)
L2_2 <- vector("list", length(levels))
names(L2_2) <- levels
for (i in seq_along(L2_2)) {

    idx <- L1$names$last == names(L2_2[i])
    L2_2[[i]] <- list(names = foo(L1$names[-2], idx),
                      stats = foo(L1$stats, idx))

}
identical(L2, L2_2)

str(L2)
str(L2_2)

# make L3

dups <- duplicated(L1$names$last)
L3_2 <- vector("list", 2)
names(L3_2) <- 1:2
for (i in 1:2) {

    if (i == 1)
        idx <- !dups
    else
        idx <- dups

    L3_2[[i]] <- foo(L1, idx)

}
identical(L3, L3_2)
str(L3)
str(L3_2)
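Both loops can also be written without hard-coding the number of groups. In this sketch, split() collects the row numbers belonging to each surname for L2, and ave(..., FUN = seq_along) numbers each repeat of a surname (1 for the first occurrence, 2 for the second, and so on), so L3 gets as many groups as the most frequent surname requires instead of exactly two. L1 and foo are copied from above so the block runs on its own; the names rows_by_last, occurrence, L2_3 and L3_3 are illustrative.

```r
# Recursive subsetting helper from the answer above
foo <- function(x, idx) {
    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

L1 <- list(names = list(first = c("john","lisa","anna","mike","steven"),
                        last  = c("johnsson","larsson","johnsson","catell","larsson")),
           stats = list(physical = list(age = c(14,22,53,23,31), height = c(165,176,179,182,191)),
                        mental   = list(iq = c(102,104,99,87,121))))

last <- L1$names$last

# L2: one element per surname, in order of first appearance.
# split() returns the row numbers that belong to each surname.
rows_by_last <- split(seq_along(last), factor(last, levels = unique(last)))
L2_3 <- lapply(rows_by_last, function(rows) {
    list(names = foo(L1$names["first"], rows),  # drop "last", as in L2
         stats = foo(L1$stats, rows))
})

# L3: number every repeat of a surname (1st, 2nd, 3rd, ...) and group by that number,
# so a surname occurring k times is spread over k groups.
occurrence <- ave(seq_along(last), last, FUN = seq_along)
L3_3 <- lapply(split(seq_along(last), occurrence), function(rows) foo(L1, rows))

str(L3_3)
```

With the example data, occurrence is 1, 1, 2, 1, 2, which reproduces the two groups of L3.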

【Comments】:

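Thanks a lot. Your solution works for small lists, but for my dataset (about 920 observations of about 50 variables) it is not feasible.
Why is it not feasible? Time? Memory? An error?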

【Solution 2】:

This is not a complete answer, but I hope it helps.

See whether this works for L3:

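library(data.table)  # shift() comes from data.table

x = data.frame(L1, stringsAsFactors = FALSE)  # columns: names.first, names.last, stats.physical.age, stats.physical.height, stats.mental.iq
y = x[order(x$names.last),]
y$seq = 1
# Number repeated surnames. Because shift(y$seq) still refers to the all-1 column,
# this only yields 1 or 2, i.e. it assumes no surname appears more than twice.
y$seq = ifelse(y$names.last == shift(y$names.last), shift(y$seq)+1, 1)
y$seq[1] = 1  # shift() returns NA for the first row

z = split(y, y$seq)  # one data frame per occurrence number
z = list(list(names=list(first=z[[1]]$names.first, last=z[[1]]$names.last), stats=list(physical = list(age =z[[1]]$stats.physical.age, height= z[[1]]$stats.physical.height), mental=list(iq= z[[1]]$stats.mental.iq))),
         list(names=list(first=z[[2]]$names.first, last=z[[2]]$names.last), stats=list(physical = list(age =z[[2]]$stats.physical.age, height= z[[2]]$stats.physical.height), mental=list(iq= z[[2]]$stats.mental.iq))))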

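The last part, converting back to a list (z), can be done with a loop. As long as the same name does not repeat too many times (so there are only a few groups), the loop should not be too slow.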
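A minimal sketch of such a loop, assuming the column names produced by data.frame(L1) use "." to separate nesting levels (so it will break if an original list name itself contains a dot). set_path() and unflatten() are hypothetical helpers, not part of the answer; unflatten() rebuilds one nested list from one group, so it would be applied to each data frame returned by split(y, y$seq), e.g. with lapply().

```r
# Assign `value` at the position given by `path` (a character vector of names),
# creating intermediate lists as needed.
set_path <- function(lst, path, value) {
  if (length(path) == 1) {
    lst[[path]] <- value
  } else {
    child <- lst[[path[1]]]
    if (is.null(child)) child <- list()
    lst[[path[1]]] <- set_path(child, path[-1], value)
  }
  lst
}

# Rebuild a nested list from a data frame whose column names encode the nesting
# with "." (e.g. "stats.physical.age"); helper columns listed in `drop` are skipped.
unflatten <- function(df, drop = "seq") {
  out <- list()
  for (col in setdiff(names(df), drop)) {
    out <- set_path(out, strsplit(col, ".", fixed = TRUE)[[1]], df[[col]])
  }
  out
}

# Example: the second group of L3 in flattened form
grp <- data.frame(names.first = c("anna", "steven"),
                  names.last = c("johnsson", "larsson"),
                  stats.physical.age = c(53, 31),
                  stats.physical.height = c(179, 191),
                  stats.mental.iq = c(99, 121),
                  seq = c(2, 2),
                  stringsAsFactors = FALSE)
str(unflatten(grp))
```

Because the nesting is read from the column names, the same loop handles deeper nesting without writing out each level by hand.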

You said your data is more deeply nested; in that case you will need to add is.null and/or tryCatch calls to handle errors.
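As an illustration of that idea applied to the recursive approach from Solution 1, here is a hypothetical defensive variant of foo called safe_subset: it passes NULL branches through, leaves any leaf whose length differs from the number of observations n untouched, and uses tryCatch() to fall back to the unmodified leaf if subsetting fails. Whether misaligned leaves should be kept as-is is an assumption about the real data.

```r
# Hypothetical defensive version of foo(): subset only leaves that line up with
# the grouping vector (length n); NULL and misaligned leaves pass through unchanged.
safe_subset <- function(x, idx, n) {
  if (is.null(x)) return(NULL)
  if (is.list(x)) return(lapply(x, safe_subset, idx = idx, n = n))
  if (length(x) != n) return(x)
  tryCatch(x[idx], error = function(e) x)
}

obs <- list(id   = 1:3,
            info = list(label = c("p", "q", "r"), source = "survey"),
            note = NULL)
str(safe_subset(obs, c(TRUE, FALSE, TRUE), n = 3))
```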

【Comments】: