【Question title】: Splitting and manipulating nested lists
【Posted】: 2017-07-17 22:11:41
【Question】:

I am trying to split a nested list by a grouping variable. Consider the following structure:

> str(L1)
List of 2
 $ names:List of 2
  ..$ first: chr [1:5] "john" "lisa" "anna" "mike" ...
  ..$ last : chr [1:5] "johnsson" "larsson" "johnsson" "catell" ...
 $ stats:List of 2
  ..$ physical:List of 2
  .. ..$ age   : num [1:5] 14 22 53 23 31
  .. ..$ height: num [1:5] 165 176 179 182 191
  ..$ mental  :List of 1
  .. ..$ iq: num [1:5] 102 104 99 87 121

Now I need to produce two lists, L2 and L3, both obtained by splitting on L1$names$last, as shown below:

L2: the result of grouping by L1$names$last

> str(L2) 
List of 3
 $ johnsson:List of 2
  ..$ names:List of 1
  .. ..$ first: chr [1:2] "john" "anna"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 14 53
  .. .. ..$ height: num [1:2] 165 179
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 102 99
 $ larsson :List of 2
  ..$ names:List of 1
  .. ..$ first: chr [1:2] "lisa" "steven"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 22 31
  .. .. ..$ height: num [1:2] 176 191
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 104 121
 $ catell  :List of 2
  ..$ names:List of 1
  .. ..$ first: chr "mike"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num 23
  .. .. ..$ height: num 182
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num 87

L3: each value of L1$names$last may appear at most once per group

List of 2
 $ 1:List of 2
  ..$ names:List of 2
  .. ..$ first: chr [1:3] "john" "lisa" "mike"
  .. ..$ last : chr [1:3] "johnsson" "larsson" "catell"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:3] 14 22 23
  .. .. ..$ height: num [1:3] 165 176 182
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:3] 102 104 87
 $ 2:List of 2
  ..$ names:List of 2
  .. ..$ first: chr [1:2] "anna" "steven"
  .. ..$ last : chr [1:2] "johnsson" "larsson"
  ..$ stats:List of 2
  .. ..$ physical:List of 2
  .. .. ..$ age   : num [1:2] 53 31
  .. .. ..$ height: num [1:2] 179 191
  .. ..$ mental  :List of 1
  .. .. ..$ iq: num [1:2] 99 121

I have tried to apply this solution, but it does not seem to work for nested lists.

Reproducible code:

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87))))

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121)))))

Edit: note that the actual dataset is larger and more deeply nested than the example provided.

【Comments】:

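  • Your data looks very structured, i.e. rectangular. Why not use a data frame?
  • I did not think of that when creating the example data. The actual data I am working with changes dynamically and is not necessarily rectangular.
  • Could you provide an example in which the non-list vectors do not all have the same length, along with the desired end result?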

Tags: r split


【Solution 1】:

A common way to modify nested lists is recursion. For example, consider this function:

foo <- function(x, idx) {

    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

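It takes a list x and an index vector idx. It checks whether x is a list; if so, it applies itself to every element of that list. Once x is no longer a list, it returns the elements selected by idx. The structure of the original list is preserved throughout.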
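For comparison, base R's rapply() performs the same recursive walk: with how = "list" it calls a function on every non-list leaf and keeps the list structure and names. The sketch below assumes the L1 from the question and forwards idx to the leaf function through rapply()'s ... argument; via_foo and via_rapply are illustrative names only.

```r
# L1 as defined in the question
L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

# foo() from this answer
foo <- function(x, idx) {
    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

idx <- L1$names$last == "johnsson"

via_foo <- foo(L1, idx)
# rapply() recurses into sub-lists and calls the function on each leaf;
# the extra named argument i is passed on to that function
via_rapply <- rapply(L1, function(v, i) v[i], how = "list", i = idx)

identical(via_foo, via_rapply)
```

Both versions return a nested list with the same shape as L1, so either can be used as the building block for L2 and L3.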

Here is a complete example. Note that this code assumes every vector in the list has 5 elements, i.e. the same length as L1$names$last.
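Because of that assumption, it can be worth checking the leaf lengths before splitting. A minimal sketch, using rapply() with how = "unlist" to collect the length of every leaf (leaf_lengths and n are illustrative names):

```r
# L1 as defined in the question
L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

# length of every non-list leaf, returned as a named integer vector
leaf_lengths <- rapply(L1, function(v) length(v), how = "unlist")
n <- length(L1$names$last)

# TRUE only if every leaf lines up with the grouping vector
all(leaf_lengths == n)

# names of any leaves that would be misaligned (empty here)
names(leaf_lengths)[leaf_lengths != n]
```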

L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

L2 <- list("johnsson" = list("names" = list("first" = c("john","anna")),"stats" = list("physical" = list("age" = c(14,53), "height" = c(165,179)), "mental" = list("iq" = c(102,99)))), "larsson" = list("names" = list("first" = c("lisa","steven")),"stats" = list("physical" = list("age" = c(22,31), "height" = c(176,191)), "mental" = list("iq" = c(104,121)))), "catell" = list("names" = list("first" = "mike"),"stats" = list("physical" = list("age" = 23, "height" = 182), "mental" = list("iq" = 87))))

L3 <- list("1" = list("names" = list("first" = c("john","lisa","mike"),"last" = c("johnsson","larsson","catell")),"stats" = list("physical" = list("age" = c(14,22,23), "height" = c(165,176,182)), "mental" = list("iq" = c(102,104,87)))), "2" = list("names" = list("first" = c("anna","steven"),"last" = c("johnsson","larsson")),"stats" = list("physical" = list("age" = c(53,31), "height" = c(179,191)), "mental" = list("iq" = c(99,121)))))

# make L2
foo <- function(x, idx) {

    if (is.list(x)) {
        return(lapply(x, foo, idx = idx))
    }
    return(x[idx])
}

levels <- unique(L1$names$last)
L2_2 <- vector("list", length(levels))
names(L2_2) <- levels
for (i in seq_along(L2_2)) {

    idx <- L1$names$last == names(L2_2[i])
    L2_2[[i]] <- list(names = foo(L1$names[-2], idx),
                      stats = foo(L1$stats, idx))

}
identical(L2, L2_2)

str(L2)
str(L2_2)

# make L3

dups <- duplicated(L1$names$last)
L3_2 <- vector("list", 2)
names(L3_2) <- 1:2
for (i in 1:2) {

    if (i == 1)
        idx <- !dups
    else
        idx <- dups

    L3_2[[i]] <- foo(L1, idx)

}
identical(L3, L3_2)
str(L3)
str(L3_2)
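The loop for L3 hard-codes two groups: a third occurrence of a last name is also flagged by duplicated(), so it would land in the second group next to the second occurrence. A more general sketch, not part of the original answer, uses split() for L2 and ave() to number the occurrences of each last name for L3. The names grp, rows, occ, L2_3 and L3_3 are illustrative.

```r
# L1 as defined in the question
L1 <- list("names" = list("first" = c("john","lisa","anna","mike","steven"),"last" = c("johnsson","larsson","johnsson","catell","larsson")),"stats" = list("physical" = list("age" = c(14,22,53,23,31), "height" = c(165,176,179,182,191)), "mental" = list("iq" = c(102,104,99,87,121))))

foo <- function(x, idx) {
    if (is.list(x)) return(lapply(x, foo, idx = idx))
    x[idx]
}

grp  <- L1$names$last
rows <- seq_along(grp)

# L2: one element per last name, in order of first appearance
by_name <- split(rows, factor(grp, levels = unique(grp)))
L2_3 <- lapply(by_name, function(i)
    list(names = foo(L1$names["first"], i), stats = foo(L1$stats, i)))

# L3: group k holds the k-th occurrence of every last name
occ  <- ave(rows, grp, FUN = seq_along)   # 1 1 2 1 2 for the example data
L3_3 <- lapply(split(rows, occ), function(i) foo(L1, i))
```

With the question's data, identical(L2, L2_3) and identical(L3, L3_3) should both be TRUE, and a name that occurs k times simply produces k groups in L3_3.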

【Comments】:

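  • Thank you very much. Your solution works for small lists, but for my dataset (about 920 observations of about 50 variables) it is not feasible.
  • Why is it not feasible? Time? Memory? Errors?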
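One way to answer that question is to time the recursive approach on data of the stated size. The sketch below uses randomly generated data, not the asker's: 920 observations, 50 numeric leaves, and an arbitrarily assumed 300 distinct last names. No timing result is claimed here; the point is only how to measure it.

```r
set.seed(1)
n_obs <- 920

# hypothetical data: one grouping vector plus 50 numeric leaves of length n_obs
sim <- list(names = list(last = sample(sprintf("name%03d", 1:300), n_obs, replace = TRUE)),
            stats = setNames(lapply(1:50, function(j) rnorm(n_obs)), paste0("v", 1:50)))

foo <- function(x, idx) {
    if (is.list(x)) return(lapply(x, foo, idx = idx))
    x[idx]
}

grp <- sim$names$last
# split row positions by last name, then subset the whole nested list per group
timing <- system.time(
    res <- lapply(split(seq_along(grp), grp), function(i) foo(sim, i))
)
timing
```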
【Solution 2】:

This is not a complete answer, but I hope it helps.

See whether this works for L3:

library(data.table)  # shift() comes from data.table

x = data.frame(L1, stringsAsFactors = FALSE)
y = x[order(x$names.last),]
y$seq = 1
# 2 if the previous (sorted) row has the same last name, else 1; handles at most two occurrences
y$seq = ifelse(y$names.last == shift(y$names.last), shift(y$seq) + 1, 1)
y$seq[1] = 1  # shift() returns NA for the first row
z = split(y, y$seq)  # rows within each group stay sorted by last name

# rebuild the nested structure for each group
res = lapply(z, function(g) list(names = list(first = g$names.first, last = g$names.last),
                                 stats = list(physical = list(age = g$stats.physical.age, height = g$stats.physical.height),
                                              mental = list(iq = g$stats.iq))))

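The last step, converting the groups back into a nested list, can also be written as a loop. As long as the same last name does not occur too many times, the loop should not be too slow.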

You said the real data is more deeply nested; in that case you would need to add is.null and/or tryCatch checks to handle errors.
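A minimal sketch of that idea, applied to the recursive foo() from Solution 1. It assumes a policy the answer does not state: any leaf that is NULL or whose length differs from the number of observations is treated as metadata and copied unchanged instead of being subset. foo_safe and the L4 example data are hypothetical. A tryCatch() around x[idx] could be added at the same point if subsetting can fail on the real data.

```r
# Hypothetical variant of foo(): only leaves aligned with the observations are subset
foo_safe <- function(x, idx, n) {
    if (is.list(x)) return(lapply(x, foo_safe, idx = idx, n = n))
    # NULL or misaligned leaves are returned as-is
    if (is.null(x) || length(x) != n) return(x)
    x[idx]
}

# hypothetical irregular input: meta$source is not one value per observation
L4 <- list(names = list(first = c("john", "lisa", "anna"),
                        last  = c("johnsson", "larsson", "johnsson")),
           meta  = list(source = "example"))

idx <- L4$names$last == "johnsson"
res <- foo_safe(L4, idx, n = length(L4$names$last))
str(res)
```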

【Comments】:
