【Question Title】:Split a nested list of a dataframe column into different columns
【Posted】:2026-02-01 17:40:02
【Question】:

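I have already tried the related solutions, but they do not work for my case. I have a data frame with a nested list in one column, and I want to split that list into separate columns. Each list contains further lists, one per month, holding that month's timestamp (ts) and consumption (v). The data frame is: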

   id      monthly_consum
1 112          list1
2  34          list2
3  54          list3

where

list1<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 466.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
                         list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
                         list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
                         list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
                         list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
                         list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016- 12-01T00:00:00+01:00", v = 555))


list2<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 333.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
              list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 333.6),
              list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
              list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
              list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
              list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))


list3<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 323.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
           list(ts = "2016-03-01T00:00:00+01:00", v = 333.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
           list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
           list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
           list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
           list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))

I want to split the lists and create a data frame in one of the following two formats:

   id          ts.1                     cons.1    ts.2    cons.2  ts.3 etc..
1 112      2016-01-01T00:00:00+01:00    466.6    2016-02..   ...   ...
2  34      2016-01-01T00:00:00+01:00    333.6    2016-02..   ...   ...
3  54      2016-01-01T00:00:00+01:00    323.6    2016-02..   ...   ...

  id             ts                  consumption    
 112      2016-01-01T00:00:00+01:00    466.6    
 112      2016-02-01T00:00:00+01:00    565.6    
 112      2016-03-01T00:00:00+01:00    765.6 
 112      2016-04-01T00:00:00+01:00    888.6    
 112      2016-05-01T00:00:00+01:00    465    
 112      2016-06-01T00:00:00+01:00    465.6 
 112      2016-07-01T00:00:00+01:00    786    
 112      2016-08-01T00:00:00+01:00    435    
 112      2016-09-01T00:00:00+01:00    568 
 112      2016-10-01T00:00:00+01:00    678    
 112      2016-11-01T00:00:00+01:00    522   
 112      2016-12-01T00:00:00+01:00    555 
 34       2016-01-01T00:00:00+01:00    333.6    
 34       2016-02-01T00:00:00+01:00    565.6    
 34       2016-03-01T00:00:00+01:00    765.6 
 etc............

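Can you help me? I am using data.frame(matrix(unlist..)), but it does not give the format I want. When I use rbindlist I get:

"Error in rbindlist(....) : Item 1 of list input is not a data.frame, data.table or list"

Thank you in advance!

UPDATE: Using dput I get (in the real problem):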

 >dput(locs_total[9:12,1:5])
     structure(list(X.dep_id. = c("34", "34", "34", "34"), X.loc_id. = c("17761", 
    "17406", "23591", "27838"), X.surface. = c("200", "1250", "54", 
    "150"), X.sector. = c("HOUSING", "SMALL-STORE-FOOD", "LIBRARY", 
    "OFFICE-BUILDING"), 
 X.avg_cons_main. = list(list(structure(list(
        ts = "2016-01-01T00:00:00+01:00", v = 466.65), .Names = c("ts", 
    "v")), structure(list(ts = "2016-02-01T00:00:00+01:00", v = 406.45), 
   .Names = c("ts", 
    "v")), structure(list(ts = "2016-03-01T00:00:00+01:00", v = 483.35), 
   .Names = c("ts", 
   "v")), structure(list(ts = "2016-04-01T00:00:00+02:00", v = 79.45), 
   .Names = c("ts", 
  "v"))), NULL, NULL, NULL)), .Names = c("X.dep_id.", "X.loc_id.", 
  "X.surface.", "X.sector.", "X.avg_cons_main."
 ), row.names = c("9", "10", "11", "12"), class = "data.frame")

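【Comments】:

  • Please show the output of dput(x), where x is a suitably reduced version of the data frame.
  • Your dput throws an error.
  • I changed it. Does it work now?

Tags: r list dataframe split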


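【Solution 1】:

We can loop through the lists: mget() fetches each list by the name stored in monthly_consum, the inner do.call(rbind.data.frame, x) turns each list into a data frame, and Map(cbind, ...) attaches the matching id before everything is stacked.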

res <- do.call(rbind, Map(cbind, id = df1$id, lapply(mget(df1$monthly_consum), 
                   function(x) do.call(rbind.data.frame, x))))
names(res)[3] <- "consumption"
row.names(res) <- NULL
head(res, 14)
#    id                         ts consumption
#1  112  2016-01-01T00:00:00+01:00       466.6
#2  112  2016-02-01T00:00:00+01:00       565.6
#3  112  2016-03-01T00:00:00+01:00       765.6
#4  112  2016-04-01T00:00:00+01:00       888.6
#5  112  2016-05-01T00:00:00+01:00       465.0
#6  112  2016-06-01T00:00:00+01:00       465.6
#7  112  2016-07-01T00:00:00+01:00       786.0
#8  112  2016-08-01T00:00:00+01:00       435.0
#9  112  2016-09-01T00:00:00+01:00       568.0
#10 112  2016-10-01T00:00:00+01:00       678.0
#11 112  2016-11-01T00:00:00+01:00       522.0
#12 112 2016- 12-01T00:00:00+01:00       555.0
#13  34  2016-01-01T00:00:00+01:00       333.6
#14  34  2016-02-01T00:00:00+01:00       565.6

data

df1 <- structure(list(id = c(112L, 34L, 54L), monthly_consum = c("list1", 
"list2", "list3")), .Names = c("id", "monthly_consum"), 
class = "data.frame", row.names = c("1", "2", "3"))
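
The result above is the long format, the second of the two layouts requested. For the first (wide) layout, one possible sketch uses base R's reshape(). The small `res` below is a cut-down stand-in for the real result, and the helper column `month` is an assumption added only to number the readings within each id:

```r
# Cut-down stand-in for the long result `res` (two ids, two months each)
res <- data.frame(
  id = c(112, 112, 34, 34),
  ts = c("2016-01-01T00:00:00+01:00", "2016-02-01T00:00:00+01:00",
         "2016-01-01T00:00:00+01:00", "2016-02-01T00:00:00+01:00"),
  consumption = c(466.6, 565.6, 333.6, 565.6),
  stringsAsFactors = FALSE
)

# Number the readings within each id; this number becomes the column suffix
res$month <- ave(seq_along(res$id), res$id, FUN = seq_along)

# Spread ts and consumption into ts.1, consumption.1, ts.2, consumption.2, ...
wide <- reshape(res, idvar = "id", timevar = "month",
                v.names = c("ts", "consumption"), direction = "wide")
row.names(wide) <- NULL
wide
```

This assumes the readings are already ordered by month within each id; if they are not, sort `res` by id and ts first.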

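【Comments】:

  • I am trying to insert the structure, but the result is a data frame with 0 observations (I have more rows than in the example I posted): df1
  • @Penelope If you are asking how I got it, it comes from the dput function.
  • OK, thanks. My problem is that if I write: res
  • @Penelope My solution is based on the example you posted. It works for me.
  • Thanks @Akrun, I will try again :)
【Solution 2】:

If the id were also stored in the lists, you could simply use dplyr::bind_rows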

dplyr::bind_rows(list1, list2, list3)
# A tibble: 36 × 2
                          ts     v
                       <chr> <dbl>
1  2016-01-01T00:00:00+01:00 466.6
2  2016-02-01T00:00:00+01:00 565.6
3  2016-03-01T00:00:00+01:00 765.6
4  2016-04-01T00:00:00+01:00 888.6
5  2016-05-01T00:00:00+01:00 465.0
6  2016-06-01T00:00:00+01:00 465.6
7  2016-07-01T00:00:00+01:00 786.0
8  2016-08-01T00:00:00+01:00 435.0
9  2016-09-01T00:00:00+01:00 568.0
10 2016-10-01T00:00:00+01:00 678.0
# ... with 26 more rows

Adding the ID from another df

library(dplyr)

# tibble() replaces the deprecated data_frame(); it is re-exported by dplyr
ids <- tibble(list_id = c(112, 34, 54),
              monthly_consum = c("list1", "list2", "list3"))

Since the lists are nested, you can use purrr::map as follows:

- Combine the three lists into a single list

k <- list(list1, list2, list3)

- Use map to apply bind_rows to each list independently

k1 <- purrr::map(k, bind_rows)

- Use the ids as the names of the list

names(k1) <- ids$list_id

- Call bind_rows with .id so that the list names become an id column

bind_rows(k1, .id = "id")

# A tibble: 36 × 3
      id                        ts     v
   <chr>                     <chr> <dbl>
1    112 2016-01-01T00:00:00+01:00 466.6
2    112 2016-02-01T00:00:00+01:00 565.6
3    112 2016-03-01T00:00:00+01:00 765.6
4    112 2016-04-01T00:00:00+01:00 888.6
5    112 2016-05-01T00:00:00+01:00 465.0
6    112 2016-06-01T00:00:00+01:00 465.6
7    112 2016-07-01T00:00:00+01:00 786.0
8    112 2016-08-01T00:00:00+01:00 435.0
9    112 2016-09-01T00:00:00+01:00 568.0
10   112 2016-10-01T00:00:00+01:00 678.0
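
In the dput output from the question, the column holds the lists themselves rather than their names, and some entries are NULL. A minimal base R sketch for that shape might look like the following; the data frame `df` and its column name `cons` are hypothetical stand-ins for the real `locs_total` data:

```r
# Hypothetical stand-in for the real data: a list-column whose entries are
# either a list of list(ts, v) records or NULL
df <- data.frame(loc_id = c("17761", "17406"), stringsAsFactors = FALSE)
df$cons <- list(
  list(list(ts = "2016-01-01T00:00:00+01:00", v = 466.65),
       list(ts = "2016-02-01T00:00:00+01:00", v = 406.45)),
  NULL
)

# Build one small data frame per row, carrying the row's id along
rows <- Map(function(id, recs) {
  if (is.null(recs)) return(NULL)  # skip rows with no readings
  data.frame(id = id,
             ts = vapply(recs, function(r) r$ts, character(1)),
             consumption = vapply(recs, function(r) r$v, numeric(1)),
             stringsAsFactors = FALSE)
}, df$loc_id, df$cons)

# Stack the pieces; NULL entries are ignored by rbind
long <- do.call(rbind, unname(rows))
row.names(long) <- NULL
long
```

Because the id is taken from the same row as the list, no lookup by list name is needed. Rows whose list is NULL simply contribute nothing to the result.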

【Comments】:

  • Hi, the ids are not inside the consumption lists. Do you know how I could combine them?