【Question title】: Aggregate data.frame with list column
【Posted】: 2016-09-12 18:57:06
【Question】:

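Each row of my data.frame has a column that contains a vector. I want to aggregate by id and combine those vectors, but it seems aggregate can't handle this kind of data. How would you combine the vectors?

"Error: invalid type (list) for variable 'dv'"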

# Problem: aggregate data.frame with list-column

# reproducible code
set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)  # always return a list of vectors
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list  # dv becomes a list column (one vector per row)

# this throws the error above
aggregate(
  formula = dv ~ id,
  data = exdf,
  FUN = c
)
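The error comes from the formula interface rather than from the aggregation itself: `aggregate()`'s formula method hands its arguments to `model.frame()`, and `model.frame()` only accepts atomic columns, so the list column `dv` is rejected. A minimal sketch that reproduces the message directly (it rebuilds `exdf` with the question's setup so it runs on its own; `msg` is just an illustrative name). Note that the exact numbers drawn will not match the outputs shown in the answers below if you run R 3.6.0 or later, because that release changed the default algorithm used by `sample()`.

```r
set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list

# aggregate()'s formula method builds its data with model.frame();
# calling model.frame() directly shows that the list column is what fails
msg <- tryCatch(model.frame(dv ~ id, data = exdf),
                error = function(e) conditionMessage(e))
print(msg)
```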

【Discussion】:

    Tags: r


    【Answer 1】:

    You can use `dplyr` together with `unlist()` and `list()`:

    library(dplyr)
    df1 <- exdf %>% group_by(id) %>% summarise(dv = list(unlist(dv))) 
    
    df1
    # Source: local data frame [10 x 2]
    
    #      id         dv
    #   <int>     <list>
    #1      1 <int [13]>
    #2      2 <int [15]>
    #3      3 <int [13]>
    #4      4 <int [15]>
    #5      5 <int [13]>
    #6      6 <int [15]>
    #7      7 <int [13]>
    #8      8 <int [15]>
    #9      9 <int [13]>
    #10    10 <int [15]>
    
    df1$dv[[1]]
    # [1] 3 5 2 6 4 7 8 2 6 2 7 3 4
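`summarise()` returns one row per group, which is why the concatenated vector is wrapped in `list()`: `unlist()` flattens the group's vectors into one vector, and `list()` turns that vector into a single value that fits in one cell. The per-group step can be sketched in base R for one group (a hypothetical illustration; `g`, `flat` and `one_value` are made-up names, and `exdf` is rebuilt as in the question):

```r
set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list

# the rows of group id == 1: a list holding one integer vector per row
g <- exdf$dv[exdf$id == 1]
# unlist() concatenates them into a single integer vector
flat <- unlist(g)
# list() wraps that vector so the group contributes exactly one value
one_value <- list(flat)
c(rows = length(g), values = length(flat), cells = length(one_value))
```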
    

    Or with `data.table`:

    library(data.table)
    setDT(exdf)[, .(list(unlist(dv))), id]
    
    #    id           V1
    # 1:  1 3,5,2,6,4,7,
    # 2:  2 2,8,8,6,6,1,
    # 3:  3 2,6,4,7,8,2,
    # 4:  4 7,4,6,4,1,4,
    # 5:  5 4,7,8,2,6,2,
    # 6:  6 4,1,4,2,7,6,
    # 7:  7 7,3,4,3,5,2,
    # 8:  8 4,2,7,6,2,8,
    # 9:  9 3,5,2,6,4,7,
    #10: 10 2,8,8,6,6,1,
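In the data.table version the result column is called `V1` because the `j` expression is unnamed; naming it inside `.()` keeps the original column name. A sketch, assuming the data.table package is installed; `dt` and `res` are illustrative names, and `as.data.table()` is used instead of `setDT()` so that `exdf` itself is not converted by reference:

```r
library(data.table)

set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list

dt <- as.data.table(exdf)  # copy; exdf stays a plain data.frame
# per id: flatten the group's vectors and store them as one list element named dv
res <- dt[, .(dv = list(unlist(dv))), by = id]
res$dv[[1]]
```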
    

    【Discussion】:

      【Answer 2】:

      Here is some rather unreadable base R. If you want readable code with dplyr pipes, you may as well use group_by/summarise.

      data.frame(id = unique(exdf$id),
                 dv = cbind(lapply(split(exdf, exdf$id),
                                   function(x) unlist(x$dv))))
      
         id                                                      dv
      1   1                   3, 5, 6, 4, 7, 4, 2, 1, 6, 5, 5, 8, 5
      2   2    2, 8, 8, 6, 6, 1, 1, 7, 7, 4, 4, 7, 5, 5, 2, 3, 6, 4
      3   3                            2, 6, 5, 6, 3, 3, 8, 6, 6, 1
      4   4                7, 4, 6, 8, 3, 4, 2, 4, 5, 5, 3, 4, 5, 2
      5   5    4, 7, 8, 2, 6, 2, 6, 3, 5, 8, 6, 3, 4, 2, 1, 3, 2, 3
      6   6                      4, 1, 7, 1, 8, 6, 4, 7, 8, 4, 1, 3
      7   7                      7, 3, 4, 7, 3, 3, 4, 3, 6, 7, 7, 4
      8   8                4, 2, 7, 6, 8, 7, 4, 8, 4, 4, 2, 8, 6, 6
      9   9 1, 6, 4, 7, 6, 8, 4, 6, 4, 3, 4, 5, 2, 2, 5, 8, 3, 2, 8
      10 10    5, 5, 7, 1, 4, 2, 6, 1, 2, 2, 1, 1, 6, 8, 8, 2, 7, 6
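A somewhat more readable base-R sketch of the same split-and-unlist idea. It takes the ids from the names that `split()` returns, so ids and groups stay aligned even if `exdf` is not sorted by `id` (`unique(exdf$id)` follows order of appearance, while `split()` orders groups by sorted id). `grouped` and `res` are illustrative names:

```r
set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list

# split the list column by id, then concatenate each group's vectors
grouped <- lapply(split(exdf$dv, exdf$id), unlist)

# build the result; the ids come from the group names produced by split()
res <- data.frame(id = as.integer(names(grouped)))
res$dv <- unname(grouped)  # a list of length nrow(res) becomes a list column
res$dv[[1]]
```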
      

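      If you are dead set on using aggregate, you can convert the list of numbers to character strings first and then extract the numbers again with a regular expression: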

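      exdf$dv <- as.character(exdf$dv)  # deparse each vector into one string
      aggregate(
        formula = dv ~ id,
        data = exdf,
        FUN = function(x) {
          s <- paste(x, collapse = " ")          # join the group's strings, keeping a separator
          regmatches(s, gregexpr("[0-9]+", s))   # pull the numbers back out (as character)
        })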
      
         id                                                      dv
      1   1                   3, 5, 6, 4, 7, 4, 2, 1, 6, 5, 5, 8, 5
      2   2    2, 8, 8, 6, 6, 1, 1, 7, 7, 4, 4, 7, 5, 5, 2, 3, 6, 4
      3   3                            2, 6, 5, 6, 3, 3, 8, 6, 6, 1
      4   4                7, 4, 6, 8, 3, 4, 2, 4, 5, 5, 3, 4, 5, 2
      5   5    4, 7, 8, 2, 6, 2, 6, 3, 5, 8, 6, 3, 4, 2, 1, 3, 2, 3
      6   6                      4, 1, 7, 1, 8, 6, 4, 7, 8, 4, 1, 3
      7   7                      7, 3, 4, 7, 3, 3, 4, 3, 6, 7, 7, 4
      8   8                4, 2, 7, 6, 8, 7, 4, 8, 4, 4, 2, 8, 6, 6
      9   9 1, 6, 4, 7, 6, 8, 4, 6, 4, 3, 4, 5, 2, 2, 5, 8, 3, 2, 8
      10 10    5, 5, 7, 1, 4, 2, 6, 1, 2, 2, 1, 1, 6, 8, 8, 2, 7, 6
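If `aggregate()` is a must, the string round-trip can be avoided by calling the data-frame method instead of the formula method: `aggregate.data.frame()` splits each column by the grouping variables without going through `model.frame()`, so the list column is accepted. A sketch, assuming an R version in which `aggregate.data.frame()` has the `simplify` argument; `simplify = FALSE` keeps one list element per group, and `res` is an illustrative name:

```r
set.seed(1)
some_list <- replicate(40, sample(1:8, size = sample(1:6, 1), replace = TRUE),
                       simplify = FALSE)
exdf <- expand.grid(id = 1:10, content = 1:4)
exdf$dv <- some_list

# non-formula interface: x and by are data frames, FUN runs on each group's sub-list
res <- aggregate(exdf["dv"], by = exdf["id"], FUN = unlist, simplify = FALSE)
res$dv[[1]]  # the combined integer vector for id 1
```

Unlike the regex approach, the combined values stay integers and are not limited to single digits.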
      

      【Discussion】:
