【Question title】: data.table: Group by, then aggregate with custom function returning several new columns
【Posted】: 2019-06-17 09:04:55
【Question】:

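In data.table, how can I do the following:

  • Group the table by a combination of several columns
  • Then hand each group to a custom aggregation function, which:
      • Takes all columns of the group's subset of the table and aggregates them by returning several new columns, which are added to the result table

The trick is to generate several new columns without calling the aggregation function more than once per group.

Example: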

library(data.table)
mtcars_dt <- data.table(mtcars)

returnsOneColumn <- function(dt_group_all_columns){
  "returned_value_1"
}

# works great, returns one new column as summary per group
mtcars_dt[,
          list( new_column_1 = returnsOneColumn(dt_group_all_columns= .SD) ),
          by = c("mpg", "cyl"),
          .SDcols = colnames(mtcars_dt)
          ]

returnsMultipleColumns <- function (dt_group_all_columns){
  list( "new_column_1" = "returned_value_1", 
        "new_column_2" = "returned_value_2"  )
}

# does not work: ideally, the result would have mpg, cyl, and several columns
# generated from a single call to returnsMultipleColumns
mtcars_dt[,
          list( returnsMultipleColumns(dt_group_all_columns = .SD) ),
          by = c("mpg", "cyl"),
          .SDcols = colnames(mtcars_dt)
          ]

# desired output should look like this
#
#     mpg cyl     new_column_1     new_column_2
# 1: 21.0   6 returned_value_1 returned_value_2
# 2: 22.8   4 returned_value_1 returned_value_2
# 3: 21.4   6 returned_value_1 returned_value_2
# 4: 18.7   8 returned_value_1 returned_value_2

Related:

Assign multiple columns using := in data.table, by group

【Comments】:

    Tags: r data.table grouping aggregate summary


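    【Solution 1】:

    Your function already returns a list, so there is no need to wrap its result in list() again. In j, data.table turns each element of a returned list into its own column. Remove the outer list() and write the code like this: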

    mtcars_dt[,
               returnsMultipleColumns(dt_group_all_columns = .SD),
               by = c("mpg", "cyl"),
               .SDcols = colnames(mtcars_dt)
               ]
         mpg cyl     new_column_1     new_column_2
     1: 21.0   6 returned_value_1 returned_value_2
     2: 22.8   4 returned_value_1 returned_value_2
     3: 21.4   6 returned_value_1 returned_value_2
     4: 18.7   8 returned_value_1 returned_value_2
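
    The same pattern carries over to a function that actually computes something from the group. The following is a minimal sketch, not part of the original answer: summarise_group and the column names n_cars, mean_hp and max_wt are hypothetical, and the grouping columns cyl and gear are chosen only for illustration. It also shows the := form from the related question, which adds the new columns to every row instead of collapsing each group to one row.

```r
library(data.table)

mtcars_dt <- data.table(mtcars)

# Hypothetical aggregation function (an assumption for illustration):
# it receives the per-group subset (.SD) and returns a named list.
# Each list element becomes one column of the result.
summarise_group <- function(dt_group) {
  list(
    n_cars  = nrow(dt_group),      # number of rows in the group
    mean_hp = mean(dt_group$hp),   # average horsepower in the group
    max_wt  = max(dt_group$wt)     # heaviest car in the group
  )
}

# Aggregation: the function is called once per group, and because j
# evaluates to a list, each element is spread into its own column.
res <- mtcars_dt[, summarise_group(.SD), by = c("cyl", "gear")]
print(res)

# Wrapping the call in list() again should run, but the result tends to
# end up in a single list column (V1 by default) rather than in
# separate columns.
wrapped <- mtcars_dt[, list(summarise_group(.SD)), by = c("cyl", "gear")]

# Assignment by group: := with a character vector of names on the left
# adds the columns to a copy of the table, keeping all original rows.
added <- copy(mtcars_dt)
added[, c("n_cars", "mean_hp", "max_wt") := summarise_group(.SD),
      by = c("cyl", "gear")]
```

    Returning a named list keeps the column names next to the computation. With :=, the names come from the left-hand side instead, so the order of the list elements has to match the order of the names.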
    

    【Comments】:
