R中具有多个变量的重组聚合答案

【问题标题】：restructured aggregation in R with multiple variablesR中具有多个变量的重组聚合
【发布时间】：2020-10-07 13:58:45
【问题描述】：

这里是mydata的一部分

mydat=structure(list(channel_id = c(219038L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L, 1755L, 219038L, 1755L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L), multifr_type = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), offer_category_id = c(718L, 
718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
718L, 718L, 719L, 720L, 65L), adapter_id = c(3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 45L, 3L, 3L, 3L, 3L, 30L), adapter_id2 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    airline1 = c(238L, 238L, 238L, 156L, 238L, 156L, 156L, 238L, 
    238L, 156L, 238L, 238L, 238L, 156L, 156L, 757L), airline2 = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), meta_ui_type = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 1L), offer_flight_type_category_id = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), discount_category_id = c(1L, 1L, 6L, 1L, 11L, 3L, 1L, 
    6L, 2L, 6L, 1L, 2L, 6L, 2L, 2L, 1L), flight_area = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), count_sessions = c(15203L, 13297L, 12026L, 10575L, 10459L, 
    10306L, 9632L, 8623L, 7343L, 7298L, 6679L, 6236L, 4180L, 
    4163L, 3986L, 3923L), count_orders = c(907L, 3264L, 2400L, 
    426L, 1830L, 1787L, 1690L, 2119L, 501L, 1503L, 1942L, 1420L, 
    346L, 872L, 1100L, 474L), conversion = c(0.06, 0.245, 0.2, 
    0.04, 0.175, 0.173, 0.175, 0.246, 0.068, 0.206, 0.291, 0.228, 
    0.083, 0.209, 0.276, 0.121), offer_cat_id_clust = c(718L, 
    718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
    718L, 718L, 719L, 720L, 65L)), class = "data.frame", row.names = c(NA, 
-16L))

我需要变量 channel_id+multifr_type+adapter_id+adapter_id2+airline1+airline2+meta_ui_type+offer_flight_type_category_id+discount_category_id+flight_area 计算变量 count_sessions,count_orders 的总和，并按每个类别计算 conversion 的平均值。然后将这种聚合的结果添加到mydat2 和offer_category_id,offer_cat_id_clust 设置类别 - “其他”

例如，为了更清楚，就拿这个类别进行聚合

channel_id  multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   718 3   0   238 0   0   1   1   1   15203   907 0,06    718
219038  0   718 3   0   238 0   0   1   1   1   13297   3264    0,245   718

通过上述变量聚合 offer_category_id=718 的行，而不是 718 设置“其他” 因此，为 offer_category_id、offer_cat_id_clust 添加“其他”类别期望的结果看起来像

*channel_id multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   other   3   0   238 0   0   1   1   1   28500   4171    0,152000    other*

我怎样才能做到这种重组聚合？

【问题讨论】：

标签： r dplyr data.table

【解决方案1】：

这样的？

library(dplyr)
df %>% 
  group_by(
    channel_id, multifr_type, adapter_id, 
    adapter_id2, airline1, airline2, 
    meta_ui_type, offer_flight_type_category_id, 
    discount_category_id, flight_area
  ) %>% 
  summarise(
    offer_category_id = "other",
    count_sessions = sum(count_sessions), 
    count_orders = sum(count_orders), 
    conversion = mean(conversion), 
    offer_cat_id_clust = "other"
  )

【讨论】：