【Question title】: Restructured aggregation in R with multiple variables
【Posted】: 2020-10-07 13:58:45
【Question description】:

Here is part of my data:

mydat=structure(list(channel_id = c(219038L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L, 1755L, 219038L, 1755L, 1755L, 1755L, 219038L, 
1755L, 1755L, 1755L), multifr_type = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), offer_category_id = c(718L, 
718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
718L, 718L, 719L, 720L, 65L), adapter_id = c(3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 45L, 3L, 3L, 3L, 3L, 30L), adapter_id2 = c(0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    airline1 = c(238L, 238L, 238L, 156L, 238L, 156L, 156L, 238L, 
    238L, 156L, 238L, 238L, 238L, 156L, 156L, 757L), airline2 = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    ), meta_ui_type = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 1L), offer_flight_type_category_id = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), discount_category_id = c(1L, 1L, 6L, 1L, 11L, 3L, 1L, 
    6L, 2L, 6L, 1L, 2L, 6L, 2L, 2L, 1L), flight_area = c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    ), count_sessions = c(15203L, 13297L, 12026L, 10575L, 10459L, 
    10306L, 9632L, 8623L, 7343L, 7298L, 6679L, 6236L, 4180L, 
    4163L, 3986L, 3923L), count_orders = c(907L, 3264L, 2400L, 
    426L, 1830L, 1787L, 1690L, 2119L, 501L, 1503L, 1942L, 1420L, 
    346L, 872L, 1100L, 474L), conversion = c(0.06, 0.245, 0.2, 
    0.04, 0.175, 0.173, 0.175, 0.246, 0.068, 0.206, 0.291, 0.228, 
    0.083, 0.209, 0.276, 0.121), offer_cat_id_clust = c(718L, 
    718L, 718L, 719L, 718L, 719L, 719L, 718L, 718L, 719L, 1616L, 
    718L, 718L, 719L, 720L, 65L)), class = "data.frame", row.names = c(NA, 
-16L))

I need to group by the variables channel_id+multifr_type+adapter_id+adapter_id2+airline1+airline2+meta_ui_type+offer_flight_type_category_id+discount_category_id+flight_area, compute the sum of count_sessions and count_orders, and compute the mean of conversion for each group. The result of this aggregation should then be added to mydat, with offer_category_id and offer_cat_id_clust set to the category "other".

For example, to make this clearer, take this category for aggregation:

channel_id  multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   718 3   0   238 0   0   1   1   1   15203   907 0,06    718
219038  0   718 3   0   238 0   0   1   1   1   13297   3264    0,245   718

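The rows with offer_category_id = 718 are aggregated by the variables above, and "other" is set in place of 718. In other words, an "other" category is added to offer_category_id and offer_cat_id_clust. The desired result looks like this: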

channel_id  multifr_type    offer_category_id   adapter_id  adapter_id2 airline1    airline2    meta_ui_type    offer_flight_type_category_id   discount_category_id    flight_area count_sessions  count_orders    conversion  offer_cat_id_clust
219038  0   other   3   0   238 0   0   1   1   1   28500   4171    0,152000    other
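
As a quick sanity check of this expected row, the sketch below recomputes its three aggregates from the two example rows above. The vector and variable names are illustrative only.

```r
# Values copied from the two example rows in the question
sessions   <- c(15203, 13297)
orders     <- c(907, 3264)
conversion <- c(0.06, 0.245)

total_sessions  <- sum(sessions)     # 28500
total_orders    <- sum(orders)       # 4171
mean_conversion <- mean(conversion)  # 0.1525

print(c(total_sessions, total_orders, mean_conversion))
```

The mean of 0.06 and 0.245 is 0.1525, so the 0,152000 shown in the expected row appears to be a rounded or truncated value.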

How can I do this kind of restructured aggregation?

【Question discussion】:

    Tags: r dplyr data.table


    【Solution 1】:

    Something like this?

    library(dplyr)

    # mydat is the data frame from the question
    mydat %>% 
      # group by every key except the two offer category columns
      group_by(
        channel_id, multifr_type, adapter_id, 
        adapter_id2, airline1, airline2, 
        meta_ui_type, offer_flight_type_category_id, 
        discount_category_id, flight_area
      ) %>% 
      summarise(
        offer_category_id = "other",
        count_sessions = sum(count_sessions), 
        count_orders = sum(count_orders), 
        conversion = mean(conversion), 
        offer_cat_id_clust = "other",
        .groups = "drop"  # return an ungrouped result
      ) 
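
    The summary above returns only the aggregated rows. If the goal is to add them to mydat, as the question asks, one possible sketch follows. It assumes the "other" rows should be appended below the original rows rather than replace them, and it uses a four-row copy of the question's sample data (rows 1, 2, 3 and 8) so that it runs on its own. The names other_rows and result are illustrative.

```r
library(dplyr)

# Four rows copied from the question's sample data (rows 1, 2, 3 and 8)
mydat <- data.frame(
  channel_id = c(219038L, 1755L, 1755L, 1755L),
  multifr_type = 0L,
  offer_category_id = 718L,
  adapter_id = 3L,
  adapter_id2 = 0L,
  airline1 = 238L,
  airline2 = 0L,
  meta_ui_type = 0L,
  offer_flight_type_category_id = 1L,
  discount_category_id = c(1L, 1L, 6L, 6L),
  flight_area = 1L,
  count_sessions = c(15203L, 13297L, 12026L, 8623L),
  count_orders = c(907L, 3264L, 2400L, 2119L),
  conversion = c(0.06, 0.245, 0.2, 0.246),
  offer_cat_id_clust = 718L
)

# Aggregate over the ten grouping variables, labelling the result "other"
other_rows <- mydat %>%
  group_by(
    channel_id, multifr_type, adapter_id,
    adapter_id2, airline1, airline2,
    meta_ui_type, offer_flight_type_category_id,
    discount_category_id, flight_area
  ) %>%
  summarise(
    offer_category_id = "other",
    count_sessions = sum(count_sessions),
    count_orders = sum(count_orders),
    conversion = mean(conversion),
    offer_cat_id_clust = "other",
    .groups = "drop"
  )

# The two category columns are integer in mydat but character in other_rows,
# so convert them before stacking; bind_rows() will not mix the two types.
result <- mydat %>%
  mutate(across(c(offer_category_id, offer_cat_id_clust), as.character)) %>%
  bind_rows(other_rows)

print(result)
```

    In this four-row sample only the last two rows share all ten grouping values, so they collapse into a single "other" row with count_sessions 20649, count_orders 4519 and conversion 0.223. The first two rows differ in channel_id, so each forms its own group.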
    

    【Discussion】:
