【Question title】: R data.table: subgroup counts and weighted percent of group summary
【Posted】: 2018-04-06 19:16:47
【Question description】:

I have the following data.table:

library(data.table)

n = 100000

DT = data.table(customer_ID = 1:n,
                married = rbinom(n, 1, 0.4),
                coupon = rbinom(n, 1, 0.15))

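I need to create a table that summarizes, by marital-status subgroup, the total number of married and unmarried customers and the number of customers who used a coupon, with a final column giving the percentage of customers in each subgroup who used a coupon.

The output should look like this: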

   married Customers using Coupons Total Customers percent_usecoupon
1:       0                    9036           59790          15.11290
2:       1                    5943           40210          14.77991

My current code is very inefficient, and I'm sure data.table has better syntax for this, but I can't seem to find it. My current code is below:

coupon_marital = DT[coupon == TRUE, .N, by = married][order(-N)] #Count of coupon use by marital status
total_marital = DT[, .N, by = married] #Total count by marital status
setnames(total_marital, "N", "Count") #Rename N to Count
coupon_marital = merge(coupon_marital, total_marital) #Merge data.tables

coupon_marital[, percent_usecoupon := N/Count*100, by = married] #Compute percentage coupon use
setnames(coupon_marital, c("N", "Count"), c("Customers using Coupons", "Total Customers")) #Rename N and Count to descriptive headers
rm(total_marital)

print(coupon_marital)

I can't use dplyr; I need to use data.table only. I'm fairly new to data.table syntax, so any help is much appreciated!

【Question comments】:

    Tags: r data.table


    【Solution 1】:

    Create the data

    library(data.table)
    set.seed(10)
    n = 100000
    DT = data.table(customer_ID = 1:n,
                    married = rbinom(n, 1, 0.4),
                    coupon = rbinom(n, 1, 0.15))
    

    Summarize the data

    DT[, .(N.UseCoupon   = sum(coupon)
          ,N.Total       = .N
          ,Pct.UseCoupon = 100*mean(coupon)), 
       by = married]
    
    #    married N.UseCoupon N.Total Pct.UseCoupon
    # 1:       0        8975   60223      14.90294
    # 2:       1        5904   39777      14.84275
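
Because coupon is a 0/1 indicator, 100*mean(coupon) within a group is the same number as 100*sum(coupon)/.N, which is why a single grouped call can replace the merge in the question. The sketch below checks that equivalence and then renames the columns to the headers the question asked for. The name res and the final setnames/setorder step are illustrative additions, not part of the original answer.

```r
library(data.table)

set.seed(10)
n <- 100000
DT <- data.table(customer_ID = 1:n,
                 married = rbinom(n, 1, 0.4),
                 coupon = rbinom(n, 1, 0.15))

# One grouped pass: coupon users, group size, and percent using a coupon
res <- DT[, .(N.UseCoupon   = sum(coupon),
              N.Total       = .N,
              Pct.UseCoupon = 100 * mean(coupon)),
          by = married]

# For a 0/1 column the group mean equals (count of ones) / (group size)
stopifnot(isTRUE(all.equal(res$Pct.UseCoupon,
                           100 * res$N.UseCoupon / res$N.Total)))

# Optional: use the column headers requested in the question (renames in place)
setnames(res,
         c("N.UseCoupon", "N.Total", "Pct.UseCoupon"),
         c("Customers using Coupons", "Total Customers", "percent_usecoupon"))
setorder(res, married)  # sort rows by marital status
print(res)
```

Column names containing spaces have to be accessed with backticks or res[["Total Customers"]], so it may be more convenient to keep the short names while computing and rename only for display.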
    

    【Comments】:

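    • Using mean is smart. Not sure whether the other option, continuing the chain with [, .(N.UseCoupon, N.Total, Pct.UseCoupon = 100*N.UseCoupon/N.Total)], would be efficient.
    • This is perfect! Thank you so much!
    • @MKR Regarding efficiency, see ?GForce. I'd guess that computing the third column afterwards is the most efficient: DT[, .(s = sum(x), n = .N), by=g][, p := s/n*100]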
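
The two-step form suggested in the last comment can be sketched against the question's columns as follows. As I understand the comment, the idea is that a j expression made only of simple aggregates such as sum(coupon) and .N can be optimized internally by data.table (see ?GForce), while the percentage is then computed on the small two-row result. Whether this is measurably faster depends on the data.table version and the data, so treat it as something to benchmark rather than a guarantee. The names res2 and one_step are illustrative.

```r
library(data.table)

set.seed(10)
n <- 100000
DT <- data.table(customer_ID = 1:n,
                 married = rbinom(n, 1, 0.4),
                 coupon = rbinom(n, 1, 0.15))

# Step 1: simple grouped aggregates only.
# Step 2: add the percentage by reference on the aggregated result.
res2 <- DT[, .(N.UseCoupon = sum(coupon), N.Total = .N), by = married][
  , Pct.UseCoupon := 100 * N.UseCoupon / N.Total]

# One-step version from the answer, for comparison
one_step <- DT[, .(N.UseCoupon   = sum(coupon),
                   N.Total       = .N,
                   Pct.UseCoupon = 100 * mean(coupon)),
               by = married]

# Both approaches should give the same percentages
stopifnot(isTRUE(all.equal(res2$Pct.UseCoupon, one_step$Pct.UseCoupon)))

res2[]  # trailing [] forces printing after :=
```

Passing verbose = TRUE to the first bracket call, for example DT[, .(s = sum(coupon), n = .N), by = married, verbose = TRUE], prints data.table's optimization messages, which is one way to see what it did with a given j expression.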