【Posted】: 2018-04-06 19:16:47
【Question】:
I have the following data.table:
library(data.table)
n = 100000
DT = data.table(customer_ID = 1:n,
                married = rbinom(n, 1, 0.4),
                coupon = rbinom(n, 1, 0.15))
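I need to create a table, grouped by marital status, that shows the total number of married and unmarried customers and the number of customers in each group who used a coupon. The last column should give, for each marital-status group, the percentage of customers who used a coupon.
The output should look like this: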
   married Customers using Coupons Total Customers percent_usecoupon
1:       0                    9036           59790          15.11290
2:       1                    5943           40210          14.77991
My current code is very inefficient. I'm sure data.table has better syntax for this, but I can't seem to find it. My current code is copied below:
coupon_marital = DT[coupon == TRUE, .N, by = married][order(-N)] # Count of coupon users by marital status
total_marital = DT[, .N, by = married] # Total count by marital status
setnames(total_marital, "N", "Count") # Rename N to Count
coupon_marital = merge(coupon_marital, total_marital) # Merge the two tables on married
coupon_marital[, percent_usecoupon := N/Count*100, by = married] # Compute percentage of coupon use
setnames(coupon_marital, c("N", "Count"), c("Customers using Coupons", "Total Customers")) # Rename N and Count to descriptive names
rm(total_marital) # Drop the temporary table
print(coupon_marital)
I can't use dplyr; this has to be done with data.table only. I'm fairly new to data.table syntax, so any help is much appreciated!
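For reference, the summary described above can probably be produced in a single grouped pass instead of two aggregations plus a merge. The following is only a minimal sketch under a few assumptions: `coupon` is coded 0/1 so that `sum(coupon)` counts coupon users, and the `set.seed` call is added purely for reproducibility (it is not in the code above, so the counts will differ from the sample output).

```r
library(data.table)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
n  <- 100000
DT <- data.table(customer_ID = 1:n,
                 married     = rbinom(n, 1, 0.4),
                 coupon      = rbinom(n, 1, 0.15))

# One grouped pass: per marital status, count coupon users (coupon is 0/1)
# and all customers (.N). keyby also sorts the result by married.
coupon_marital <- DT[, .(`Customers using Coupons` = sum(coupon),
                         `Total Customers`         = .N),
                     keyby = married]

# Add the percentage column by reference
coupon_marital[, percent_usecoupon :=
                 100 * `Customers using Coupons` / `Total Customers`]

print(coupon_marital)
```

Using `keyby` rather than `by` keeps the rows ordered by `married`, which matches the layout of the desired output; the percentage is added in a second step because it refers to the two columns created in the first.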
【Comments】:
Tags: r data.table