plyr to calculate relative aggregation: answers

【Question title】: plyr to calculate relative aggregation
【Posted】: 2013-08-22 13:24:03
【Question】:

I have a data.frame that looks like this:

> head(activity_data)
    ev_id cust_id active previous_active start_date
1 1141880     201      1               0 2008-08-17
2 4927803     201      1               0 2013-03-17
3 1141880     244      1               0 2008-08-17
4 2391524     244      1               0 2011-02-05
5 1141868     325      1               0 2008-08-16
6 1141872     325      1               0 2008-08-16

For each cust_id
- for each ev_id
  - create a new variable $recent_active (= the sum of $active over all rows of this cust_id where $start_date > [this_row]$start_date - 10)
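To pin the rule down, here is a deliberately naive sketch that applies the condition literally to the six rows shown by `head(activity_data)`. It assumes `start_date` is already a `Date`, so `- 10` means 10 days, and it reads the condition with no upper bound, so later rows of the same customer also count (if only the 10 days up to the current row were meant, add `df$start_date <= df$start_date[i]` to the filter). The helper name `recent_active_naive` is hypothetical. It scans every row for every row, so it is only suitable for checking results on small data.

```r
# Sample rows copied from the head(activity_data) output above
activity_data <- data.frame(
  ev_id           = c(1141880L, 4927803L, 1141880L, 2391524L, 1141868L, 1141872L),
  cust_id         = c(201L, 201L, 244L, 244L, 325L, 325L),
  active          = c(1L, 1L, 1L, 1L, 1L, 1L),
  previous_active = c(0, 0, 0, 0, 0, 0),
  start_date      = as.Date(c("2008-08-17", "2013-03-17", "2008-08-17",
                              "2011-02-05", "2008-08-16", "2008-08-16"))
)

# Hypothetical helper, literal reading of the rule: for row i, sum `active`
# over every row j of the same cust_id with start_date[j] > start_date[i] - 10
recent_active_naive <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    same_cust <- df$cust_id == df$cust_id[i]
    in_window <- df$start_date > df$start_date[i] - 10  # Date minus 10 = 10 days earlier
    sum(df$active[same_cust & in_window])
  }, numeric(1))
}

activity_data$recent_active <- recent_active_naive(activity_data)
activity_data$recent_active   # 2 1 2 1 2 2
```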

I am struggling to do this with ddply, because my split grouping is .(cust_id), but I want to return rows with both cust_id and ev_id.

Here is what I have tried:

ddply(activity_data, .(cust_id), function(x) recent_active=sum(x[this_row,]$active))

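If ddply is not an option, what other efficient approach would you recommend? My dataset has about 200 million rows, and I need to do this roughly 10-15 times per row.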
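For data on the scale mentioned (about 200 million rows), one way to avoid scanning each customer's rows once per row is sketched below in base R: sort by `cust_id` and `start_date`, keep a running sum of `active`, and use `findInterval()` to find, for every row, how many rows of the same customer are dated on or before `start_date - 10`. This is a swapped-in technique (not plyr, and not the data.table approach suggested in the comments), it has not been benchmarked here, and the function name `recent_active_fast`, its `window` argument and the toy data are hypothetical. Like the question's rule, it has no upper bound on `start_date`.

```r
# Hypothetical helper: for every row, sum `active` over rows of the same cust_id
# whose start_date is strictly later than this row's start_date - window (days).
recent_active_fast <- function(df, window = 10) {
  # Sort by customer, then date; `ord` is used to put results back in input order
  ord  <- order(df$cust_id, df$start_date)
  cust <- df$cust_id[ord]
  d    <- as.numeric(df$start_date[ord])   # days since 1970-01-01
  act  <- df$active[ord]

  # Shift each customer's dates into its own disjoint block so one sorted key
  # covers all customers: every key of customer g is below every key of g + 1
  grp <- match(cust, unique(cust))         # 1, 1, 2, 2, 3, ... in sorted order
  big <- diff(range(d)) + window + 1
  key <- d + grp * big

  cum <- c(0, cumsum(act))                  # cum[k + 1] = sum of the first k sorted rows
  # Count of sorted rows with key <= this key - window: all earlier customers
  # plus this customer's rows dated on or before start_date - window
  before <- findInterval(key - window, key)
  last   <- cumsum(rle(grp)$lengths)[grp]   # last sorted row of this row's customer

  out <- numeric(nrow(df))
  out[ord] <- cum[last + 1] - cum[before + 1]
  out
}

# Hypothetical toy data
toy <- data.frame(
  cust_id    = c(1L, 1L, 1L, 2L),
  active     = c(1L, 1L, 1L, 1L),
  start_date = as.Date(c("2013-01-01", "2013-01-05", "2013-01-20", "2013-01-05"))
)
recent_active_fast(toy)   # 3 3 1 1
```

The per-row work is reduced to one sort, one cumulative sum and one binary search, so the cost grows roughly like n log n instead of with the square of each customer's row count; at the size in the question, memory for the extra vectors is still worth checking.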

The sample data is here.

【Question discussion】:

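I recommend using data.table. Could you give us a reproducible example so we can write an answer based on the actual data?
What is the 10 in $start_date > [this_row]$start_date - 10? Is it 10 days, 10 months or 10 years? And please dput the sample data.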
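On the unit question: if `start_date` is stored as an R `Date`, subtracting a plain number shifts it by that many days, so `- 10` would mean 10 days; calendar months or years need `seq()` with a character `by` instead. A small check, using one date from the question:

```r
d <- as.Date("2008-08-17")

# Subtracting a number from a Date shifts it by that many days
d - 10                                        # "2008-08-07"

# Calendar months (or years) need seq() with a character `by`
seq(d, by = "-10 months", length.out = 2)[2]  # "2007-10-17"
```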
dput for the subset:
structure(list(ev_id = c(1144095L, 1144095L, 23937391, 1144083L, 1144087L, 1144099L, 1144081, 1190816L, 1190818L), cust_id = c(201L, 201L, 244L, 244L, 325L, 325L, 325L, 325L, 325L, 325L, 325L, 325L), active = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), previous_active = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), start_date = structure(c(14334, 16007, 14334, 15236, 14333, 14333, 14333, 14333, 14340, 14341), class = "Date")), .Names = c("ev_id", "cust_id", "active", "previous_active", "start_date"), row.names = c(NA, 10L), class = "data.frame")

Tags: r dataframe plyr

【Solution 1】:

You actually need a two-step approach here (and you also need to convert the dates to Date format before running the code below):

library(plyr)
activity_data$start_date <- as.Date(activity_data$start_date)

step1 <- ddply(activity_data, .(cust_id), transform, recent_active = your_function) # placeholder: not clear what you are asking regarding the function

ddply(step1, .(cust_id, ev_id), summarize, recent_active = sum(recent_active))
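A base-R sketch of the same two steps, assuming the placeholder is meant to be the rule from the question: `ave()` stands in for the per-customer `ddply(..., transform)` step and `aggregate()` for the per-(cust_id, ev_id) `summarize` step. The toy data frame is hypothetical.

```r
# Hypothetical toy data; ev_id 10 appears twice for customer 1
toy <- data.frame(
  ev_id      = c(10L, 10L, 11L, 20L),
  cust_id    = c(1L, 1L, 1L, 2L),
  active     = c(1L, 1L, 1L, 1L),
  start_date = as.Date(c("2013-01-01", "2013-01-05", "2013-01-20", "2013-01-05"))
)

# Step 1: within each cust_id (idx = that customer's row numbers), sum `active`
# over the customer's rows with start_date > this row's start_date - 10
toy$recent_active <- ave(seq_len(nrow(toy)), toy$cust_id, FUN = function(idx) {
  vapply(idx, function(k) {
    sum(toy$active[idx][toy$start_date[idx] > toy$start_date[k] - 10])
  }, numeric(1))
})

# Step 2: collapse to one row per (cust_id, ev_id)
step2 <- aggregate(recent_active ~ cust_id + ev_id, data = toy, FUN = sum)
step2
```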

【Discussion】: