[Posted]: 2019-05-07 21:43:13
[Question]:
Say this is my data:
mydat=structure(list(ItemRelation = c(11629L, 11629L, 11629L, 11629L,
11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L,
11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L, 11629L,
11629L, 11630L, 11630L, 11630L, 11630L, 11630L, 11630L, 11630L,
11630L, 11630L, 11630L, 11630L, 11630L), exp_date_days = c(5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L
), CustomerName = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ТС", "ТС1"), class = "factor"),
DocumentNum = c(11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 11L
), IsPromo = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), CalendarYear = c(2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L), diff = 1:33), .Names = c("ItemRelation",
"exp_date_days", "CustomerName", "DocumentNum", "IsPromo", "CalendarYear",
"diff"), class = "data.frame", row.names = c(NA, -33L))
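IsPromo always comes in the order 0-1-0 (zeros, then ones, then zeros)!
I need to aggregate the data by sum for each group ItemRelation + CustomerName + DocumentNum + CalendarYear, according to a condition.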
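The 0-1-0 ordering can be checked per group with base R's `rle()`. This is a minimal sketch: it re-creates `mydat` compactly (same values as the dput above) so it runs on its own, and the helper `is_010` is a hypothetical name used only for illustration.

```r
# Compact re-creation of mydat from the question (same values as the dput above)
mydat <- data.frame(
  ItemRelation  = rep(c(11629L, 11630L), c(21, 12)),
  exp_date_days = rep(c(5L, 6L), c(21, 12)),
  CustomerName  = factor(rep(c("ТС", "ТС1"), c(21, 12))),
  DocumentNum   = 11L,
  IsPromo       = c(rep(0L, 12), 1L, 1L, rep(0L, 7), 0L, 1L, rep(0L, 10)),
  CalendarYear  = 2018L,
  diff          = 1:33
)

# Hypothetical helper: TRUE if the runs of x are exactly "0s, then 1s, then 0s"
is_010 <- function(x) identical(rle(as.integer(x))$values, c(0L, 1L, 0L))

# One grouping factor built from the four key columns (empty combinations dropped)
keys <- c("ItemRelation", "CustomerName", "DocumentNum", "CalendarYear")
grp  <- interaction(mydat[keys], drop = TRUE)

ok <- tapply(mydat$IsPromo, grp, is_010)
print(ok)
```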
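If, within a group, the value of exp_date_days is …
If, within a group, the value of exp_date_days is > 5, then the diff column must be summed over only the 15 zeros that follow the ones in IsPromo. If there are fewer than 15 zeros, sum over all the zeros that are there.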
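The rule for one group (take the zeros after the last 1 in IsPromo, keep at most N of them, sum their diff) can be sketched in base R. Assumptions: every group contains at least one 1 (the 0-1-0 order), and because the condition for exp_date_days <= 5 is missing from the post, `short_window` is a hypothetical placeholder. For 11629 only 7 zeros follow the promo, so any `short_window` of 7 or more gives 126. The names `sum_after_promo` and `short_window` are illustrative, not from the question.

```r
# Compact re-creation of mydat from the question (same values as the dput above)
mydat <- data.frame(
  ItemRelation  = rep(c(11629L, 11630L), c(21, 12)),
  exp_date_days = rep(c(5L, 6L), c(21, 12)),
  CustomerName  = factor(rep(c("ТС", "ТС1"), c(21, 12))),
  DocumentNum   = 11L,
  IsPromo       = c(rep(0L, 12), 1L, 1L, rep(0L, 7), 0L, 1L, rep(0L, 10)),
  CalendarYear  = 2018L,
  diff          = 1:33
)

# HYPOTHETICAL: window for exp_date_days <= 5 (that condition is missing from the post)
short_window <- 7L

# Sum `diff` over the first `n_zeros` zeros that follow the last 1 in `is_promo`.
# Returns NA if the group has no 1 at all (violates the assumed 0-1-0 order).
sum_after_promo <- function(is_promo, diff, n_zeros) {
  if (!any(is_promo == 1)) return(NA_real_)
  last_one <- max(which(is_promo == 1))
  idx <- which(seq_along(is_promo) > last_one & is_promo == 0)
  sum(diff[head(idx, n_zeros)])  # head() keeps every zero if there are fewer than n_zeros
}

keys   <- c("ItemRelation", "CustomerName", "DocumentNum", "CalendarYear")
groups <- split(mydat, mydat[keys], drop = TRUE)

res <- do.call(rbind, lapply(groups, function(g) {
  # 15 zeros when exp_date_days > 5 (from the question), otherwise the placeholder
  n <- if (g$exp_date_days[1] > 5) 15L else short_window
  data.frame(g[1, keys], diff = sum_after_promo(g$IsPromo, g$diff, n))
}))
rownames(res) <- NULL
print(res)
```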
So for this example the expected output is:
ItemRelation CustomerName DocumentNum CalendarYear diff
11629 ТС 11 2018 126
11630 ТС 11 2018 285
How can I do this with dplyr or data.table?
EDIT
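One possible dplyr approach, sketched under assumptions: each group follows the 0-1-0 order (so it has at least one 1), and `short_window` is a hypothetical stand-in for the missing exp_date_days <= 5 condition. The data are re-created inline so the block runs on its own.

```r
library(dplyr)

# Compact re-creation of mydat from the question (same values as the dput above)
mydat <- data.frame(
  ItemRelation  = rep(c(11629L, 11630L), c(21, 12)),
  exp_date_days = rep(c(5L, 6L), c(21, 12)),
  CustomerName  = factor(rep(c("ТС", "ТС1"), c(21, 12))),
  DocumentNum   = 11L,
  IsPromo       = c(rep(0L, 12), 1L, 1L, rep(0L, 7), 0L, 1L, rep(0L, 10)),
  CalendarYear  = 2018L,
  diff          = 1:33
)

short_window <- 7L  # HYPOTHETICAL: the exp_date_days <= 5 window is not given in the post

res <- mydat %>%
  group_by(ItemRelation, CustomerName, DocumentNum, CalendarYear) %>%
  # keep only the zeros after the last 1 in IsPromo (assumes the 0-1-0 order)
  filter(row_number() > max(which(IsPromo == 1L)), IsPromo == 0L) %>%
  # keep at most 15 of them when exp_date_days > 5, otherwise short_window
  filter(row_number() <= if (first(exp_date_days) > 5) 15L else short_window) %>%
  summarise(diff = sum(diff), .groups = "drop")

print(res)
```

The second `filter()` runs on the already-filtered groups, so `row_number()` there counts only the zeros after the promo; when a group has fewer zeros than the window, all of them are kept.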
ItemRelation exp_date_days CustomerName DocumentNum IsPromo CalendarYear diff
11629 5 ТС 11 0 2018 1
11629 5 ТС 11 0 2018 2
11629 5 ТС 11 0 2018 3
11629 5 ТС 11 0 2018 4
11629 5 ТС 11 0 2018 5
11629 5 ТС 11 0 2018 6
11629 5 ТС 11 0 2018 7
11629 5 ТС 11 0 2018 8
11629 5 ТС 11 0 2018 9
11629 5 ТС 11 0 2018 10
11629 5 ТС 11 0 2018 11
11629 5 ТС 11 0 2018 12
11629 5 ТС 11 1 2018 13
11629 5 ТС 11 1 2018 14
**11629 5 ТС 11 0 2018 15
11629 5 ТС 11 0 2018 16
11629 5 ТС 11 0 2018 17
11629 5 ТС 11 0 2018 18
11629 5 ТС 11 0 2018 19
11629 5 ТС 11 0 2018 20
11629 5 ТС 11 0 2018 21** (sum 126)
EDIT 2
ItemRelation exp_date_days CustomerName DocumentNum IsPromo CalendarYear diff
11630 6 ТС1 11 0 2018 22
11630 6 ТС1 11 1 2018 23
**11630 6 ТС1 11 0 2018 24
11630 6 ТС1 11 0 2018 25
11630 6 ТС1 11 0 2018 26
11630 6 ТС1 11 0 2018 27
11630 6 ТС1 11 0 2018 28
11630 6 ТС1 11 0 2018 29
11630 6 ТС1 11 0 2018 30
11630 6 ТС1 11 0 2018 31
11630 6 ТС1 11 0 2018 32
11630 6 ТС1 11 0 2018 33** (sum 285)
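The marked rows in both edits (the zeros after the last 1, capped at 15 when exp_date_days > 5) can also be selected with data.table. This is a sketch, not a definitive answer: it assumes each group contains at least one 1, uses a hypothetical `short_window` for the exp_date_days <= 5 case that the post does not specify, and re-creates the data inline. On this sample it gives 126 and 285.

```r
library(data.table)

# Compact re-creation of mydat from the question (same values as the dput above)
dt <- data.table(
  ItemRelation  = rep(c(11629L, 11630L), c(21, 12)),
  exp_date_days = rep(c(5L, 6L), c(21, 12)),
  CustomerName  = factor(rep(c("ТС", "ТС1"), c(21, 12))),
  DocumentNum   = 11L,
  IsPromo       = c(rep(0L, 12), 1L, 1L, rep(0L, 7), 0L, 1L, rep(0L, 10)),
  CalendarYear  = 2018L,
  diff          = 1:33
)

short_window <- 7L  # HYPOTHETICAL: the exp_date_days <= 5 window is not given in the post

res <- dt[, {
  last_one <- max(which(IsPromo == 1L))                 # assumes at least one 1 per group
  idx <- which(seq_len(.N) > last_one & IsPromo == 0L)  # zeros after the promo run
  n <- if (exp_date_days[1L] > 5) 15L else short_window
  list(diff = sum(diff[head(idx, n)]))                  # head() caps at n zeros
}, by = .(ItemRelation, CustomerName, DocumentNum, CalendarYear)]

print(res)
```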
[Discussion]:
-
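Could you explain your conditions in more detail? Maybe show which rows you sum to get
148 and 285. -
@JakobGepp, I edited the post. I made a mistake: it is not 148 but 126. The values marked with ** in diff are the ones to sum.
Tags: r dplyr data.table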