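[Posted]: 2019-10-30 20:05:03
[Question]:
I have a data frame that acts as a change log. For each date, I want the cumulative sum of the values on that date and all earlier dates, counting only the most recent entry for each ID and dropping older entries for the same ID.
This is very similar to this question: cumsum() up to and including current date in dplyr
Here is my current code: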
library(dplyr)
tribble(~ID,~Date, ~value,
"ID-1", "2019-01-01", 50,
"ID-2", "2019-01-02", 20,
"ID-3", "2019-01-03", 35,
"ID-1", "2019-01-04", 0,
"ID-4", "2019-01-04", 20,
"ID-5", "2019-01-04", 25,
"ID-6", "2019-01-07", 100,
"ID-3", "2019-01-08", 0,
"ID-7", "2019-01-08", 15,
"ID-8", "2019-01-08", 10,
"ID-6", "2019-01-10", 0,
"ID-9", "2019-01-10", 45,
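"ID-10", "2019-01-10", 40) %>%
  arrange(Date) %>%
  mutate(run_sum = cumsum(value)) %>%  # running total over every row, older ID entries included
  group_by(Date) %>%
  mutate(run_sum = last(run_sum))      # give every row of a date that date's final total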
Output:
# A tibble: 13 x 4
# Groups:   Date [7]
   ID    Date       value run_sum
   <chr> <chr>      <dbl>   <dbl>
 1 ID-1  2019-01-01    50      50
 2 ID-2  2019-01-02    20      70
 3 ID-3  2019-01-03    35     105
 4 ID-1  2019-01-04     0     150
 5 ID-4  2019-01-04    20     150
 6 ID-5  2019-01-04    25     150
 7 ID-6  2019-01-07   100     250
 8 ID-3  2019-01-08     0     275
 9 ID-7  2019-01-08    15     275
10 ID-8  2019-01-08    10     275
11 ID-6  2019-01-10     0     360
12 ID-9  2019-01-10    45     360
13 ID-10 2019-01-10    40     360
Is there a good way to make the run_sum column look like this instead?
# A tibble: 13 x 4
   ID    Date       value run_sum
   <chr> <chr>      <dbl>   <dbl>
 1 ID-1  2019-01-01    50      50
 2 ID-2  2019-01-02    20      70
 3 ID-3  2019-01-03    35     105
 4 ID-1  2019-01-04     0     100
 5 ID-4  2019-01-04    20     100
 6 ID-5  2019-01-04    25     100
 7 ID-6  2019-01-07   100     200
 8 ID-3  2019-01-08     0     190
 9 ID-7  2019-01-08    15     190
10 ID-8  2019-01-08    10     190
11 ID-6  2019-01-10     0     175
12 ID-9  2019-01-10    45     175
13 ID-10 2019-01-10    40     175
In other words, when an ID receives a new update, run_sum should drop that ID's older value and count only its latest one.
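One way to read the desired table is that each row contributes only its change relative to the previous value of the same ID, so a later entry replaces the earlier one instead of adding to it. The following is a minimal sketch under that assumption, not a confirmed solution. The names `changelog`, `delta` and `result` are hypothetical, and the sketch assumes each ID appears at most once per date.

```r
library(dplyr)

# Hypothetical name for the change log from the question
changelog <- tribble(
  ~ID,     ~Date,        ~value,
  "ID-1",  "2019-01-01",  50,
  "ID-2",  "2019-01-02",  20,
  "ID-3",  "2019-01-03",  35,
  "ID-1",  "2019-01-04",   0,
  "ID-4",  "2019-01-04",  20,
  "ID-5",  "2019-01-04",  25,
  "ID-6",  "2019-01-07", 100,
  "ID-3",  "2019-01-08",   0,
  "ID-7",  "2019-01-08",  15,
  "ID-8",  "2019-01-08",  10,
  "ID-6",  "2019-01-10",   0,
  "ID-9",  "2019-01-10",  45,
  "ID-10", "2019-01-10",  40
)

result <- changelog %>%
  arrange(Date) %>%
  group_by(ID) %>%
  # delta (hypothetical helper column): change versus this ID's previous value;
  # the first entry of an ID is compared against 0
  mutate(delta = value - lag(value, default = 0)) %>%
  ungroup() %>%
  # summing the changes cancels out values that were later replaced
  mutate(run_sum = cumsum(delta)) %>%
  group_by(Date) %>%
  # every row of a date gets that date's final total
  mutate(run_sum = last(run_sum)) %>%
  ungroup() %>%
  select(-delta)
```

For row 5 (2019-01-04), the latest values at that date are ID-1 = 0, ID-2 = 20, ID-3 = 35, ID-4 = 20 and ID-5 = 25, which sum to 100. The ISO-formatted date strings sort chronologically as text, so `arrange(Date)` works without converting the column to Date.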
[Comments]:
- Could you spell out the exact calculation? For example, how is run_sum in row 5 computed?