Cumulative summing between groups using dplyr: answers

[Question title]: Cumulative summing between groups using dplyr
[Posted]: 2018-04-05 21:25:22
[Question]:

I have a tibble structured as follows:

   day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5

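Note that the tibble contains multiple rows for each day, and within each day the same theta value is repeated an arbitrary number of times. (The tibble contains other, arbitrary columns that require this repeated structure.)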

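I'd like to use dplyr to take the cumulative sum of theta across days, such that, in the example above, 2.1 is added only once to 3.2, and so on. The tibble would be mutated to append the new cumulative sum (c.theta), as follows: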

   day  theta  c.theta
1   1    2.1     2.1
2   1    2.1     2.1
3   2    3.2     5.3
4   2    3.2     5.3
5   5    9.5     14.8
6   5    9.5     14.8
7   5    9.5     14.8 
...

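My initial attempts to group_by day and cumsum over theta only produced a cumulative sum over the entire dataset (e.g. 2.1 + 2.1 + 3.2 ...), which is not what I want. Searching Stack Overflow, I found many examples of cumulative summing within groups, but never between groups as described above. A nudge in the right direction would be much appreciated.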
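To pin down the difference, here is a small base-R sketch (illustrative only, not from the question) that rebuilds the example data and computes two running totals: one over all rows, and one within each day, which is what a grouped cumsum computes. ave() is used here as a dependency-free stand-in for group_by(day) %>% mutate(cumsum(theta)); the names over_all_rows, within_day and wanted are hypothetical. Neither running total matches the desired c.theta column.

```r
# Example data from the question
df <- data.frame(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

# Running total over all rows: each repeated value is counted again
over_all_rows <- cumsum(df$theta)
# 2.1 4.2 7.4 10.6 20.1 29.6 39.1

# Running total restarted within each day (what a grouped cumsum computes)
within_day <- ave(df$theta, df$day, FUN = cumsum)
# 2.1 4.2 3.2 6.4 9.5 19.0 28.5

# Desired between-day total, written out by hand for comparison
wanted <- c(2.1, 2.1, 5.3, 5.3, 14.8, 14.8, 14.8)
```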

[Question comments]:

Tags: r dplyr

[Answer 1]:

In base R, you can use split<- and tapply to get the desired result.

# dat holds the question's data (columns day and theta)
# construct a vector of zeros to fill in
dat$temp <- 0
# fill in with cumulative sum for each day
split(dat$temp, dat$day) <- cumsum(tapply(dat$theta, dat$day, head, 1))

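Here, tapply returns the first theta value for each day, and cumsum accumulates these per-day values. split<- then writes each element of the cumulative sum into every row of the corresponding day.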

dat
  day theta temp
1   1   2.1  2.1
2   1   2.1  2.1
3   2   3.2  5.3
4   2   3.2  5.3
5   5   9.5 14.8
6   5   9.5 14.8
7   5   9.5 14.8
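The steps of this answer can be inspected one at a time. This sketch rebuilds the question's data as dat; per_day and per_day_cum are illustrative names. Both tapply() and split() order the groups by the sorted levels of day, which is why the k-th cumulative sum lands on the k-th day.

```r
dat <- data.frame(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

# Step 1: first theta value of each day, named by day ("1", "2", "5")
per_day <- tapply(dat$theta, dat$day, head, 1)

# Step 2: cumulative sum across days, one value per day
per_day_cum <- cumsum(per_day)

# Step 3: split<- writes the k-th value into every row of the k-th day
dat$temp <- 0
split(dat$temp, dat$day) <- per_day_cum
```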

[Comments]:

[Answer 2]:

Not dplyr, but an alternative data.table solution:

library(data.table)
# Original table is called d
setDT(d)
# one row per day, cumulative sum across days, then join back on day
merge(d, unique(d)[, .(c.theta = cumsum(theta), day)])

   day theta c.theta
1:   1   2.1     2.1
2:   1   2.1     2.1
3:   2   3.2     5.3
4:   2   3.2     5.3
5:   5   9.5    14.8
6:   5   9.5    14.8
7:   5   9.5    14.8

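PS: If your table has other columns, use unique(d[, .(day, theta)]) in place of unique(d); otherwise rows that differ only in those other columns are not collapsed, and theta gets counted more than once per day.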
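A sketch of the PS, assuming the table carries a hypothetical extra column id that differs on every row. That is exactly the case where unique(d) would keep all seven rows, so the de-duplication is restricted to day and theta before taking the cumulative sum.

```r
library(data.table)

# Example data with a hypothetical extra column `id`
d <- data.table(
  id    = 1:7,
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

# One row per (day, theta), then the running total across days
per_day <- unique(d[, .(day, theta)])[, .(c.theta = cumsum(theta), day)]

# Join back on day; the extra column `id` is kept
res <- merge(d, per_day, by = "day")
```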

[Comments]:

[Answer 3]:

Doing this in dplyr, I came up with a solution very similar to PoGibas's: use distinct to keep only one row per day, compute the cumulative sum, and join it back:

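library(dplyr)

df = read.table(text="day  theta
1   1    2.1
2   1    2.1
3   2    3.2
4   2    3.2
5   5    9.5
6   5    9.5
7   5    9.5", header = TRUE)

# one row per day, then the running total across days
cumsums = df %>%
    distinct(day, theta) %>%
    mutate(ctheta = cumsum(theta))

# attach each day's running total to every row of that day
df %>%
    left_join(cumsums %>% select(day, ctheta), by = 'day')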
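If you would rather avoid the join, one alternative (not from the original answers) is to add theta only on the first row of each day, using dplyr's mutate() together with base R duplicated(). This is a sketch that assumes theta is constant within each day, as in the question; the arrange(day) call makes sure the first row of each day comes before the later ones.

```r
library(dplyr)

df <- data.frame(
  day   = c(1, 1, 2, 2, 5, 5, 5),
  theta = c(2.1, 2.1, 3.2, 3.2, 9.5, 9.5, 9.5)
)

res <- df %>%
  arrange(day) %>%
  # zero out theta on every row except the first one for its day,
  # so each day's value enters the running total exactly once
  mutate(ctheta = cumsum(if_else(duplicated(day), 0, theta)))
```

Because no join is involved, any other columns in the tibble are carried along unchanged.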

[Comments]:

Awesome! The distinct function will come in handy.