Posted: 2019-06-05 03:19:47
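Question:
How can I count the number of consecutive weeks per group, counting back from the maximum date in the whole dataset?
Suppose I have this data frame: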
id Week
1 A 2/06/2019
2 A 26/05/2019
3 A 19/05/2019
4 A 12/05/2019
5 A 5/05/2019
6 B 2/06/2019
7 B 26/05/2019
8 B 12/05/2019
9 B 5/05/2019
10 C 26/05/2019
11 C 19/05/2019
12 C 12/05/2019
13 D 2/06/2019
14 D 26/05/2019
15 D 19/05/2019
16 E 2/06/2019
17 E 19/05/2019
18 E 12/05/2019
19 E 5/05/2019
My desired output is:
id count
1: A 5
2: B 2
3: D 3
4: E 1
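At the moment I convert the dates to a factor to get ordinal week numbers, then compare them against reference numbers generated from the number of rows in each group: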
library(data.table)
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L, 5L),
.Label = c("A", "B", "C", "D", "E"), class = "factor"),
Week = structure(c(3L, 4L, 2L, 1L, 5L, 3L, 4L, 1L, 5L, 4L, 2L, 1L, 3L, 4L, 2L, 3L, 2L, 1L, 5L),
.Label = c("12/05/2019", "19/05/2019", "2/06/2019", "26/05/2019", "5/05/2019"), class = "factor")),
class = "data.frame", row.names = c(NA, -19L))
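dt <- data.table(df)
# Parse the dates and number the distinct weeks 1, 2, ... in ascending date order
dt[, Week_no := as.factor(as.Date(Week, format = "%d/%m/%Y"))]
dt[, Week_no := factor(Week_no)]
dt[, Week_no := as.numeric(Week_no)]
# Highest week number in the whole dataset (the most recent week)
max_no <- max(dt$Week_no)
# Week number each row would have if the group's weeks ran consecutively back
# from max_no (relies on rows being sorted by descending date within each id)
dt[, Week_ref := max_no:(max_no - .N + 1), by = "id"]
dt[, Week_diff := Week_no - Week_ref]
# Rows matching the expected number form the run of consecutive weeks
dt[Week_diff == 0, list(count = .N), by = "id"]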
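A minimal base-R sketch of an alternative that works from actual date differences instead of factor codes. Factor codes only number the weeks that appear somewhere in the data, so a week missing from every id would not register as a gap; counting in days avoids that. It assumes one row per id per week, dates in `%d/%m/%Y` format, and weeks spaced exactly 7 days apart. `consecutive_weeks` is a hypothetical helper name, not part of any package.

```r
# Sample data from the question, as plain character columns
df <- data.frame(
  id = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C",
         "D", "D", "D", "E", "E", "E", "E"),
  Week = c("2/06/2019", "26/05/2019", "19/05/2019", "12/05/2019", "5/05/2019",
           "2/06/2019", "26/05/2019", "12/05/2019", "5/05/2019",
           "26/05/2019", "19/05/2019", "12/05/2019",
           "2/06/2019", "26/05/2019", "19/05/2019",
           "2/06/2019", "19/05/2019", "12/05/2019", "5/05/2019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per id, count consecutive weeks back from the global max date
consecutive_weeks <- function(df) {
  d <- as.Date(df$Week, format = "%d/%m/%Y")
  max_date <- max(d)
  # Whole weeks before the latest date in the dataset: 0, 1, 2, ...
  weeks_back <- as.integer(max_date - d) %/% 7L
  counts <- tapply(weeks_back, df$id, function(w) {
    w <- sort(unique(w))
    # Length of the leading run 0, 1, 2, ...; once a gap appears,
    # no later element can match its position again
    sum(w == seq_along(w) - 1L)
  })
  res <- data.frame(id = names(counts), count = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Drop ids with no row in the latest week (C in this example)
  res <- res[res$count > 0, ]
  rownames(res) <- NULL
  res
}

consecutive_weeks(df)
```

This does not depend on the row order within each id, because the week offsets are sorted inside each group before the run is counted.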
Comments:
- `lubridate::week` might be useful.
Tags: r data.table