[Posted]: 2021-07-06 07:23:59
[Question]:
I have the following dummy dataset. Over a period of several days (dates), the status (status) of every element (id) is recorded once per day.
df <- data.frame( id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4),
dates = c("2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",
"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",
"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05",
"2021-01-01",
"2021-01-02",
"2021-01-03",
"2021-01-04",
"2021-01-05"),
status = c("A", "A", "A", "B", "C",
"A", "A", "B", "C", "C",
"A", "B", "C", "D", "E",
"A", "B", "B", "B", "B")
)
> df
id dates status
1 1 2021-01-01 A
2 1 2021-01-02 A
3 1 2021-01-03 A
4 1 2021-01-04 B
5 1 2021-01-05 C
6 2 2021-01-01 A
7 2 2021-01-02 A
8 2 2021-01-03 B
9 2 2021-01-04 C
10 2 2021-01-05 C
11 3 2021-01-01 A
12 3 2021-01-02 B
13 3 2021-01-03 C
14 3 2021-01-04 D
15 3 2021-01-05 E
16 4 2021-01-01 A
17 4 2021-01-02 B
18 4 2021-01-03 B
19 4 2021-01-04 B
20 4 2021-01-05 B
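Unfortunately, the data frame was reduced to save space: whenever the status was the same on two consecutive days, the second entry was removed. A status is assumed to stay unchanged until the next recorded change, so the actual dataset looks like this:
> library(dplyr)
> df %>% group_by(id) %>%
+ mutate(dupl = status == lag(status, default = "")) %>%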
+ ungroup() %>%
+ filter(dupl == FALSE) %>%
+ select(-dupl)
# A tibble: 13 x 3
id dates status
<dbl> <chr> <chr>
1 1 2021-01-01 A
2 1 2021-01-04 B
3 1 2021-01-05 C
4 2 2021-01-01 A
5 2 2021-01-03 B
6 2 2021-01-04 C
7 3 2021-01-01 A
8 3 2021-01-02 B
9 3 2021-01-03 C
10 3 2021-01-04 D
11 3 2021-01-05 E
12 4 2021-01-01 A
13 4 2021-01-02 B
My question is: how can I get back to the first (complete) version of the dataset? The time period (2021-01-01 to 2021-01-05) is always the same for all ids.
[Discussion]:
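One possible approach, offered as a sketch rather than a definitive answer: rebuild every id/date combination and then carry the last known status forward. It assumes the tidyr package is available alongside dplyr, that the period is known in advance, and that the first day of the period is present for every id (as it is in the example). The names reduced, all_dates and restored are made up for illustration.

```r
library(dplyr)
library(tidyr)

# The reduced dataset from the question (13 rows), rebuilt by hand
reduced <- data.frame(
  id     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4),
  dates  = c("2021-01-01", "2021-01-04", "2021-01-05",
             "2021-01-01", "2021-01-03", "2021-01-04",
             "2021-01-01", "2021-01-02", "2021-01-03",
             "2021-01-04", "2021-01-05",
             "2021-01-01", "2021-01-02"),
  status = c("A", "B", "C",
             "A", "B", "C",
             "A", "B", "C", "D", "E",
             "A", "B")
)

# Assumption: the period is known and identical for every id
all_dates <- seq(as.Date("2021-01-01"), as.Date("2021-01-05"), by = "day")

restored <- reduced %>%
  mutate(dates = as.Date(dates)) %>%
  # add the missing id/date rows; their status is NA at this point
  complete(id, dates = all_dates) %>%
  arrange(id, dates) %>%
  group_by(id) %>%
  # carry the last recorded status forward within each id
  fill(status, .direction = "down") %>%
  ungroup()
```

With the example data, restored has 20 rows again. Note that dates is a Date column here, whereas the original df stored it as character; as.character(restored$dates) converts it back if an exact match is needed.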