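【Posted】: 2020-04-01 00:25:39
【Question】:
Goal:
I have a dataset df, and I want to group by ID and find the duration under specific conditions: Focus == True, Read == True, and ID != "". However, I do not want to aggregate the IDs into one row each, because I want every run to stay in its own separate "chunk", as in the desired output further down. The data: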
ID  Date                  Focus  Read
A   1/2/2020 5:00:00 AM   TRUE   TRUE
A   1/2/2020 5:00:05 AM   TRUE   TRUE
    1/3/2020 6:00:00 AM   TRUE
    1/3/2020 6:00:05 AM   TRUE
B   1/4/2020 7:00:00 AM   TRUE   TRUE
B   1/4/2020 7:00:05 AM   TRUE   TRUE
B   1/4/2020 7:20:00 AM   TRUE   TRUE
B   1/4/2020 7:20:10 AM   TRUE   TRUE
A   1/2/2020 7:30:00 AM   TRUE   TRUE
A   1/2/2020 7:30:20 AM   TRUE   TRUE
I would like this output:
ID  Duration  Start                 End
A   5 sec     1/2/2020 5:00:00 AM   1/2/2020 5:00:05 AM
B   5 sec     1/4/2020 7:00:00 AM   1/4/2020 7:00:05 AM
B   10 sec    1/4/2020 7:20:00 AM   1/4/2020 7:20:10 AM
A   20 sec    1/2/2020 7:30:00 AM   1/2/2020 7:30:20 AM
dput of the data:
structure(list(ID = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 3L, 3L,
2L, 2L), .Label = c("", "A", "B"), class = "factor"), Date = structure(c(1L,
2L, 5L, 6L, 7L, 8L, 9L, 10L, 3L, 4L), .Label = c("1/2/2020 5:00:00 AM",
"1/2/2020 5:00:05 AM", "1/2/2020 7:30:00 AM", "1/2/2020 7:30:20 AM",
"1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM", "1/4/2020 7:00:00 AM",
"1/4/2020 7:00:05 AM", "1/4/2020 7:20:00 AM", "1/4/2020 7:20:10 AM"
), class = "factor"), Focus = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "True ", class = "factor"), Read = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "True "), class = "factor")), class = "data.frame", row.names = c(NA,
-10L))
The following works well, but it aggregates everything per ID. How can I keep the chunks separate instead?
library(dplyr)
library(lubridate)

df %>%
  filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
  mutate(Date = mdy_hms(Date)) %>%
  group_by(ID) %>%
  summarise(Duration = difftime(last(Date), first(Date), units = "secs"))
Any suggestions are welcome.
【Comments】:
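- So, ignore the blank IDs? Is there a reason you are using strings such as "True" rather than logical variables with TRUE and FALSE, the R natives?
- Yes, ignore the blank IDs. I can use TRUE and FALSE. I will edit this.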
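One possible way to keep the chunks separate is to group by a run identifier rather than by ID alone. The sketch below is not taken from the thread and rests on two assumptions. First, Focus and Read are stored as logical columns, as agreed in the comments (with the dput as posted, the `as.logical(trimws(...))` calls from the question would still be needed). Second, a new chunk starts whenever the ID changes or the gap to the previous row exceeds a threshold. The 60-second value and the names `max_gap`, `gap`, `new_block`, and `block` are hypothetical and chosen only for illustration. Some gap rule is needed because the four consecutive B rows are meant to form two chunks.

```r
library(dplyr)
library(lubridate)

# Sample data rebuilt with character/logical columns
# (a simplified stand-in for the dput in the question).
df <- data.frame(
  ID = c("A", "A", "", "", "B", "B", "B", "B", "A", "A"),
  Date = c("1/2/2020 5:00:00 AM", "1/2/2020 5:00:05 AM",
           "1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM",
           "1/4/2020 7:00:00 AM", "1/4/2020 7:00:05 AM",
           "1/4/2020 7:20:00 AM", "1/4/2020 7:20:10 AM",
           "1/2/2020 7:30:00 AM", "1/2/2020 7:30:20 AM"),
  Focus = TRUE,
  Read = c(TRUE, TRUE, NA, NA, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Assumption: rows more than max_gap seconds apart belong to different chunks.
max_gap <- 60

result <- df %>%
  # Drop blank IDs and rows where either flag is not TRUE (NA counts as not TRUE).
  filter(ID != "", Focus %in% TRUE, Read %in% TRUE) %>%
  mutate(
    Date = mdy_hms(Date),
    # Seconds since the previous remaining row (NA for the first row).
    gap = as.numeric(difftime(Date, lag(Date), units = "secs")),
    # A chunk starts on the first row, on an ID change, or after a long gap.
    new_block = row_number() == 1 | ID != lag(ID) | gap > max_gap,
    block = cumsum(new_block)
  ) %>%
  group_by(block, ID) %>%
  summarise(
    Duration = as.numeric(difftime(last(Date), first(Date), units = "secs")),
    Start = first(Date),
    End = last(Date),
    .groups = "drop"
  ) %>%
  select(ID, Duration, Start, End)

print(result)
```

Grouping by `block` together with `ID` keeps the rows in their original order and carries the ID through to the summary. If the chunks are always exact start/end pairs, pairing rows two at a time would be another option, but the question does not confirm that.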