【Posted】:2017-10-13 04:40:05
【Question】:
I have a table like this:
It is generated by the following code:
id <- c("1","2","1","2","1","1")
status <- c("open","open","closed","closed","open","closed")
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10","13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30")
data <- data.frame(id,status,date)
hour <- data.frame(do.call('rbind', strsplit(as.character(data$date),' ',fixed=TRUE)))
hour <- hour[,2]
hour <- as.POSIXlt(hour, format = "%H:%M")
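What I want to achieve is to select, for each id, the earliest open time and the latest closed time. So the final result would look like this:
Currently I use sqldf to solve the problem: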
sqldf("select * from (select id, status, date as closeDate, max(hour) as hour from data
where status='closed'
group by id,status) as a
join
(select id, status, date as openDate, min(hour) as hour from data
where status='open'
group by id,status) as b
using(id);")
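Question 1: Is there a simpler way?
Question 2: If I select max(hour) under a name other than hour, the result is no longer in date-time format but comes back as plain numbers such as 1507864200 and 1507807800. How can I keep the time format while giving the column a different name?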
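The numbers in Question 2 look like seconds since 1970-01-01 UTC, which is how a POSIXct value is stored internally. A likely explanation (an assumption, not confirmed in the post) is that SQLite has no date-time type, so the values travel as numbers, and the class is only restored when the result column keeps the name of an input column. A minimal sketch of converting such values back in base R, with `secs` and `times` as illustrative names and UTC chosen arbitrarily for display:

```r
# The two values quoted in the question, read as seconds since the Unix epoch
secs <- c(1507864200, 1507807800)

# Rebuild date-time objects; tz = "UTC" is an assumption made only so the
# printed result does not depend on the machine's local time zone
times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

# Format in the same day-month-year style used by the question's data
formatted <- format(times, "%d-%m-%Y %H:%M", tz = "UTC")
print(formatted)
```

The same conversion could be applied to a renamed result column after the query returns, for example `res$closeHour <- as.POSIXct(res$closeHour, origin = "1970-01-01")`, where `res` and `closeHour` are hypothetical names.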
【Discussion】:
- Did you mean for hour to be a column in your data? Perhaps you forgot the data$hour <- hour line?
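Following the comment's point about keeping the time as a column of data, Question 1 could be approached in base R without sqldf. This is only a sketch under one loud assumption: "earliest" and "latest" are judged on the full timestamp (day and time), whereas the question's own query compares the clock time alone. The names `when`, `first_open`, `last_closed` and `result` are illustrative, not from the original post.

```r
# Rebuild the question's data; keep strings as plain character vectors
id <- c("1","2","1","2","1","1")
status <- c("open","open","closed","closed","open","closed")
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10",
          "13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30")
data <- data.frame(id, status, date, stringsAsFactors = FALSE)

# Assumption: order by the full timestamp, parsed as day-month-year
data$when <- as.POSIXct(data$date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Earliest "open" row per id: sort ascending, keep the first row of each id
opens <- data[data$status == "open", ]
opens <- opens[order(opens$when), ]
first_open <- opens[!duplicated(opens$id), c("id", "date")]
names(first_open)[2] <- "openDate"

# Latest "closed" row per id: sort descending, keep the first row of each id
closed <- data[data$status == "closed", ]
closed <- closed[order(closed$when, decreasing = TRUE), ]
last_closed <- closed[!duplicated(closed$id), c("id", "date")]
names(last_closed)[2] <- "closeDate"

# One row per id with both dates side by side
result <- merge(first_open, last_closed, by = "id")
print(result)
```

Because the result carries the original date strings rather than values passed through a SQL engine, the columns can be named freely without losing their formatting.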