【Question title】:Organize data with maximum and minimum values in R
【Posted】:2017-10-13 04:40:05
【Question description】:

I have a table like this:

It is generated by the following code:

id <- c("1","2","1","2","1","1")
status <- c("open","open","closed","closed","open","closed")
date <- c("11-10-2017 15:10","10-10-2017 12:10","12-10-2017 22:10","13-10-2017 06:30","13-10-2017 09:30","13-10-2017 10:30")
data <- data.frame(id,status,date)
hour <- data.frame(do.call('rbind', strsplit(as.character(data$date),' ',fixed=TRUE)))
hour <- hour[,2]
hour <- as.POSIXlt(hour, format = "%H:%M") 

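What I want to achieve is to select, for each id, the earliest open time and the latest closed time. So the final result would look like this: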

Currently I use sqldf to solve the problem:

sqldf("select * from (select id, status, date as closeDate, max(hour) as hour from data 
  where status='closed'
   group by id,status) as a
   join 
   (select id, status, date  as openDate, min(hour) as hour from data 
   where status='open'
   group by id,status) as b
  using(id);")

Question 1: Is there a simpler way?

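Question 2: If I select max(hour) under a name other than hour, the result is no longer formatted as a date and time but comes back as a plain number such as 1507864200 or 1507807800. How can I keep the time format while giving the column a different name?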
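The numbers in Question 2 look like seconds since the Unix epoch (1970-01-01 00:00:00 UTC), which is how R stores POSIXct values internally. A likely cause, stated here as an assumption about sqldf's default behaviour, is that sqldf only restores a column's class when the output column has the same name as an input column. A minimal base R sketch of converting such a number back to a date-time follows; the variable names `secs` and `times` and the choice of UTC as the display time zone are illustrative assumptions, not part of the original post.

```r
# The two raw values quoted in the question (seconds since 1970-01-01 UTC).
secs <- c(1507864200, 1507807800)

# Convert back to POSIXct. tz = "UTC" is an assumption; pass your own
# time zone to see the local clock times instead.
times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(times, "%Y-%m-%d %H:%M")
# "2017-10-13 03:10" "2017-10-12 11:30"
```

The same conversion can be applied to a renamed column of the sqldf result after the query has run.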

【Question comments】:

  • Did you mean for hour to be a column in your data? Perhaps you forgot the line data$hour <- hour?

Tags: r datetime group-by sqldf


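【Solution 1】:

Using the plyr package:

(For some reason, as shown here, you have to convert hour to class POSIXct with as.POSIXct, otherwise you get an error message):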

#add hour to data.frame:
data$hour <- as.POSIXct(hour)
library(plyr)
ddply(data, .(id), summarize, open=min(hour[status=="open"]),
     closed=max(hour[status=="closed"]))
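For comparison, the same grouping can be sketched in base R without plyr or sqldf. This version parses the full timestamp rather than only the time of day, on the assumption that "earliest" and "latest" refer to the complete date-time; the column name `when`, the helper data frames, and the UTC time zone are all illustrative choices that do not appear in the original post.

```r
# Rebuild the example data from the question.
id     <- c("1", "2", "1", "2", "1", "1")
status <- c("open", "open", "closed", "closed", "open", "closed")
date   <- c("11-10-2017 15:10", "10-10-2017 12:10", "12-10-2017 22:10",
            "13-10-2017 06:30", "13-10-2017 09:30", "13-10-2017 10:30")
data <- data.frame(id, status, date, stringsAsFactors = FALSE)

# Assumption: compare full timestamps (day-month-year hour:minute) in UTC.
data$when <- as.POSIXct(data$date, format = "%d-%m-%Y %H:%M", tz = "UTC")

# Earliest "open" row per id: sort ascending, keep the first row of each id.
open_rows  <- data[data$status == "open", ]
open_rows  <- open_rows[order(open_rows$when), ]
first_open <- open_rows[!duplicated(open_rows$id), c("id", "when")]
names(first_open)[2] <- "open"

# Latest "closed" row per id: sort descending, keep the first row of each id.
closed_rows <- data[data$status == "closed", ]
closed_rows <- closed_rows[order(closed_rows$when, decreasing = TRUE), ]
last_closed <- closed_rows[!duplicated(closed_rows$id), c("id", "when")]
names(last_closed)[2] <- "closed"

# One row per id with both columns still of class POSIXct.
result <- merge(first_open, last_closed, by = "id")
print(result)
```

Because the columns are renamed in R rather than inside an SQL query, they keep their POSIXct class, which also sidesteps the formatting issue raised in Question 2.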

【Comments】:
