【Question title】:How to group data based on time interval in R
【Posted】:2017-11-24 22:33:03
【Question description】:

My data looks like this:

library(plyr)
dates<-data.frame(datecol=as.POSIXct(c(
  "2010-04-03 03:02:38 UTC",
  "2010-04-03 03:03:14 UTC",
  "2010-04-20 03:05:52 UTC",
  "2010-04-20 03:07:42 UTC",
  "2010-04-21 03:09:38 UTC",
  "2010-04-21 03:10:14 UTC",
  "2010-04-21 03:12:52 UTC",
  "2010-04-23 03:13:42 UTC",
  "2010-04-23 03:15:42 UTC",
  "2010-04-23 03:16:38 UTC",
  "2010-04-23 03:18:14 UTC",
  "2010-04-24 03:21:52 UTC",
  "2010-04-24 03:22:42 UTC",
  "2010-04-24 03:24:19 UTC",
  "2010-04-24 03:25:19 UTC"
)), x = cumsum(runif(15)*10),y=cumsum(runif(15)*20))

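I want to group my data into 5-day time intervals, so that all points that are 5 days or less apart end up in the same group. I tried the suggestion from here: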

gr<-ddply(dates,.(cut(datecol,"5 day",include.lowest = TRUE)),"[")

But for some reason I end up with 3 groups instead of 2, and the points from 04/21 and 04/23 land in different groups even though they are less than 5 days apart.
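The three groups follow from where cut() puts its breaks: with an interval string such as "5 day", the bins start at midnight of the first day in the data (2010-04-03) and repeat every 5 days, so one boundary falls on 2010-04-23. A minimal sketch that makes the bins visible, using one timestamp per distinct day from the data above and assuming UTC:

```r
# One timestamp per distinct day from the question's data (UTC assumed here).
d <- as.POSIXct(c("2010-04-03 03:02:38", "2010-04-20 03:05:52",
                  "2010-04-21 03:09:38", "2010-04-23 03:13:42",
                  "2010-04-24 03:21:52"), tz = "UTC")

# cut() labels each bin by its start; bins begin at the first day in the data.
bins <- cut(d, "5 day")
data.frame(d, bin = bins)

# Keep only the bins that actually contain data and count them.
used <- droplevels(bins)
nlevels(used)
```

The bins depend only on the first date, not on the gaps between points, which is why 04/21 and 04/23 can be separated.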

Here is what I want:

         group             datecol         x          y
1            1 2010-04-03 03:02:38  8.112423   4.790036
2            1 2010-04-03 03:03:14 11.184709  22.903475
3            2 2010-04-20 03:05:52 17.306835  32.286891
4            2 2010-04-20 03:07:42 24.071488  38.941709
5            2 2010-04-21 03:09:38 26.451493  48.378477
6            2 2010-04-21 03:10:14 33.090645  53.148149
7            2 2010-04-21 03:12:52 38.536416  64.346574
8            2 2010-04-23 03:13:42 40.911074  79.419002
9            2 2010-04-23 03:15:42 41.977579  89.760210
10           2 2010-04-23 03:16:38 46.838773  95.266709
11           2 2010-04-23 03:18:14 48.367159 112.619969
12           2 2010-04-24 03:21:52 57.470412 113.594423
13           2 2010-04-24 03:22:42 63.202005 123.653370
14           2 2010-04-24 03:24:19 65.615348 137.184153
15           2 2010-04-24 03:25:19 75.177633 137.559003

【Question comments】:

    Tags: r time time-series grouping plyr


    【Solution 1】:

    You can set the breaks manually so that they are anchored to whatever baseline date you like. For example:

    library(lubridate)
    
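    # Baseline date that the 5-day breaks are anchored to
    start.date = ymd_hms("2010-04-20 00:00:00")
    # Breaks every 5 days, from 30 days before to 30 days after the baseline
    breaks = seq(start.date - 30*3600*24, start.date + 30*3600*24, "5 days")
    
    # Each row is labelled with the start date of the bin it falls into
    dates$group5 = cut(dates$datecol, breaks=breaks)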
    
                   datecol         x         y     group5
    1  2010-04-03 03:02:38  7.265758  10.80777 2010-03-31
    2  2010-04-03 03:03:14 15.632081  13.57187 2010-03-31
    3  2010-04-20 03:05:52 19.219491  19.76293 2010-04-20
    4  2010-04-20 03:07:42 20.605199  37.22687 2010-04-20
    5  2010-04-21 03:09:38 26.533445  53.90345 2010-04-20
    6  2010-04-21 03:10:14 33.449645  56.27885 2010-04-20
    7  2010-04-21 03:12:52 39.050517  71.74788 2010-04-20
    8  2010-04-23 03:13:42 39.499227  76.92669 2010-04-20
    9  2010-04-23 03:15:42 44.827766  79.49207 2010-04-20
    10 2010-04-23 03:16:38 54.206473  89.60895 2010-04-20
    11 2010-04-23 03:18:14 54.982695  94.37855 2010-04-20
    12 2010-04-24 03:21:52 64.414931 104.24174 2010-04-20
    13 2010-04-24 03:22:42 64.659980 113.77616 2010-04-20
    14 2010-04-24 03:24:19 67.343105 128.06813 2010-04-20
    15 2010-04-24 03:25:19 71.060741 138.43512 2010-04-20
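If numeric group ids like the ones in the question are preferred over date labels, the factor returned by cut() can be converted. A minimal sketch, assuming UTC timestamps and a hypothetical column name `group`; only a few of the question's timestamps are used:

```r
# A few of the question's timestamps (UTC assumed for this sketch).
dates <- data.frame(datecol = as.POSIXct(c("2010-04-03 03:02:38",
                                           "2010-04-20 03:05:52",
                                           "2010-04-23 03:13:42",
                                           "2010-04-24 03:25:19"), tz = "UTC"))

# Breaks every 5 days, anchored so that one of them falls on 2010-04-20.
start.date <- as.POSIXct("2010-04-20 00:00:00", tz = "UTC")
breaks <- seq(start.date - 30 * 86400, start.date + 30 * 86400, by = "5 days")

dates$group5 <- cut(dates$datecol, breaks = breaks)

# Drop empty bins, then number the remaining ones 1, 2, ... in time order.
dates$group <- as.integer(droplevels(dates$group5))
dates
```

The result still depends on the chosen baseline: a different anchor can put a break between points that are close together.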
    

    【Comments】:

      【Solution 2】:

      How about a cumsum that checks the lagged value and increments when necessary? We use the shift() function from the data.table package to take care of the lag.

      library(data.table)
      dates$group <- cumsum(ifelse(difftime(dates$datecol,
                                        shift(dates$datecol, fill = dates$datecol[1]), 
                                        units = "days") >= 5 
                               ,1, 0)) + 1
      
      head(dates)
      #              datecol         x         y group
      #1 2010-04-03 03:02:38  4.776196  5.160336     1
      #2 2010-04-03 03:03:14 13.388291 14.731241     1
      #3 2010-04-20 03:05:52 17.769262 30.057454     2
      #4 2010-04-20 03:07:42 20.217235 31.742392     2
      #5 2010-04-21 03:09:38 20.924025 49.248819     2
      #6 2010-04-21 03:10:14 21.918687 56.030278     2
      

      This assumes your data is sorted by time in ascending order.
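Because the result depends on row order, one option is to sort before computing the gaps. The same gap rule can also be written in base R with diff(), without data.table. A minimal sketch, assuming UTC and the 5-day threshold from the question, on a deliberately unsorted subset of the data:

```r
# Deliberately unsorted subset of the question's timestamps (UTC assumed).
dates <- data.frame(datecol = as.POSIXct(c("2010-04-23 03:13:42",
                                           "2010-04-03 03:02:38",
                                           "2010-04-24 03:25:19",
                                           "2010-04-20 03:05:52",
                                           "2010-04-03 03:03:14"), tz = "UTC"))

# Sort by time so that consecutive rows are neighbours in time.
dates <- dates[order(dates$datecol), , drop = FALSE]

# Gap to the previous row in days; the first row has no predecessor, so use 0.
gap <- c(0, diff(as.numeric(dates$datecol)) / 86400)

# Start a new group whenever the gap is 5 days or more.
dates$group <- cumsum(gap >= 5) + 1L
dates
```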

      【Comments】:

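      • What if every timestamp is 1 day apart? With the approach above, they would all end up in the same group, no matter how far apart the first and last timestamps are. Not sure whether that is what @Liza wants, but I think this is an ill-posed problem.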
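The chaining effect raised in this comment is easy to reproduce. The sketch below uses made-up daily timestamps (not the question's data) and contrasts the gap rule with one possible alternative, in which a new group starts once a point is more than 5 days after the first point of the current group. The helper name `group_by_window` is hypothetical, and whether this alternative rule matches the asker's intent is an assumption:

```r
# Ten made-up timestamps exactly one day apart (UTC assumed).
stamps <- as.POSIXct("2010-04-01 00:00:00", tz = "UTC") + 86400 * (0:9)

# Gap rule: no gap reaches 5 days, so everything chains into one group.
gap_group <- cumsum(c(0, diff(as.numeric(stamps)) / 86400) >= 5) + 1L

# Hypothetical alternative: measure from the first point of the current group.
# Like the gap rule, it expects x to be sorted in ascending order.
group_by_window <- function(x, width_days = 5) {
  x <- as.numeric(x) / 86400          # seconds -> days
  group <- integer(length(x))
  start <- x[1]
  g <- 1L
  for (i in seq_along(x)) {
    if (x[i] - start > width_days) {  # too far from the group's first point
      g <- g + 1L
      start <- x[i]
    }
    group[i] <- g
  }
  group
}

window_group <- group_by_window(stamps)
data.frame(stamps, gap_group, window_group)
```

With the gap rule all ten rows share one group, while the window rule caps each group at a 5-day span. Which behaviour is correct depends on how "5 days or less apart" is meant to be read.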