【Question title】: Split date into YYYY-MM-DD-HH-MM-SS and aggregate date (R)
【Posted】: 2016-07-24 18:06:25
【Question】:

How can I split the following datetime into year-month-day-hour-minute-second? The dates are created with:

datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                      as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                      by = "hour")
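A minimal base-R sketch of the splitting step itself, assuming the hourly GMT sequence above: format() renders the components as text, while as.POSIXlt() stores each component in its own numeric field (year counts from 1900 and mon from 0, hence the offsets). The names parts_chr, lt and parts are illustrative only.

```r
# hourly GMT timestamps, as in the question
datetime <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                       as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                       by = "hour")

# text form: one "YYYY-MM-DD-HH-MM-SS" string per timestamp
parts_chr <- format(datetime, "%Y-%m-%d-%H-%M-%S", tz = "GMT")

# numeric form: POSIXlt keeps each component in a separate field
lt <- as.POSIXlt(datetime, tz = "GMT")
parts <- data.frame(year   = lt$year + 1900,  # years are counted from 1900
                    month  = lt$mon + 1,      # months are 0-based
                    day    = lt$mday,
                    hour   = lt$hour,
                    minute = lt$min,
                    second = lt$sec)
head(parts)
```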

The ultimate goal is to aggregate x (hourly resolution) to 6-hourly resolution. Is it possible to aggregate datetime without splitting it first?

             datetime     x 
1  2015-04-01 00:00:00    0.0     
2  2015-04-01 01:00:00    0.0     
3  2015-04-01 02:00:00    0.0     
4  2015-04-01 03:00:00    0.0     
5  2015-04-01 04:00:00    0.0     
6  2015-04-01 05:00:00    0.0     
7  2015-04-01 06:00:00    0.0     
8  2015-04-01 07:00:00    0.0     
9  2015-04-01 08:00:00    0.0     
10 2015-04-01 09:00:00    0.0     
11 2015-04-01 10:00:00    0.0     
12 2015-04-01 11:00:00    0.0     
13 2015-04-01 12:00:00    0.0     
14 2015-04-01 13:00:00    0.0     
15 2015-04-01 14:00:00    0.0     
16 2015-04-01 15:00:00    0.0     
17 2015-04-01 16:00:00    0.0     
18 2015-04-01 17:00:00    0.0     
19 2015-04-01 18:00:00    0.0     
20 2015-04-01 19:00:00    0.0     
21 2015-04-01 20:00:00    0.0     
22 2015-04-01 21:00:00    0.0     
23 2015-04-01 22:00:00    1.6     
24 2015-04-01 23:00:00    0.2     
25 2015-04-02 00:00:00    1.5     
26 2015-04-02 01:00:00    1.5     
27 2015-04-02 02:00:00    0.5     
28 2015-04-02 03:00:00    0.0     
29 2015-04-02 04:00:00    0.0     
30 2015-04-02 05:00:00    0.0     
31 2015-04-02 06:00:00    0.0     
32 2015-04-02 07:00:00    0.5     
33 2015-04-02 08:00:00    0.3     
34 2015-04-02 09:00:00    0.0     
35 2015-04-02 10:00:00    0.0     
36 2015-04-02 11:00:00    0.0     
37 2015-04-02 12:00:00    0.0     
38 2015-04-02 13:00:00    0.0     
39 2015-04-02 14:00:00    0.0     
40 2015-04-02 15:00:00    0.0     
41 2015-04-02 16:00:00    0.0     
42 2015-04-02 17:00:00    0.0     
43 2015-04-02 18:00:00    0.0     
44 2015-04-02 19:00:00    0.0     
45 2015-04-02 20:00:00    0.0     
46 2015-04-02 21:00:00    0.0     
47 2015-04-02 22:00:00    0.0     
48 2015-04-02 23:00:00    0.0 
....

The output should look very close to:

YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
2015-04-01 00:00:00  2015-04-01 06:00:00  2015-04-01 12:00:00  2015-04-01 18:00:00
2015-04-02 00:00:00  2015-04-02 06:00:00  2015-04-02 12:00:00  2015-04-02 18:00:00 
.....
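A minimal sketch of aggregating without splitting, assuming a hypothetical data frame dat with a GMT POSIXct column datetime and a numeric column x (random values here). Each timestamp is floored to a multiple of 6 hours, which yields the 00:00, 06:00, 12:00 and 18:00 block starts shown above. FUN = sum is an assumption; use mean for averages.

```r
# hypothetical data: hourly GMT timestamps with random values in x
datetime <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                       as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                       by = "hour")
dat <- data.frame(datetime = datetime, x = runif(length(datetime)))

# POSIXct counts seconds since 1970-01-01 00:00 GMT, and 6 h divides 24 h,
# so flooring to a multiple of 6 * 3600 s gives the start of each 6-hour block
secs <- as.numeric(dat$datetime)
dat$bin <- as.POSIXct(secs - secs %% (6 * 3600), origin = "1970-01-01", tz = "GMT")

# sum x within each block; bin stays a POSIXct column in the result
six_hourly <- aggregate(x ~ bin, data = dat, FUN = sum)
head(six_hourly)
```

This relies on the timestamps being in GMT/UTC; in a time zone whose offset is not a multiple of 6 hours, the blocks would not start at local midnight.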

Thanks for any thoughts on this.

EDIT

How can I apply @r2evans' answer to a list object, for example:

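    # toy data: each list element is a data frame with 'datetime' and 'precip' columns
    datetime <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                           as.POSIXct("2015-11-30 23:59:59", tz = "GMT"), by = "hour")
    flst1 <- replicate(4, data.frame(datetime = datetime, precip = runif(5856)), simplify = FALSE)

    flst1 <- lapply(flst1, function(x) {x$datetime <- as.POSIXct(x$datetime, tz = "GMT"); x})

    # 'sixhours' is the vector of 6-hour breaks from @r2evans' answer
    sixhours1 <- lapply(flst1, function(x) {x$bin <- cut(x$datetime, sixhours); x})

    head(sixhours1[[1]], n = 7)

    ret <- lapply(sixhours1, function(x) aggregate(x$precip, list(x$bin), sum, na.rm = TRUE))

    head(ret[[1]], n = 20)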

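【Discussion】:

  • Study the zoo package vignettes. It's all in there.
  • Perhaps you could use another seq (with by="6 hours") and cut.
  • @r2evans How would I implement your idea with the data provided above? Thanks.

Tags: r date datetime split aggregate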


【Solution 1】:

I found a solution:

library(xts)
flst <- list.files(pattern = "\\.csv$")
# read each file, keeping datetime and precip and skipping the 3rd column
flst1 <- lapply(flst, function(x) read.csv(x, header = TRUE, stringsAsFactors = FALSE, sep = ",", fill = TRUE,
                                          dec = ".", quote = "\"", colClasses = c("character", "numeric", "NULL")))
head(flst1[[1]])
dat.xts <- lapply(flst1, function(x) xts(x$precip, as.POSIXct(x$datetime, tz = "GMT")))
head(dat.xts[[1]])
# last row of each 6-hour block (k = hours per block; see ?endpoints for "on")
ep.xts <- lapply(dat.xts, function(x) endpoints(x, on = "hours", k = 6))
head(ep.xts[[1]])
# sum each block, pairing every series with its own endpoints
stations6hrly <- Map(function(x, ep) period.apply(x, INDEX = ep, FUN = sum), dat.xts, ep.xts)

head(stations6hrly[[703]])
                    [,1]
2015-04-01 05:00:00  0.3
2015-04-01 11:00:00  1.2
2015-04-01 17:00:00  0.0
2015-04-01 23:00:00  0.2
2015-04-02 05:00:00  0.0
2015-04-02 11:00:00  1.4

The dates are not what I want, but the values are correct. I wonder whether R has something like the shifttime operator in CDO.
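period.apply() stamps each block with the timestamp of its last observation, which is why the sums appear at 05:00, 11:00, 17:00 and 23:00. A minimal sketch of a shift, assuming complete hourly data (six observations in every block) and that the xts package is installed: move the index back 5 hours with index<-. The toy series below is hypothetical.

```r
library(xts)

# hypothetical toy series: two days of hourly values 1..48 in GMT
t <- seq(as.POSIXct("2015-04-01 00:00:00", tz = "GMT"), by = "hour", length.out = 48)
x <- xts(1:48, order.by = t)

# 6-hour block sums, labelled by each block's last timestamp (05:00, 11:00, ...)
ep <- endpoints(x, on = "hours", k = 6)
y <- period.apply(x, INDEX = ep, FUN = sum)

# assumes every block is complete: shift labels back 5 h to the block start
index(y) <- index(y) - 5 * 3600
head(y)
```

If some hours are missing, a fixed 5-hour shift would mislabel the affected blocks; flooring the timestamps to 6-hour boundaries before aggregating avoids that.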

【Discussion】:

    【Solution 2】:

    Your minimal data is incomplete, so I'll generate some random data:

    dat <- data.frame(datetime = seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                                            as.POSIXct("2015-11-30 23:59:59", tz = "GMT"), 
                                            by = "hour",tz = "GMT"),
                      x = runif(5856))
    # the "1+" ensures we extend at least to the end of the datetimes;
    # without it, the last several rows in "bin" would be NA
    sixhours <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                           1 + as.POSIXct("2015-11-30 23:59:59", tz = "GMT"), 
                           by = "6 hours",tz = "GMT")
    
    # this doesn't have to go into the data.frame (could be a separate
    # vector), but I'm including it for easy row-wise comparison
    dat$bin <- cut(dat$datetime, sixhours)
    
    head(dat, n=7)
    #              datetime          x                 bin
    # 1 2015-04-01 00:00:00 0.91022534 2015-04-01 00:00:00
    # 2 2015-04-01 01:00:00 0.02638850 2015-04-01 00:00:00
    # 3 2015-04-01 02:00:00 0.42486354 2015-04-01 00:00:00
    # 4 2015-04-01 03:00:00 0.90722845 2015-04-01 00:00:00
    # 5 2015-04-01 04:00:00 0.24540085 2015-04-01 00:00:00
    # 6 2015-04-01 05:00:00 0.60360906 2015-04-01 00:00:00
    # 7 2015-04-01 06:00:00 0.01843313 2015-04-01 06:00:00
    tail(dat)
    #                 datetime         x                 bin
    # 5851 2015-11-30 18:00:00 0.5963204 2015-11-30 18:00:00
    # 5852 2015-11-30 19:00:00 0.2503440 2015-11-30 18:00:00
    # 5853 2015-11-30 20:00:00 0.9600476 2015-11-30 18:00:00
    # 5854 2015-11-30 21:00:00 0.6837394 2015-11-30 18:00:00
    # 5855 2015-11-30 22:00:00 0.9093506 2015-11-30 18:00:00
    # 5856 2015-11-30 23:00:00 0.9197769 2015-11-30 18:00:00
    nrow(dat)
    # [1] 5856
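A minimal sketch of why the 1 + matters, reusing the same hourly sequence with random values: cut() on date-times uses left-closed intervals [break, next break) by default (right = FALSE), so the final six hours need a closing break at 2015-12-01 00:00:00. Without it they become NA, and aggregate() omits rows whose grouping value is NA, leaving 975 groups instead of 976. The names breaks_short and breaks_full are illustrative.

```r
datetime <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                       as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                       by = "hour")
x <- runif(length(datetime))

# without "1 +": the last break is 2015-11-30 18:00:00
breaks_short <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                           as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                           by = "6 hours")
# with "1 +": the end becomes 2015-12-01 00:00:00, adding one more break
breaks_full <- seq.POSIXt(as.POSIXct("2015-04-01 0:00:00", tz = "GMT"),
                          1 + as.POSIXct("2015-11-30 23:59:59", tz = "GMT"),
                          by = "6 hours")

bin_short <- cut(datetime, breaks_short)
bin_full  <- cut(datetime, breaks_full)

sum(is.na(bin_short))  # 18:00-23:00 on the last day fall outside every interval
# [1] 6
sum(is.na(bin_full))
# [1] 0

# aggregate() drops the NA group
nrow(aggregate(x, list(bin_short), sum))
# [1] 975
nrow(aggregate(x, list(bin_full), sum))
# [1] 976
```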
    

    It works:

    ret <- aggregate(dat$x, list(dat$bin), mean)
    nrow(ret)
    # [1] 976
    head(ret)
    #               Group.1         x
    # 1 2015-04-01 00:00:00 0.5196193
    # 2 2015-04-01 06:00:00 0.4770019
    # 3 2015-04-01 12:00:00 0.5359483
    # 4 2015-04-01 18:00:00 0.8140603
    # 5 2015-04-02 00:00:00 0.4874332
    # 6 2015-04-02 06:00:00 0.6139554
    tail(ret)
    #                 Group.1         x
    # 971 2015-11-29 12:00:00 0.6881228
    # 972 2015-11-29 18:00:00 0.4791925
    # 973 2015-11-30 00:00:00 0.5793872
    # 974 2015-11-30 06:00:00 0.4809868
    # 975 2015-11-30 12:00:00 0.5157432
    # 976 2015-11-30 18:00:00 0.7199298
    

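    【Discussion】:

    • Oh wow! @r2evans, let me compare your solution with mine. Your dates are nicely arranged, as expected.
    • Why does pp=aggregate(dat10$x, list(dat10$bin), sum) give nrow(pp) = 975 instead of 976? 5856/6 = 976. In my own solution, nrow(stations6hrly[[703]]) is 976, and all the other datasets I compared it against also have 976 rows.
    • Perfect! Exactly what I wanted. Unlike the values in my own solution, these values line up with the timestamps. I'll now work out how to write this for list objects.
    • Could you write these lines in list form? dat$bin <- cut(dat$datetime, sixhours) and ret <- aggregate(dat$x, list(dat$bin), mean). Running sixhours=lapply(sixhours, function(x) {x$bin <- cut(x$datetime, x); x}) gives the error: Error in cut(x$datetime, x) : error in evaluating the argument 'x' in selecting a method for function 'cut': Error in x$datetime : $ operator is invalid for atomic vectors
    • I don't understand what you're doing. If it's straightforward, please edit your question; don't post incomplete code and errors in comments.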