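Here is some code I wrote using zoo. I haven't used xts much, so I don't know whether the same functions can be applied there. Hope it helps!
Functions
The following function calculates the fraction of each interval of the original data that overlaps with a given interval. (Note: in all of the code below, the variable names ta1 and ta2 refer to the start and end of the given time interval, e.g. each of the equal intervals you need as output, while tb1 and tb2 refer to the start and end of the (unequal) intervals of the original data.)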
frac.overlap <- function(ta1, ta2, tb1, tb2) {
  # Interval 1 = [ta1, ta2] (output interval), interval 2 = [tb1, tb2] (input record)
  if (tb1 <= ta1 && tb2 >= ta2) { # Interval 2 starts earlier and ends later than interval 1
    frac <- as.numeric(difftime(ta2, ta1, units = "secs")) / as.numeric(difftime(tb2, tb1, units = "secs"))
  } else if (tb1 >= ta1 && tb2 <= ta2) { # Interval 2 is fully contained within interval 1
    frac <- 1
  } else if (tb1 <= ta1 && tb2 >= ta1) { # Interval 2 partly overlaps with interval 1 (starts earlier, ends earlier)
    frac <- as.numeric(difftime(tb2, ta1, units = "secs")) / as.numeric(difftime(tb2, tb1, units = "secs"))
  } else if (tb1 <= ta2 && tb2 >= ta2) { # Interval 2 partly overlaps with interval 1 (starts later, ends later)
    frac <- as.numeric(difftime(ta2, tb1, units = "secs")) / as.numeric(difftime(tb2, tb1, units = "secs"))
  } else { # No overlap
    frac <- 0
  }
  return(frac)
}
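The four overlapping cases above can also be written as a single clamp: the shared duration is min(ta2, tb2) minus max(ta1, tb1), floored at zero, divided by the length of the input record. The following is a minimal sketch of that idea, not part of the original code; the name frac.overlap.vec is made up for illustration, and it assumes tb2 > tb1 (a zero-length record would give NaN here rather than 1).

```r
# Hypothetical vectorized variant of frac.overlap (assumes tb2 > tb1).
frac.overlap.vec <- function(ta1, ta2, tb1, tb2) {
  # Work in seconds since the epoch so pmin/pmax operate on plain numbers
  a1 <- as.numeric(ta1); a2 <- as.numeric(ta2)
  b1 <- as.numeric(tb1); b2 <- as.numeric(tb2)
  # Seconds shared by [a1, a2] and [b1, b2]; negative means no overlap, so clamp to 0
  shared <- pmax(pmin(a2, b2) - pmax(a1, b1), 0)
  shared / (b2 - b1)
}

t0 <- as.POSIXct("2016-07-01 00:00:00", tz = "UTC")
# Record 00:00:20 to 00:01:20 against the bin 00:00:00 to 00:01:00:
# 40 of the record's 60 seconds fall inside the bin
frac.overlap.vec(t0, t0 + 60, t0 + 20, t0 + 80)
```

Because tb1 and tb2 may be vectors, one call can return the fractions for all records at once.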
The next function determines which records of the original dataset overlap with the interval ta1-ta2 currently being considered:
check.overlap <- function(ta1, ta2, tb1, tb2) {
  ov <- vector("logical", 4)
  ov[1] <- (tb1 <= ta1 & tb2 >= ta2) # Interval 2 starts earlier and ends later than interval 1
  ov[2] <- (tb1 >= ta1 & tb2 <= ta2) # Interval 2 is fully contained within interval 1
  ov[3] <- (tb1 <= ta1 & tb2 >= ta1) # Interval 2 partly overlaps with interval 1 (starts earlier, ends earlier)
  ov[4] <- (tb1 <= ta2 & tb2 >= ta2) # Interval 2 partly overlaps with interval 1 (starts later, ends later)
  return(any(ov)) # TRUE if at least one element of ov is TRUE, otherwise FALSE (or NA if tb2 is missing)
}
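(Note: this works for the sample data you provided, but on larger datasets I found it to be very slow. Since I wrote this code to resample time series with regular time steps, I usually used a fixed time interval for this step, which is considerably faster. It is probably easy to modify the code (see the code of the next function) to speed up this step based on the time intervals of the original data.)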
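One possible way to speed this step up without knowing the time step is to test all records in one vectorized comparison. For well-ordered intervals (ta1 <= ta2 and tb1 <= tb2), the four conditions in check.overlap reduce to "each interval starts no later than the other ends". The sketch below assumes that reduction; the helper name which.overlap and the example vectors are hypothetical.

```r
# Hypothetical vectorized replacement for the per-record check.overlap calls.
# starts/ends: POSIXct (or numeric) vectors with each record's start and end.
which.overlap <- function(ta1, ta2, starts, ends) {
  # Closed intervals intersect iff each one starts no later than the other ends.
  # which() drops NA, so a record with a missing end is skipped.
  which(starts <= ta2 & ends >= ta1)
}

t0     <- as.POSIXct("2016-07-01 00:00:00", tz = "UTC")
starts <- t0 + c(20, 80, 110, 150)
ends   <- t0 + c(80, 110, 150, 260)
# Records touching the bin 00:01:00 to 00:02:00
which.overlap(t0 + 60, t0 + 120, starts, ends)
```

As in check.overlap, a record that only touches the bin boundary counts as overlapping; its fraction then comes out as 0.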
The next function uses the previous two to calculate the resampled value for the interval ta1-ta2:
fracres <- function(tstart, interval, input) {
  # tstart: POSIX object
  # interval: length of interval in seconds
  # input: zoo object
  ta1 <- tstart
  ta2 <- tstart + interval
  # First, determine which records of the original data (input) overlap with the current
  # interval, to avoid going through the whole object at every iteration
  ind <- index(input)            # start of each input record
  ind1 <- index(lag(input, -1))  # end of each input record (the next timestamp; NA for the last one)
  recs <- which(sapply(seq_along(ind), function(x) check.overlap(ta1, ta2, ind[x], ind1[x])))
  #recs <- which(abs(as.numeric(difftime(ind,ta1,units="secs"))) < 601)
  # For each record overlapping with the current interval, return the fraction of the input data interval contained in the current interval
  if (length(recs) > 0) {
    fracs <- sapply(seq_along(recs), function(x) frac.overlap(ta1, ta2, ind[recs[x]], ind1[recs[x]]))
    return(sum(coredata(input)[recs] * fracs))
  } else {
    return(0)
  }
}
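(The commented-out line shows how to get the relevant records if the maximum time difference between the original and the new time steps is known.)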
Application
First, let's read in your sample data as a zoo object:
library(zoo)
sample_zoo <- read.zoo(text='
2016-07-01 00:00:20, 0.0
2016-07-01 00:01:20, 60.0
2016-07-01 00:01:50, 30.0
2016-07-01 00:02:30, 40.0
2016-07-01 00:04:20, 110.0
2016-07-01 00:05:30, 140.0
2016-07-01 00:06:00, 97.0
2016-07-01 00:07:12, 144.0
2016-07-01 00:08:09, 0.0
', sep=',', index=1, tz='', format="%Y-%m-%d %H:%M:%S")
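Your dataset appears to contain instantaneous values ("at 01:20, the value of x is 60"). Since I wrote this code for summed values, the timestamps mean something different here ("the record starting at 01:20 has a value of 60"). To correct for this, the records need to be shifted: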
sample_zoo <- lag(sample_zoo,1)
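To make the effect of the shift concrete, here is a small base R illustration (no zoo needed) using the first three sample records. It is only a sketch of the intended result: each value ends up paired with the timestamp of the preceding record, so a timestamp marks the start of the interval its value covers. The data frame and its column names are made up for illustration.

```r
# First three sample records: timestamps and the values reported at them
times  <- as.POSIXct("2016-07-01 00:00:20", tz = "UTC") + c(0, 60, 90)
values <- c(0, 60, 30)

# Pair each value with the previous timestamp (start) and its own timestamp (end)
shifted <- data.frame(
  start = head(times, -1),
  end   = tail(times, -1),
  value = tail(values, -1)
)
shifted
```

After the shift, the record starting at 00:00:20 carries the value 60 and the record starting at 00:01:20 carries the value 30; the first value (0) has no preceding timestamp and drops out.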
Then, we define a sequence of POSIXct objects corresponding to the desired resolution:
time.out <- seq.POSIXt(from=as.POSIXct("2016-07-01"),to=(as.POSIXct("2016-07-01")+(60*9)),by="1 min")
We can then apply the function fracres described above:
data.out <- sapply(1:length(time.out), function(x) fracres(tstart=time.out[x],interval=60,input=sample_zoo))
The index and the data are combined into a zoo object:
zoo.out <- read.zoo(data.frame(time.out,data.out))
Finally, the time series is shifted by one step again, in the opposite direction to before:
zoo.out <- lag(zoo.out,-1)
2016-07-01 00:01:00 2016-07-01 00:02:00 2016-07-01 00:03:00 2016-07-01 00:04:00 2016-07-01 00:05:00 2016-07-01 00:06:00 2016-07-01 00:07:00 2016-07-01 00:08:00 2016-07-01 00:09:00
40 60 60 60 100 157 120 24 0
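As a cross-check, the same numbers can be reproduced in base R without zoo by computing all overlaps at once. This is a sketch under the same interpretation as above (each value is a total spread evenly over the time since the previous timestamp); the variable names are made up for illustration, and the timestamps are written as second offsets from 00:00:00.

```r
t0     <- as.numeric(as.POSIXct("2016-07-01 00:00:00", tz = "UTC"))
stamps <- t0 + c(20, 80, 110, 150, 260, 330, 360, 432, 489)
values <- c(0, 60, 30, 40, 110, 140, 97, 144, 0)

# Input records: each value covers the span from the previous timestamp to its own
tb1 <- head(stamps, -1)
tb2 <- tail(stamps, -1)
val <- tail(values, -1)

# Output bins: nine one-minute intervals starting at 00:00:00
ta1 <- t0 + 60 * (0:8)
ta2 <- ta1 + 60

# overlap[i, j]: seconds shared by output bin i and input record j (0 if disjoint)
overlap <- pmax(outer(ta2, tb2, pmin) - outer(ta1, tb1, pmax), 0)

# Weight each record's per-second rate by the shared seconds and sum per bin
resampled <- as.vector(overlap %*% (val / (tb2 - tb1)))
resampled
```

The nine values match the output above, where each one is labelled with the end of its bin (00:01:00 through 00:09:00).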