Finding the most recent observation earlier than a certain timestamp with XTS

[Question Title]: Finding the most recent observation earlier than a certain timestamp with XTS
[Posted]: 2014-03-19 16:31:06
[Question]:

I have an xts object that looks like this:

> q.xts
                                  val
2011-08-31 09:30:00.002357 -1.0135222
2011-08-31 09:30:00.003443 -0.2182679
2011-08-31 09:30:00.005075 -0.5317191
2011-08-31 09:30:00.009515 -1.0639535
2011-08-31 09:30:00.011569 -1.2470759
2011-08-31 09:30:00.012144  0.7678103
2011-08-31 09:30:00.023813 -0.6303432
2011-08-31 09:30:00.024107 -0.5105943

From the timestamps in another data frame, r, I compute a fixed offset. r has far fewer rows than q.xts.

> r
                        time               predict.time
1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443
2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515
3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108

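The time column corresponds to observations in q.xts, and the predict.time column is 1 millisecond earlier than time (give or take any rounding in the displayed precision).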

What I want is, for each predict.time value, the last observation in q.xts that is at or before it. For the three rows of r above, I expect these matches:
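To pin down the matching rule, here is a deliberately naive base-R sketch (one scan of the observations per target, so it is slow for large inputs). It returns, for each target time, the row number of the last observation at or before it, or NA when there is none. The function name and the obs/targets vectors are hypothetical stand-ins for index(q.xts) and r$predict.time, built from the example data below with tz = "UTC" assumed.

```r
# Naive statement of the rule: for each target time t, take the largest
# row index i with obs[i] <= t, or NA if no observation is that early.
# Assumes obs is sorted in increasing order (an xts index always is).
last_at_or_before <- function(obs, targets) {
  vapply(targets, function(t) {
    i <- which(obs <= t)
    if (length(i) == 0) NA_integer_ else max(i)
  }, integer(1))
}

# Observation times from the example data (stand-in for index(q.xts)).
obs <- as.POSIXct(c("2011-08-31 09:30:00.002358", "2011-08-31 09:30:00.003443",
                    "2011-08-31 09:30:00.005076", "2011-08-31 09:30:00.009515",
                    "2011-08-31 09:30:00.011570", "2011-08-31 09:30:00.012144",
                    "2011-08-31 09:30:00.023814", "2011-08-31 09:30:00.024108"),
                  tz = "UTC")

# r$time shifted back by 1 ms (stand-in for r$predict.time).
targets <- as.POSIXct(c("2011-08-31 09:30:00.003443", "2011-08-31 09:30:00.009515",
                        "2011-08-31 09:30:00.024108"), tz = "UTC") - 0.001

last_at_or_before(obs, targets)
# [1] 1 3 6
```

Rows 1, 3 and 6 of the observations are 09:30:00.002358, .005076 and .012144, which are the matches listed in the table.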

                        time               predict.time     (time from q.xts)
1 2011-08-31 09:30:00.003443 2011-08-31 09:30:00.002443  --> 09:30:00.002357
2 2011-08-31 09:30:00.009515 2011-08-31 09:30:00.008515  --> 09:30:00.005075
3 2011-08-31 09:30:00.024107 2011-08-31 09:30:00.023108  --> 09:30:00.012144

I solved this by looping over each row of r and doing an xts subset. For row 1 of r, for example, I do:

> last(index(q.xts[paste('/', r[1,]$predict.time, sep='')]))
[1] "2011-08-31 09:30:00.002357 CDT"

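Question: doing this in a loop feels clunky. Is there a better way? Ideally I'd get another column in r giving the exact time (or row number) of the matching value in q.xts.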

Note: this is the code I used to build the data for this example:

library(xts)              # provides xts() and the time-based subsetting used above
options(digits.secs = 6)  # print (and paste) POSIXct times with microseconds

q <- read.csv(tc <- textConnection("
       2011-08-31 09:30:00.002358, -1.01352216
       2011-08-31 09:30:00.003443, -0.21826793
       2011-08-31 09:30:00.005076, -0.53171913
       2011-08-31 09:30:00.009515, -1.06395353
       2011-08-31 09:30:00.011570, -1.24707591
       2011-08-31 09:30:00.012144,  0.76781028
       2011-08-31 09:30:00.023814, -0.63034317
       2011-08-31 09:30:00.024108, -0.51059425"),
     header=FALSE); close(tc)
colnames(q) <- c('datetime', 'val')
q.xts <- xts(q[-1], as.POSIXct(q$datetime))

r <- read.csv(tc <- textConnection("
       2011-08-31 09:30:00.003443
       2011-08-31 09:30:00.009515
       2011-08-31 09:30:00.024108"),
     header=FALSE); close(tc)
colnames(r) <- c('time')
r$time <- as.POSIXct(strptime(r$time, '%Y-%m-%d %H:%M:%OS'))
r$predict.time <- r$time - 0.001  # 1 ms earlier

[Question Discussion]:

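Once you have it, how would you use a "column in r that gives the exact time or row number of the corresponding value in q.xts"?
I have a separate toolchain that builds feature vectors from rows. The real q.xts has many more columns than just one, so for each row of q.xts that matches a timestamp in r, I'll build a set of features.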

Tags: r vectorization xts

[Solution 1]:

There may be a better way to do this, but this is the best I can come up with at the moment.

# create an empty xts object based on r$predict.time
r.xts <- xts(,r$predict.time)
# merge q.xts and r.xts. This will insert NAs at the times in r.xts.
tmp <- merge(q.xts,r.xts)
# Here's the magic:
# lag tmp *backwards* one period, so the NAs appear at the times
# right before the times in r.xts. Then grab the index for the NA periods
tmp.index <- index(tmp[is.na(lag(tmp,-1,na.pad=FALSE))])
# get the rows in q.xts for the times in tmp.index
out <- q.xts[tmp.index]
#                                   val
# 2011-08-31 09:30:00.002357 -1.0135222
# 2011-08-31 09:30:00.005075 -0.5317191
# 2011-08-31 09:30:00.012144  0.7678103
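For readers who want to see the idea behind this without xts, here is a base-R sketch of the same "merge, then look back" approach. Instead of lagging once, it carries the last observation row forward with cummax. As far as I can tell, this also covers two cases a single backward lag can miss: two predict.time values falling between the same pair of observations (the row before the second NA is then the first NA, not a q.xts row), and a predict.time exactly equal to an observation time (merge inserts no NA there). The function and variable names are hypothetical, obs stands in for index(q.xts) and must be sorted, and tz = "UTC" is assumed.

```r
# obs: sorted observation times (stand-in for index(q.xts));
# targets: query times (stand-in for r$predict.time).
last_obs_via_merge <- function(obs, targets) {
  n <- length(obs)
  # Tag each observation with its own row number and each target with 0.
  tag <- c(seq_len(n), integer(length(targets)))
  is_obs <- c(rep(TRUE, n), rep(FALSE, length(targets)))
  # "Merge": sort all times together; on ties, observations come first,
  # so an observation at exactly the target time counts as "at or before".
  ord <- order(c(as.numeric(obs), as.numeric(targets)), !is_obs)
  # "Look back": running max of tags = last observation row seen so far
  # (valid because obs is sorted, so row numbers increase with time).
  carried <- cummax(tag[ord])
  hit <- !is_obs[ord]                 # sorted positions holding targets
  res <- integer(length(targets))
  res[ord[hit] - n] <- carried[hit]   # back to the original target order
  res[res == 0L] <- NA_integer_       # no observation at or before target
  res
}

obs <- as.POSIXct(c("2011-08-31 09:30:00.002358", "2011-08-31 09:30:00.003443",
                    "2011-08-31 09:30:00.005076", "2011-08-31 09:30:00.009515",
                    "2011-08-31 09:30:00.011570", "2011-08-31 09:30:00.012144",
                    "2011-08-31 09:30:00.023814", "2011-08-31 09:30:00.024108"),
                  tz = "UTC")
targets <- as.POSIXct(c("2011-08-31 09:30:00.003443", "2011-08-31 09:30:00.009515",
                        "2011-08-31 09:30:00.024108"), tz = "UTC") - 0.001

last_obs_via_merge(obs, targets)
# [1] 1 3 6

# A target exactly on an observation, plus two targets in the same gap:
last_obs_via_merge(obs, c(obs[2], obs[4] - 0.0005, obs[4] - 0.0001))
# [1] 2 3 3
```

The returned row numbers could then be used as q.xts[rows, ] after dropping any NA entries.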

[Discussion]:

Joshua, very clever. This works perfectly. Thanks a lot.

[Solution 2]:

I would use findInterval:

findInterval(r$predict.time, index(q.xts))

> q.xts[findInterval(r$predict.time, index(q.xts)),]
                           val
2011-08-31 09:30:00 -1.0135222
2011-08-31 09:30:00 -0.5317191
2011-08-31 09:30:00  0.7678103

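Your times are POSIXct, which are stored as numeric seconds since the epoch (what findInterval compares), so this should be fairly robust.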
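A sketch of how the findInterval result could become the extra columns asked for in the question. Here q.times is a hypothetical stand-in for index(q.xts), the column names q.row and q.time are made up for illustration, and tz = "UTC" is assumed. One thing to guard against: findInterval returns 0 when a predict.time precedes every observation, and a 0 index would most likely just be dropped when subsetting q.xts, leaving fewer rows than r. Mapping 0 to NA keeps the result aligned with r.

```r
# q.times: stand-in for index(q.xts) (sorted POSIXct).
q.times <- as.POSIXct(c("2011-08-31 09:30:00.002358", "2011-08-31 09:30:00.003443",
                        "2011-08-31 09:30:00.005076", "2011-08-31 09:30:00.009515",
                        "2011-08-31 09:30:00.011570", "2011-08-31 09:30:00.012144",
                        "2011-08-31 09:30:00.023814", "2011-08-31 09:30:00.024108"),
                      tz = "UTC")

# Rebuild r as in the question, but with an explicit time zone.
r <- data.frame(time = as.POSIXct(c("2011-08-31 09:30:00.003443",
                                    "2011-08-31 09:30:00.009515",
                                    "2011-08-31 09:30:00.024108"), tz = "UTC"))
r$predict.time <- r$time - 0.001

# For each predict.time, the index i of the last q.times[i] <= predict.time,
# or 0 if predict.time is earlier than every observation.
idx <- findInterval(r$predict.time, q.times)

# Map "no earlier observation" (0) to NA so the new columns line up with r.
r$q.row  <- ifelse(idx == 0L, NA_integer_, idx)
r$q.time <- q.times[r$q.row]

r$q.row
# [1] 1 3 6
```

With the real data, the matching feature rows could then be pulled with q.xts[r$q.row[!is.na(r$q.row)], ].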

[Discussion]:

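Works perfectly for me. (I'd guess this uses less memory and CPU than Joshua's answer on large xts objects, since no merged xts object has to be built? But I haven't benchmarked it.)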