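【Posted】: 2014-08-17 18:00:28
【Question】:
I think this is a split-apply-combine problem, but with a time-series twist. My data consists of irregular runs of counts, and I need to compute some summary statistics for each run. Here is a snapshot of the data,
in a form you can paste into your console: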
library(xts)
date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
"2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
"2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
"2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
"2011-03-26", "2011-03-27"))
returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
-0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)
xtsData <- xts(cbind(returns,count,maxCount,sumCount),date)
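I don't know how to construct the max and cumSum columns, especially since each count series has an irregular length. Because I won't always know where a count series starts and ends, I got lost trying to work out the indices of these groups. Thanks for your help!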
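One possible way to recover the start and end index of each count series is to run-length encode the "is not NA" flag. This is only a sketch on a plain vector (not the xts object), and the names `runs`, `starts`, `ends` and `series` are made up for illustration:

```r
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

# Run-length encode the flag "this row belongs to a count series".
runs <- rle(!is.na(count))

# Last and first row index of every run (NA runs and non-NA runs alike).
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs where the flag is TRUE, i.e. the count series.
series <- data.frame(start = starts[runs$values], end = ends[runs$values])
print(series)
```

For the sample data this should report two series, covering rows 3 to 15 and rows 19 to 20.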
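Update: here is the for loop I tried for computing cumSum. It is not a cumulative sum yet, it only picks out the returns I need, and I'm still not sure how to apply a function to these ranges!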
xtsData <- cbind(xtsData,mySumCount=NA)
# find groups of returns
for(i in 1:nrow(xtsData)){
  if(is.na(xtsData[i,"count"]) == FALSE){
    xtsData[i,"mySumCount"] <- xtsData[i,"returns"]
  }
  else{
    xtsData[i,"mySumCount"] <- NA
  }
}
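The loop above copies a return wherever the count is present and writes NA elsewhere. As a rough sketch, the same masking can be done in one vectorized step; this version works on plain vectors rather than on the xts object, and `mySumCount` here is just a standalone vector:

```r
returns <- c(0.002, 0.000, -0.009, 0.030, 0.013, 0.003, 0.010, 0.001, 0.011, 0.017,
             -0.008, -0.005, 0.027, 0.014, 0.010, -0.017, 0.001, -0.013, 0.027, -0.019,
             0.000, 0.001)
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

# Keep the return where a count exists, NA everywhere else.
mySumCount <- ifelse(is.na(count), NA, returns)
print(mySumCount)
```

Like the loop, this only masks the returns; it does not accumulate anything yet.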
Update 2: thanks to the commenters!
# report returns when not NA count
x1 <- xtsData[!is.na(xtsData$count),"returns"]
# cum sum is close, but still need to exclude the first element
# -0.009 in the first series of counts and .027 in the second series of counts
x2 <- cumsum(xtsData[!is.na(xtsData$count),"returns"])
# this output is not accurate because 0.03 is displayed down the entire column, not just during periods when count is not NA. is this just a rounding error?
x3 <- max(xtsData[!is.na(xtsData$count),"returns"])
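On the rounding question: it is probably not rounding. `max()` reduces everything it is given to a single number, so one value gets repeated wherever it is assigned. A minimal sketch of the difference between the overall max and a per-series max, on plain vectors and with illustrative names (`in_run`, `run_id`, `per_run`); note that this takes the max over every return in a series, including the first one:

```r
returns <- c(0.002, 0.000, -0.009, 0.030, 0.013, 0.003, 0.010, 0.001, 0.011, 0.017,
             -0.008, -0.005, 0.027, 0.014, 0.010, -0.017, 0.001, -0.013, 0.027, -0.019,
             0.000, 0.001)
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

in_run <- !is.na(count)

# max() over all counted rows collapses to one value.
overall <- max(returns[in_run])
print(length(overall))

# Number the series: the id goes up by one each time a new series starts.
run_id <- cumsum(diff(c(FALSE, in_run)) == 1)

# One max per series instead of one max for everything.
per_run <- tapply(returns[in_run], run_id[in_run], max)
print(per_run)
```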
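Solution:
# shift a vector right by k positions, padding the front with 0s
# (the result keeps the original length, so the last k values are dropped)
lagpad <- function(x, k) {
  c(rep(0, k), x)[1 : length(x)]
}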
# group the counts
x1 <- na.omit(transform(xtsData, g = cumsum(c(0, diff(!is.na(count)) == 1))))
# cumulative sum of the count series
z1 <- transform(x1, cumsumRet = ave(returns, g, FUN =function(x) cumsum(replace(x, 1, 0))))
# max of the count series
z2 <- transform(x1, maxRet = ave(returns, g, FUN =function(x) max(lagpad(x,1))))
merge(xtsData,z1$cumsumRet,z2$maxRet)
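To see what the two per-group functions in the solution do, here is a small sketch applied by hand to the second count series only (returns 0.027 and -0.019). The vector `run2` is typed in for illustration:

```r
# Same helper as in the solution: shift right by k, pad the front with 0s.
lagpad <- function(x, k) {
  c(rep(0, k), x)[1:length(x)]
}

run2 <- c(0.027, -0.019)   # returns of the second count series

# Cumulative sum that ignores the first return of the series.
cumsumRet <- cumsum(replace(run2, 1, 0))
print(cumsumRet)           # 0.000 -0.019

# Max over the lagged series: the first return is included, the last is dropped.
lagged <- lagpad(run2, 1)
print(lagged)              # 0.000 0.027
print(max(lagged))         # 0.027
```

Both results agree with the hand-entered sumCount and maxCount values for that series. Two things follow from the way `lagpad` is written: the padded 0 means the reported max can never be negative, and the last return of each series never enters the max.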
【Discussion】:
-
Thanks @DavidArenburg, but everything you see there is hard-coded, typed in by hand. I need help computing maxCount and sumCount.
-
What is the input? The output? How is the output computed from the input? What defines a "count series"? Also, please show what you have tried.
-
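@G.Grothendieck OK, give me a minute to type up the for loop I used to compute sumCount. For maxCount I have no idea. To be clear about input and output: the input is the count data. A count series is a run of counts bounded by NA. So in the example above there are 2 count series, one running 1:7 and the other 1:2. The computed output is based on the "returns" column, but only for the periods when the count series is not NA.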
-
xtsData[!is.na(xtsData$count),"mySumCount2"]
-
To compute the max: xtsData[!is.na(xtsData$count),"myMax2"]