【Question title】: R: aggregating time series groups of irregular length
【Posted】: 2014-08-17 18:00:28
【Question description】:

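I think this is a split-apply-combine problem, but with a time series twist. My data consists of irregular counts, and I need to compute some summary statistics for each group of counts. Here is a snapshot of the data: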

And here it is for your console:

library(xts)

date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
              "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
              "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
              "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
              "2011-03-26", "2011-03-27"))

returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
         -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
         0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
          0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
          0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)

xtsData <- xts(cbind(returns,count,maxCount,sumCount),date)

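I can't figure out how to construct the max and cumSum columns, especially since each count series has an irregular length. Because I won't always know where a count series starts and ends, I got lost trying to find the indices of these groups. Thanks for your help!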

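Update: here is the for loop from my attempt at computing cumSum. It is not a cumulative sum yet, it only picks out the relevant returns, and I am still unsure how to apply a function to these ranges!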

xtsData <- cbind(xtsData,mySumCount=NA)
# find groups of returns
for(i in 1:nrow(xtsData)){
  if(is.na(xtsData[i,"count"]) == FALSE){
    xtsData[i,"mySumCount"] <- xtsData[i,"returns"]
  }
  else{
   xtsData[i,"mySumCount"] <- NA
  }
}
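
The loop above only copies returns into the new column where count is not NA. As a minimal sketch, assuming plain numeric vectors rather than the xts object (the short vectors below are made up for illustration), the same selection can be written without a loop:

```r
# Illustrative vectors, not the question's data
returns <- c(0.002, -0.009, 0.030, -0.017, 0.027)
count   <- c(NA, 1, 1, NA, 1)

# Keep the return where a count is present, NA elsewhere
mySumCount <- ifelse(is.na(count), NA_real_, returns)
mySumCount
```

This still does not identify where each count series starts and ends; it only masks the rows outside any series.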

Update 2: thanks to the commenters!

# report returns when not NA count
x1 <- xtsData[!is.na(xtsData$count),"returns"]

# cum sum is close, but still need to exclude the first element
# -0.009 in the first series of counts and .027 in the second series of counts
x2 <- cumsum(xtsData[!is.na(xtsData$count),"returns"]) 

# this output is not accurate because .03 is displayed down the entire column, not just during periods when count is not NA. is this just a rounding error?
x3 <- max(xtsData[!is.na(xtsData$count),"returns"]) 

Solution:

# function to pad a vector with a 0
lagpad <- function(x, k) {
  c(rep(0, k), x)[1 : length(x)] 
}

# group the counts
x1 <- na.omit(transform(xtsData, g =  cumsum(c(0, diff(!is.na(count)) == 1))))

# cumulative sum of the count series
z1 <- transform(x1, cumsumRet = ave(returns, g, FUN =function(x) cumsum(replace(x, 1, 0))))
# max of the count series
z2 <- transform(x1, maxRet = ave(returns, g, FUN =function(x) max(lagpad(x,1))))



 merge(xtsData,z1$cumsumRet,z2$maxRet)
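
To make the behavior of the lagpad helper above easier to see, here is a small sketch on a made-up vector. It shifts the values right by k positions, fills the front with 0, and drops the last k values so the length stays the same:

```r
# Same definition as in the solution above
lagpad <- function(x, k) {
  c(rep(0, k), x)[1:length(x)]
}

x <- c(0.027, -0.019, 0.010)   # illustrative values
shifted <- lagpad(x, 1)
shifted
```

Because the last element is dropped, max(lagpad(x, 1)) is the maximum of 0 and all but the last value of x, which may or may not be what is intended for a given series.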

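【Question comments】:

  • Thanks @DavidArenburg, but everything you see there is hard-coded, typed in by hand. I need help computing maxCount and sumCount.
  • What is the input? The output? How is the output computed from the input? What defines a "count series"? Also, please show what you have tried.
  • @G.Grothendieck OK, give me a minute to type in the for loop I used to compute sumCount. For maxCount I have no idea. To be clear about input and output: the input is the count data. A count series is a range of counts bounded by NAs. So in the example above there are 2 count series, one running 1:7 and the other 1:2. The output is computed from the "returns" column, but only for the periods where the count series is not NA.
  • xtsData[!is.na(xtsData$count),"mySumCount2"]
  • To compute the max: xtsData[!is.na(xtsData$count),"myMax2"]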

Tags: r aggregate xts dplyr


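【Solution 1】:

The code shown is not consistent with the output in the image, and no explanation is given, so it is unclear which operations are wanted; however, the question does say that the main difficulty is distinguishing the groups, so that is what we address here.

To do this we compute a new column g whose rows contain 1 for the first group, 2 for the second group, and so on. We also drop the NA rows, since the g column is sufficient to tell the groups apart.

The code below first computes a vector of the same length as count that is FALSE at each NA position and TRUE at each non-NA position. It then differences each position of that vector with the previous position. To do so it implicitly converts FALSE to 0 and TRUE to 1 and then takes the difference. Next we convert that result to a logical vector that is TRUE for each component equal to 1 and FALSE otherwise. Since the first component of the original vector has no previous position, we prepend 0 for it. The prepend operation implicitly converts the TRUE and FALSE values just generated to 1 and 0 respectively. Taking the cumsum fills in 1 for the first group, 2 for the second group, and so on. Finally the NA rows are omitted: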

# xtsData is the object built in the question
x <- na.omit(transform(xtsData, g = cumsum(c(0, diff(!is.na(count)) == 1))))

giving:

> x
           returns count maxCount sumCount g
2010-11-26  -0.009     1    0.030    0.000 1
2010-12-03   0.030     1    0.030    0.030 1
2010-12-10   0.013     2    0.030    0.042 1
2010-12-17   0.003     2    0.030    0.045 1
2010-12-24   0.010     3    0.030    0.056 1
2010-12-31   0.001     4    0.030    0.056 1
2011-01-07   0.011     5    0.030    0.067 1
2011-01-14   0.017     6    0.030    0.084 1
2011-01-21  -0.008     7    0.030    0.077 1
2011-01-28  -0.005     7    0.030    0.071 1
2011-02-04   0.027     7    0.030    0.098 1
2011-02-11   0.014     7    0.030    0.112 1
2011-02-18   0.010     7    0.030    0.123 1
2011-03-18   0.027     1    0.027    0.000 2
2011-03-25  -0.019     2    0.027   -0.019 2
attr(,"na.action")
2010-11-18 2010-11-19 2011-02-25 2011-03-04 2011-03-11 2011-03-26 2011-03-27 
         1          2         16         17         18         21         22 
attr(,"class")
[1] "omit"
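
The grouping step described above can be unpacked one piece at a time. This is a sketch on the question's count vector alone, without xts; the intermediate names present, step and starts are only for illustration:

```r
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

present <- !is.na(count)     # TRUE where a count exists
step    <- diff(present)     # +1 where a series starts, -1 where it ends, 0 otherwise
starts  <- c(0, step == 1)   # 1 on the first row of each series, 0 elsewhere
g       <- cumsum(starts)    # number of series started so far

g[present]                   # group id for the non-NA rows
```

The NA rows that follow a series carry that series' id in g, which is why they are removed with na.omit before any per-group calculation.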

You can now use ave to perform any calculation you like. For example, to take the cumulative returns by group:

transform(x, cumsumRet = ave(returns, g, FUN = cumsum))

Replace cumsum with any other function that is suitable for use with ave.
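
As a rough illustration of swapping functions into ave, the sketch below uses a small made-up data frame with a g column like the one computed above. The "skip first" variant is the one that appears in the question's own solution; the column names are illustrative:

```r
# Made-up returns for two groups
d <- data.frame(returns = c(-0.009, 0.030, 0.013, 0.027, -0.019),
                g       = c(1, 1, 1, 2, 2))

# Cumulative sum within each group
d$cumsumRet <- ave(d$returns, d$g, FUN = cumsum)

# Cumulative sum that leaves out the first return of each group
d$cumsumSkipFirst <- ave(d$returns, d$g,
                         FUN = function(x) cumsum(replace(x, 1, 0)))

# Group maximum, repeated on every row of the group
d$maxRet <- ave(d$returns, d$g, FUN = max)
d
```

When FUN returns a single value, as max does, ave repeats it across all rows of the group, so the result always has one value per input row.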

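【Comments】:

  • This output is very close, please see my second update. Thanks for showing me the data.table package!
  • The data.table package is not used here. Also, no explanation of how these values are supposed to be computed has been provided. If you want the cumsum of the returns in each group, except with the first value set to 0, then in the ave line replace cumsum with function(x) replace(cumsum(x), 1, 0)
  • Yes, sorry, I replied to the wrong answer. Apologies if the calculation was unclear. Believe it or not, we managed to solve this! The magic is the first line of code you provided above. I will clarify in a third update. Thanks a lot!!
  • The code at the end of the answer is almost certainly wrong. It creates x1 and x2, both of which have a g column that is subsequently ignored in the merge. The main reason no one could help you is that you did not provide an explanation even though it was requested several times.
  • Wow. My bad. You put me on the right track. I still need to understand better what is going on, but I think there is an actual solution.. not just ignoring variables...
【Solution 2】:

Ah, so "count" is the group, and you want the cumsum per group and the max per group. I think in data.table, so that is how I would do it.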

library(xts)
library(data.table)

date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
                  "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
                  "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
                  "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
                  "2011-03-26", "2011-03-27"))

returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
             -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
             0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
              0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
              0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)

DT <- data.table(date, returns, count)
DT[!is.na(count),max:=max(returns),by=count]
DT[!is.na(count),cumSum:= cumsum(returns),by=count]

#if you need an xts object at the end, then.

xtsData <- xts(cbind(DT$returns,DT$count, DT$max,DT$cumSum),DT$date)
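
Note that by=count groups rows by the value of count, so rows from different count series that share a count value fall into the same group. If the intent is one group per run of non-NA counts, as described in the question's comments, a sketch using data.table's rleid() might look like the following. The data is made up, and the column names run, runMax and runCumSum are illustrative:

```r
library(data.table)

# Illustrative data: two count series separated by NA rows
count   <- c(NA, 1, 1, 2, NA, 1, 2, NA)
returns <- c(0.002, -0.009, 0.030, 0.013, -0.017, 0.027, -0.019, 0.001)
DT <- data.table(returns, count)

# rleid() starts a new id every time is.na(count) changes value
DT[, run := rleid(is.na(count))]

# Per-run max and cumulative sum, only on rows that have a count
DT[!is.na(count),
   `:=`(runMax = max(returns), runCumSum = cumsum(returns)),
   by = run]
DT
```

Rows outside any count series keep NA in the two new columns.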

【Comments】:
