Using plyr (or *apply) to calc cumulative rtns

【Question title】: Using plyr (or *apply) to calc cumulative rtns
【Posted】: 2013-10-01 00:01:40
【Question】:

I've been struggling for hours with a problem that seems well suited to plyr or *apply. Can someone point me to a more concise R solution than the one I outline below?

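Background: I've looked at a number of the R finance-related packages, but I couldn't find a popular one that handles both securities that disappear in the middle of a time series and weights that change programmatically. I'm building my own solution for this particular problem, but I'd rather use an existing one.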

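The crux of the problem is that I want to use plyr to "loop" over a list of securities within a subset of dates. Some of the securities disappear within that date range. (I'm using forward rtns from data without survivorship bias.) I want the output for each date range to be a data frame of the cumulative returns of the selected securities. I can then use it (along with initial weights) together with other date ranges to calculate various portfolio metrics.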

I start with a (toy) data frame of selected securities that looks like this (date, ticker, net return):

           d     t    r
1 2013-03-31   ibm 0.01
2 2013-03-31  appl 0.02
3 2013-03-31 loser 0.03
4 2013-04-30   ibm 0.04
5 2013-04-30  appl 0.05
6 2013-04-30 loser 0.06
7 2013-05-31   ibm 0.07
8 2013-05-31  appl 0.08

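Note that the security "loser" is absent in the last month of the date range. (Securities do not reappear.) Here is some code that creates the toy data frame, followed by a clunky solution that seems to work.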

library(plyr)
#Create data frame for the example code
dt <- as.Date("20130331","%Y%m%d")
mydf <- data.frame(d=dt,t="ibm",r=0.01)
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.02))
mydf <- rbind(mydf,data.frame(d=dt,t="loser",r=0.03))
dt <- as.Date("20130430","%Y%m%d")
mydf <- rbind(mydf,data.frame(d=dt,t="ibm",r=0.04))
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.05))
mydf <- rbind(mydf,data.frame(d=dt,t="loser",r=0.06))
dt <- as.Date("20130531","%Y%m%d")
mydf <- rbind(mydf,data.frame(d=dt,t="ibm",r=0.07))
mydf <- rbind(mydf,data.frame(d=dt,t="appl",r=0.08))
#Note that there is no row for "loser" for 2013-05-31

#This plyr call crashes because "loser" doesn't have the same 
#   num of rtns as the others
#newdf <- ddply(mydf,.(t),function(x) cumprod(x[,"r"]+1)-1)

Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results do not have equal lengths

#I work with intermediate lists as a workaround
tmp.list <- dlply(mydf,.(t),function(x) cumprod(x[,"r"]+1)-1)

#Get the longest of any of the resulting lists (tmp = 3 in this example)
tmp <- max(as.numeric(lapply(tmp.list,length))) 

#Define function to extend cumulative rtn for missing values
#   By holding cumulative rtn constant, it's as if
#   I hold cash when a security disappears
extendit <- function(x) if(length(x)<tmp){ 
  c(x,rep(x[length(x)],tmp-length(x)))
} else {x}

#Use plyr to make all lists the same length
tmp2.list<-llply(tmp.list,extendit)

#Use plyr to create the data table I wanted
cusipcumrtns.df <- ldply(tmp2.list)          

#Must name key column since it got lost in the process
colnames(cusipcumrtns.df)[1] <- "t"

The code above produces the following data frame, which contains the cumulative return for each security.

      t   V1     V2       V3
1   ibm 0.01 0.0504 0.123928
2  appl 0.02 0.0710 0.156680
3 loser 0.03 0.0918 0.091800
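
As a sanity check on the table, the ibm row can be reproduced by hand by compounding its three monthly returns. This is a minimal sketch using only the toy numbers from the question; the names `r_ibm` and `cum_ibm` are illustrative and do not appear in the original code.

```r
# Monthly net returns for ibm, taken from the toy data frame
r_ibm <- c(0.01, 0.04, 0.07)

# Cumulative return after each month: product of (1 + r) so far, minus 1
cum_ibm <- cumprod(1 + r_ibm) - 1

print(cum_ibm)
```

The same expression applied to the two available "loser" returns yields only two values, which is why the per-ticker results differ in length.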

Any pointers to a more polished solution would be much appreciated. This seems to work, but I'm trying to learn how to use R better.

【Comments】:

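Your final result is equivalent to saying that loser had a zero return (no gain, no loss) during the period in which it is missing. Is that really what you want?
@flodel: Yes, I think so. Since I'm using a data set without survivorship bias, I'm assuming that the forward rtn for the prior period includes any return resulting from bankruptcy, acquisition, or whatever else happened. Rather than reinvesting the proceeds in the other securities, I hold cash. I'm not sure this is the best approach, but it is what I intended. I welcome your input.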
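
The equivalence raised in these comments can be checked directly: repeating the last cumulative return (the question's `extendit` approach) gives the same series as appending a 0% return for the missing month. The sketch below uses only the toy "loser" returns; the variable names are illustrative.

```r
# Observed returns for "loser"; there is no row for the last month
r_loser <- c(0.03, 0.06)

# (a) The question's workaround: repeat the last cumulative return
cum_obs <- cumprod(1 + r_loser) - 1
padded  <- c(cum_obs, cum_obs[length(cum_obs)])

# (b) Treat the missing month as a 0% return, i.e. holding cash
zero_filled <- cumprod(1 + c(r_loser, 0)) - 1

# Both approaches give the same cumulative series
print(all.equal(padded, zero_filled))
```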

Tags: r plyr

【Solution 1】:

Following the solution here: https://stackoverflow.com/a/9996566/1201032, you can add the missing rows to your data:

keys.df <- expand.grid(d = unique(mydf$d),
                       t = unique(mydf$t))
full.df <- merge(keys.df, mydf, all.x = TRUE)
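
To see what the merge adds, it can help to inspect `full.df` directly. The sketch below rebuilds the toy data in a single `data.frame()` call (a shortcut not used in the question, with `stringsAsFactors = FALSE` as an assumption) and then looks for the row that was filled in.

```r
# Rebuild the toy data from the question (same values, built in one call)
mydf <- data.frame(
  d = as.Date(c("2013-03-31", "2013-03-31", "2013-03-31",
                "2013-04-30", "2013-04-30", "2013-04-30",
                "2013-05-31", "2013-05-31")),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Every (date, ticker) combination, then a left join onto the observed rows
keys.df <- expand.grid(d = unique(mydf$d), t = unique(mydf$t))
full.df <- merge(keys.df, mydf, all.x = TRUE)

nrow(full.df)                # 3 dates x 3 tickers = 9 rows
full.df[is.na(full.df$r), ]  # the row added for "loser", with r set to NA
```

Because `merge()` fills unmatched keys with `NA`, the missing return has to be replaced before compounding, since `cumprod()` would otherwise propagate the `NA`.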

Then use your original idea, but make sure the missing returns are turned into zeros (as we discussed in the comments):

ddply(full.df, .(t), function(x) cumprod(ifelse(is.na(x$r), 0, x$r) + 1) - 1)
      t   V1     V2       V3
1   ibm 0.01 0.0504 0.123928
2  appl 0.02 0.0710 0.156680
3 loser 0.03 0.0918 0.091800

You might also consider keeping the output in long format:

ddply(full.df,.(t), transform, cum.r = cumprod(ifelse(is.na(r), 0, r) + 1) - 1)
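
Since the question also mentions *apply, here is a rough base-R counterpart of the long-format result using `ave()`. It is not part of the original answer; the column name `r0` and the explicit sort are assumptions added for illustration, and the toy data is rebuilt so the block runs on its own.

```r
# Toy data from the question, rebuilt in one call
mydf <- data.frame(
  d = as.Date(c("2013-03-31", "2013-03-31", "2013-03-31",
                "2013-04-30", "2013-04-30", "2013-04-30",
                "2013-05-31", "2013-05-31")),
  t = c("ibm", "appl", "loser", "ibm", "appl", "loser", "ibm", "appl"),
  r = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08),
  stringsAsFactors = FALSE
)

# Complete the (date, ticker) grid as in the answer
keys.df <- expand.grid(d = unique(mydf$d), t = unique(mydf$t))
full.df <- merge(keys.df, mydf, all.x = TRUE)

# Sort by ticker, then date, so compounding runs in time order
full.df <- full.df[order(full.df$t, full.df$d), ]

# Missing returns become 0 (holding cash), then compound within each ticker
full.df$r0    <- ifelse(is.na(full.df$r), 0, full.df$r)
full.df$cum.r <- ave(full.df$r0, full.df$t,
                     FUN = function(r) cumprod(1 + r) - 1)

print(full.df[, c("d", "t", "cum.r")])
```

`ave()` returns a vector aligned with the input rows, so the result stays in long format without any plyr dependency.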

【Discussion】:

Much better. Now I have to study it for a few minutes! Thanks!