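Posted: 2017-10-28 10:12:06
Question:
I wrote some R code that computes the cumulative sum of some data, and it works. The problem is that I have 25,000 numbers × 12 months that need to be "melted", so I end up with 300,000 rows (and every month adds roughly 2,000 × 12 more rows). The first lines of the code recreate a sample of my table (a huge Excel file). Some reshaping then converts it into the right format, and finally there is a double for loop that computes the cumulative sum for each month, depending on whether a PDR appears more than once ("PDRcount"). When I run it on my real data, the loop takes 6 hours... How can I make this faster?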
```r
library(reshape2)

## sample recreating part of the real table
PDR       <- c(1, 2, 3, 4, 5, 2)
START     <- as.Date(c("2008-01-01", "2007-01-01", "2010-01-01", "2011-01-01", "2017-02-01", "2017-03-01"))
SWITCHOUT <- as.Date(c(NA, "2017-02-28", NA, NA, "2017-03-31", NA))
JAN17 <- c(100, 124, 165, 178, 0, 0)
FEB17 <- c(101, 125, 133, 178, 170, 0)
MAR17 <- c(99, 0, 165, 180, 166, 99)
APR17 <- c(100, 0, 156, 178, 0, 78)
alldata <- data.frame(PDR = PDR,
                      START = START,
                      SWITCHOUT = SWITCHOUT,
                      JAN17 = JAN17,
                      FEB17 = FEB17,
                      MAR17 = MAR17,
                      APR17 = APR17)

## count PDR occurrences
alldata$PDRcount <- ave(alldata$PDR, alldata$PDR, FUN = length)
alldata$PDRcount <- as.numeric(alldata$PDRcount)

## melt to long format: one row per original row and month
crossdata <- melt(alldata, id = c("PDR", "START", "SWITCHOUT", "PDRcount"))
colnames(crossdata) <- c("PDR", "START", "SWITCHOUT", "PDRcount", "MONTH", "SMC")

## transform levels to date format
levels(crossdata$MONTH)[1] <- "2017-01-01"
levels(crossdata$MONTH)[2] <- "2017-02-01"
levels(crossdata$MONTH)[3] <- "2017-03-01"
levels(crossdata$MONTH)[4] <- "2017-04-01"
crossdata$MONTH <- as.Date(crossdata$MONTH, format = "%Y-%m-%d")

## cumulative sum of SMC over the months, per PDR and START date
for (pdr in crossdata[, "PDR"]) {
  maxPDR <- max(crossdata$PDRcount[crossdata$PDR == pdr])
  dates <- unique(crossdata$START[crossdata$PDR == pdr])
  for (i in 1:maxPDR) {
    CumSum <- cumsum(crossdata$SMC[crossdata$PDR == pdr & crossdata$START == dates[i]])
    crossdata$SMCcum[crossdata$PDR == pdr & crossdata$START == dates[i] & crossdata$MONTH == "2017-01-01"] <- CumSum[1]
    crossdata$SMCcum[crossdata$PDR == pdr & crossdata$START == dates[i] & crossdata$MONTH == "2017-02-01"] <- CumSum[2]
    crossdata$SMCcum[crossdata$PDR == pdr & crossdata$START == dates[i] & crossdata$MONTH == "2017-03-01"] <- CumSum[3]
    crossdata$SMCcum[crossdata$PDR == pdr & crossdata$START == dates[i] & crossdata$MONTH == "2017-04-01"] <- CumSum[4]
  }
}
```
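A likely reason for the 6 hours: `for (pdr in crossdata[,"PDR"])` visits every row rather than every distinct PDR (about 300,000 iterations on the real data), and each iteration compares entire 300,000-row columns several times, so the run time grows roughly with the square of the row count. Since the loop only builds a running total of SMC within each PDR/START pair, a single grouped `cumsum` through base R's `ave()` should produce the same SMCcum values. The following is a minimal sketch under stated assumptions, not a tested drop-in: it rebuilds the sample in long format without reshape2, leaves out SWITCHOUT and PDRcount (they do not enter the sum), assumes each PDR/START pair occupies only one row of alldata, and generates the month dates with `seq()` (assuming consecutive months from January 2017) instead of renaming factor levels one by one.

```r
# Sketch: vectorised replacement for the double loop (base R only).
# Sample values copied from the question; SWITCHOUT omitted.
PDR   <- c(1, 2, 3, 4, 5, 2)
START <- as.Date(c("2008-01-01", "2007-01-01", "2010-01-01",
                   "2011-01-01", "2017-02-01", "2017-03-01"))
smc <- cbind(JAN17 = c(100, 124, 165, 178, 0, 0),
             FEB17 = c(101, 125, 133, 178, 170, 0),
             MAR17 = c(99, 0, 165, 180, 166, 99),
             APR17 = c(100, 0, 156, 178, 0, 78))

# Long format in the same order melt() produces: all JAN17 rows, then FEB17, ...
# Assumption: the month columns are consecutive months starting in January 2017.
months <- seq(as.Date("2017-01-01"), by = "month", length.out = ncol(smc))
crossdata <- data.frame(
  PDR   = rep(PDR, times = ncol(smc)),
  START = rep(START, times = ncol(smc)),
  MONTH = rep(months, each = nrow(smc)),
  SMC   = as.vector(smc)
)

# cumsum() runs in row order, so put each group's months in chronological order
crossdata <- crossdata[order(crossdata$PDR, crossdata$START, crossdata$MONTH), ]

# One key per PDR/START pair. Passing PDR and START separately would make
# ave() combine them with interaction(), which keeps every level combination.
grp <- paste(crossdata$PDR, crossdata$START)

# Running total of SMC within each group, all groups in one call
crossdata$SMCcum <- ave(crossdata$SMC, grp, FUN = cumsum)
```

The same grouped cumulative sum can also be written with data.table or dplyr if those packages are already in use; the gain comes from doing one pass over the groups instead of re-subsetting the whole table for every row.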
Edit: sorry about the errors...
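In the sample, every row of alldata is already one PDR/START pair, so the running total could also be computed on the wide table before melting: loop over the 12 month columns instead of over 300,000 rows, with each step adding a whole column at once. This sketch assumes the month columns are in chronological order and that no PDR/START pair is split across rows; `smc` and `smc_cum` are hypothetical names.

```r
# Monthly SMC columns from the question's sample (assumed: one row per PDR/START pair)
smc <- cbind(JAN17 = c(100, 124, 165, 178, 0, 0),
             FEB17 = c(101, 125, 133, 178, 170, 0),
             MAR17 = c(99, 0, 165, 180, 166, 99),
             APR17 = c(100, 0, 156, 178, 0, 78))

# Column j of the result = column j-1 of the result + column j of the input.
# The loop runs once per month; each step is vectorised over all rows.
smc_cum <- smc
for (j in seq_len(ncol(smc))[-1]) {
  smc_cum[, j] <- smc_cum[, j - 1] + smc[, j]
}
```

`smc_cum` could then be melted with the same id columns as the SMC values; because melt() stacks rows in the same order for both, its value column lines up row by row as SMCcum.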
Comments:
-
Why is the first value NA?
-
Because that client is still active, so there is no switch-out date.
Tags: r for-loop time vectorization execution