Training Multiple Auto.Arima Models in Parallel: Answers

【Question title】: Training Multiple Auto.Arima Models in Parallel
【Posted】: 2016-11-11 01:20:50
【Question description】:

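In the code below, I am trying to train two different auto.arima models simultaneously on different cores. When I run the code I get the error shown below. I'm not sure whether the problem is with do.call or with parLapply, and I'm quite new to parallel processing, so any tips would be very helpful.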

Code:
library("forecast")
library("parallel")

TList2<-list(x=tsd1, lambda = Tlambda, stepwise=TRUE, approximation = TRUE)
DList2<-list(x=tsd2, lambda = Rlambda, stepwise=TRUE, approximation = TRUE)

##Parallelizing ARIMA Model Training

# Calculate the number of cores
no_cores <- 1

# Initiate cluster
cl <- makeCluster(no_cores)

ARIMA_List<-list(TList2,DList2)

ARIMA_Models<-parLapply(cl, ARIMA_List,
                    function(x){do.call(auto.arima, args=x)})   

stopCluster(cl)


Error:
Error in checkForRemoteErrors(val) : 
  one node produced an error: object 'auto.arima' not found

Data:

dput(TList2)
structure(list(x = c(6, 15.5, 22, 16, NA, NA, 13, 13.5, 10, 6, 
14.5, 16, NA, 8, 11, NA, 2, 2, 10, NA, 9, NA, 11, 16, NA, 4, 
17, 7, 11.5, 22, 20.5, 10, 22, NA, 13, 17, 22, 9, 13, 19, 8, 
16, 18, 22, 21, 14, 7, 20, 21.5, 17), lambda = 0.999958829041611, 
    stepwise = TRUE, approximation = TRUE), .Names = c("x", "lambda", 
"stepwise", "approximation"))

dput(DList2)
structure(list(x = c(11, 4, 8, 11, 11, NA, 3, 2.5, 6, 11, 7, 
1, NA, 6, 6, NA, 6, 11, 3, NA, 11, NA, 10, 10, NA, NA, 9, 3, 
3, 11, 8, 10, NA, NA, 11, 10, 9, 3, 7, NA, 2, 4, 11, 2.5, 3, 
NA, 4, 7, 1, 5), lambda = 0.170065851742339, stepwise = TRUE, 
    approximation = TRUE), .Names = c("x", "lambda", "stepwise", 
"approximation"))

【Question discussion】:

Tags: r parallel-processing time-series forecasting

【Solution 1】:

I think forecast::auto.arima needs to be available on the cluster workers as well, so try using clusterEvalQ like this:

TList2 <- structure(list(x = c(6, 15.5, 22, 16, NA, NA, 13, 13.5, 10, 6, 
14.5, 16, NA, 8, 11, NA, 2, 2, 10, NA, 9, NA, 11, 16, NA, 4, 
17, 7, 11.5, 22, 20.5, 10, 22, NA, 13, 17, 22, 9, 13, 19, 8, 
16, 18, 22, 21, 14, 7, 20, 21.5, 17), lambda = 0.999958829041611, 
    stepwise = TRUE, approximation = TRUE), .Names = c("x", "lambda", 
"stepwise", "approximation"))

DList2<- structure(list(x = c(11, 4, 8, 11, 11, NA, 3, 2.5, 6, 11, 7, 
1, NA, 6, 6, NA, 6, 11, 3, NA, 11, NA, 10, 10, NA, NA, 9, 3, 
3, 11, 8, 10, NA, NA, 11, 10, 9, 3, 7, NA, 2, 4, 11, 2.5, 3, 
NA, 4, 7, 1, 5), lambda = 0.170065851742339, stepwise = TRUE, 
    approximation = TRUE), .Names = c("x", "lambda", "stepwise", 
"approximation"))

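library("forecast")
library("parallel")

# Assumption: no_cores is not defined in this snippet; 2 gives one worker per model
no_cores <- 2
cl <- makeCluster(no_cores)

# Attach forecast on every worker so that auto.arima can be found there
clusterEvalQ(cl, library(forecast))

ARIMA_List <- list(TList2, DList2)
# Fit one model per list element, passing each element as the argument list
ARIMA_Models <- parLapply(cl, ARIMA_List,
                    function(x){do.call(auto.arima, args=x)})
stopCluster(cl)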
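
To illustrate why the clusterEvalQ call matters, here is a minimal sketch showing that workers created by makeCluster start as fresh R sessions without the packages attached in the main session. It assumes two workers and uses the base-distribution package splines purely as a stand-in for forecast, so that it runs even where forecast is not installed; the variable names before and after are illustrative only.

```r
library(parallel)

# Assumed worker count for illustration; adjust to your machine.
cl <- makeCluster(2)

# Ask every worker whether splines is attached (stand-in for forecast).
before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# clusterEvalQ evaluates the expression once on each worker,
# so this attaches the package in every worker session.
invisible(clusterEvalQ(cl, library(splines)))

# Ask again after attaching.
after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(before)  # one logical per worker, before attaching
print(after)   # one logical per worker, after attaching
```

The same reasoning applies to forecast: until it is attached on the workers, a bare auto.arima cannot be resolved there. Calling forecast::auto.arima with the explicit namespace inside the worker function is another common option.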

【Discussion】:

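Thank you, that worked! I'm very new to parallel processing: is the code training the two models simultaneously on 2 different cores? Also, what does adding clusterEvalQ(cl, library(forecast)) do?
I'm fairly new to parallel computing too. I bookmarked this post, which is more or less a duplicate of yours. Intuitively, if you only use 1 core it makes no difference. But if you register more, it should do what you want: run the ARIMA fits in parallel.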
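
One way to check whether the two tasks really land on separate worker processes is to have each task report its process ID. The following is a rough sketch, assuming a two-worker cluster and a two-element input like ARIMA_List; n_workers and pids are hypothetical names, and Sys.getpid() stands in for the model fit.

```r
library(parallel)

# Assumed: two workers, matching the two models in the question.
n_workers <- 2
cl <- makeCluster(n_workers)

# Each task returns the ID of the worker process that ran it.
pids <- parLapply(cl, 1:2, function(i) Sys.getpid())

stopCluster(cl)

# Number of distinct worker processes that handled the tasks.
n_distinct <- length(unique(unlist(pids)))
print(n_distinct)
```

With a single worker, both tasks would run one after the other in the same process, which is why a one-core cluster gives no speed-up.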