R::forecast two-factor forecast answer

【Question title】: R::forecast Two-factor forecast
【Posted】: 2017-11-18 12:55:49
【Question】:

I need to make forecasts by product and by mall. Here is a small part of my dataset:

date        mall    product price
01.01.2017  mall1   prod1   94
01.01.2017  mall1   prod1   65
01.01.2017  mall1   prod1   50
01.01.2017  mall1   prod1   92
01.01.2017  mall1   prod2   97
01.01.2017  mall1   prod2   80
01.01.2017  mall1   prod2   51
01.01.2017  mall1   prod2   90
01.01.2017  mall1   prod3   52
01.01.2017  mall1   prod3   73
01.01.2017  mall1   prod3   59
01.01.2017  mall1   prod3   85
01.01.2017  mall2   prod1   56
01.01.2017  mall2   prod1   60
01.01.2017  mall2   prod1   89
01.01.2017  mall2   prod1   87
01.01.2017  mall2   prod2   77
01.01.2017  mall2   prod2   79
01.01.2017  mall2   prod2   99
01.01.2017  mall2   prod2   59
01.01.2017  mall2   prod3   98
01.01.2017  mall2   prod3   50
01.01.2017  mall2   prod3   54
01.01.2017  mall2   prod3   98
02.01.2017  mall1   prod1   60
02.01.2017  mall1   prod1   68
02.01.2017  mall1   prod1   65
02.01.2017  mall1   prod1   81
02.01.2017  mall1   prod2   74
02.01.2017  mall1   prod2   63
02.01.2017  mall1   prod2   88
02.01.2017  mall1   prod2   71
02.01.2017  mall1   prod3   67
02.01.2017  mall1   prod3   73
02.01.2017  mall1   prod3   62
02.01.2017  mall1   prod3   57
02.01.2017  mall2   prod1   51
02.01.2017  mall2   prod1   65
02.01.2017  mall2   prod1   100
02.01.2017  mall2   prod1   67
02.01.2017  mall2   prod2   74
02.01.2017  mall2   prod2   70
02.01.2017  mall2   prod2   60
02.01.2017  mall2   prod2   97
02.01.2017  mall2   prod3   90
02.01.2017  mall2   prod3   100
02.01.2017  mall2   prod3   72
02.01.2017  mall2   prod3   50

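For each product in each mall, I need to forecast two days ahead. While searching for R libraries I found this forum and the forecast package with its ets function. How do I write a loop or function that runs the forecast for each product in each mall? Ideally, the output should look like this: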

date        mall    product price
03.01.2017  mall1   prod1   pred.value
03.01.2017  mall1   prod2   pred.value
03.01.2017  mall1   prod3   pred.value
03.01.2017  mall1   prod4   pred.value
03.01.2017  mall2   prod1   pred.value
03.01.2017  mall2   prod2   pred.value
03.01.2017  mall2   prod3   pred.value
03.01.2017  mall2   prod4   pred.value
04.01.2017  mall1   prod1   pred.value
04.01.2017  mall1   prod2   pred.value
04.01.2017  mall1   prod3   pred.value
04.01.2017  mall1   prod4   pred.value
04.01.2017  mall2   prod1   pred.value
04.01.2017  mall2   prod2   pred.value
04.01.2017  mall2   prod3   pred.value
04.01.2017  mall2   prod4   pred.value

Any help is appreciated.

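【Comments】:

How long is your training set? How far ahead do you want to forecast? Your question is unclear.
@DataTx, why do you think my question is unclear? I clearly wrote that this is daily data and that the predictions (Y) are 2 days ahead, and this is a snippet of the dataset. What exactly don't you understand? :)

Tags: r forecasting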

【Solution 1】:

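Basically, you are forecasting (number of products) × (number of malls) variables two days ahead. The only data you have is the price of each product, in each mall, on each day.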

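The first thing you need to do is specify a set of forecasting models that you will compare in some way to decide how you will generate your forecasts. You can use ARIMA-type models, or non-parametric methods (e.g. support vector regression) that relate the current price to past prices.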

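Suppose you want to use ARIMA-type models and want to compare an ARMA(1,1) and an AR(2) model. The idea is to hold out a portion of the dataset at the end. Say you keep the last 20% of the dataset aside. You take the first 80% minus its last two days and estimate the AR(2) and the ARMA(1,1) on that data. You then use each model to forecast the first day of the 20% you held out (a 2-step-ahead forecast). Then you move the end of the estimation window forward by one day. If you want to always estimate on the same number of data points, you can also drop the first observation each time (a rolling rather than expanding window). You estimate all models again and produce a second forecast, and so on, until you have forecasts from every model for every day of the hold-out period.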
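A minimal sketch of the same rolling-origin idea using tsCV() from the forecast package instead of a hand-written loop. Assumptions: the series is a simulated AR(1) standing in for one mall/product price series, and the names fc_ar2, fc_arma11 and err2_* are illustrative. tsCV() re-fits the model at every origin t on y[1:t] (an expanding window) and stores y[t+h] minus the forecast in row t; passing window = k instead keeps a fixed estimation sample of k observations.

```r
library(forecast)

set.seed(1030)
# Stand-in data (assumption): a simulated AR(1) series
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))

# Forecast functions in the form tsCV() expects: (series, horizon) -> forecast object
fc_ar2    <- function(x, h) forecast(Arima(x, order = c(2, 0, 0)), h = h)
fc_arma11 <- function(x, h) forecast(Arima(x, order = c(1, 0, 1)), h = h)

# Row t holds the errors of forecasts made at origin t; with h = 2 there are
# two columns (1- and 2-step-ahead). Origins where a fit fails are left as NA.
e_ar2    <- tsCV(y, fc_ar2,    h = 2)
e_arma11 <- tsCV(y, fc_arma11, h = 2)

# Keep the origins whose 2-step target falls in the last 20% (the hold-out)
n <- length(y)
holdout_start <- floor(0.8 * n) + 1
origins <- (holdout_start - 2):(n - 2)
err2_ar2    <- e_ar2[origins, 2]
err2_arma11 <- e_arma11[origins, 2]

# Compare the two models by out-of-sample 2-step MSE
c(AR2 = mean(err2_ar2^2, na.rm = TRUE), ARMA11 = mean(err2_arma11^2, na.rm = TRUE))
```

Note that tsCV() fits a model at every origin, including the early ones you then discard, so it trades some speed for brevity.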

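Then, since you know the values that were actually realized, you can compute each model's 2-day-ahead forecast errors over the last 20% of the dataset. From those out-of-sample errors you can compute the mean squared error, the mean absolute error, the percentage of forecasts with the correct sign, the percentage of errors that fall inside an interval around the forecast, and many other statistical performance measures. Each such statistic helps you rank the models; if you have many statistics, you can use a radar (spider) chart to visualize how the models perform.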
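A small sketch of how those statistics could be computed from the hold-out forecasts. The helper forecast_metrics is hypothetical, and two choices are assumptions: the "correct sign" statistic is taken on the predicted change relative to the value observed at the forecast origin (last_obs), and band is a half-width you pick yourself.

```r
# Hypothetical helper: out-of-sample accuracy statistics for one model.
#   actual   realized values y[t+2]
#   fcst     the model's 2-step-ahead forecasts of those values
#   last_obs value observed at each forecast origin, y[t]
#   band     half-width of the interval around each forecast
forecast_metrics <- function(actual, fcst, last_obs, band) {
  e <- actual - fcst
  c(MSE      = mean(e^2),
    MAE      = mean(abs(e)),
    # share of forecasts that get the direction of the change right
    sign_hit = mean(sign(fcst - last_obs) == sign(actual - last_obs)),
    # share of errors within +/- band of the forecast
    in_band  = mean(abs(e) <= band))
}

# Toy numbers: gives MSE 0.375, MAE 0.5, sign_hit 1, in_band 0.75
forecast_metrics(actual   = c(1, 2, 3, 4),
                 fcst     = c(1.5, 1.5, 3, 5),
                 last_obs = c(0, 3, 2, 3),
                 band     = 0.5)
```

Calling it once per model (e.g. inside sapply() over a list of forecast vectors) gives one column of statistics per model, which is the table you would rank or plot.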

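Now, how do you code this? I simulate data and set a seed so you can see how each part works. Basically, you choose a subsample, then for each model you estimate it, forecast, and collect the forecast error for that subsample. If you want to make things more sophisticated, you can add another layer to the loop that goes through many AR(p) and ARMA(p,q) models, collects, say, their BIC values, and produces the forecast from the model with the lowest BIC. You could also code the least-squares estimation of the AR model yourself and, instead of producing iterated forecasts (forecast() uses the structure of the ARIMA model to generate forecasts through recursive equations), produce direct forecasts. A direct forecast means the regressors start lagged by the forecast horizon; here you would have y_{t+2} = c + phi_1 y_t + ... + phi_p y_{t-p+1} + e_{t+2}, so you skip y_{t+1}.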
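A minimal sketch of the direct 2-step AR forecast estimated by least squares with lm(), with the lag order p chosen by BIC as described above. Assumptions: the data are a simulated AR(1), p_max = 6 is an illustrative upper bound, and direct_data is a hypothetical helper. All candidate models are fitted on the same rows so that their BIC values are comparable.

```r
set.seed(1030)
# Stand-in series (assumption): a simulated AR(1)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))

h <- 2      # forecast horizon
p_max <- 6  # largest lag order tried (illustrative)

# Rows t = p_max, ..., n - h: target y[t + h], regressors y[t], ..., y[t - p + 1]
direct_data <- function(y, p, h, p_max) {
  t_idx <- p_max:(length(y) - h)
  X <- sapply(seq_len(p), function(j) y[t_idx - j + 1])
  X <- matrix(X, ncol = p, dimnames = list(NULL, paste0("lag", seq_len(p))))
  data.frame(target = y[t_idx + h], X)
}

# Least-squares fit of each direct AR(p) regression, then pick p by BIC
fits <- lapply(seq_len(p_max), function(p) lm(target ~ ., data = direct_data(y, p, h, p_max)))
bic <- sapply(fits, BIC)
p_best <- which.min(bic)

# Direct forecast of y[n + 2]: plug in the latest observations y[n], y[n-1], ...
n <- length(y)
newdata <- as.data.frame(matrix(y[n - seq_len(p_best) + 1], nrow = 1,
                                dimnames = list(NULL, paste0("lag", seq_len(p_best)))))
y_hat <- predict(fits[[p_best]], newdata = newdata)
y_hat
```

Unlike forecast(Arima(...), h = 2), which iterates the one-step model twice, this regression targets the 2-step horizon directly, so a separate regression is needed for each horizon.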

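Direct forecasts from AR models tend to perform better. As for ARMA, I don't recommend using p, q > 1 for forecasting. ARMA(1,1) is a first-order approximation to both an infinite MA and an infinite AR, so it does capture complex (but linear) responses. Of course, if you want, you can use a package like "e1071" and train a support vector machine. It comes with a tune function for tuning the hyperparameters and kernel parameters, as well as subsampling and predict functions for making the selection and producing forecasts; in terms of code, it is no more complicated than what you see below.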
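A rough sketch of the support vector regression route with e1071, using svm() and predict() on lagged prices. Assumptions: 3 lags, the small cost/gamma grid, and the simulated data are illustrative. Instead of e1071's tune(), whose default sampling is k-fold cross-validation and therefore ignores time order, this selects the parameters on a chronological train/validation split, leaving a gap of h rows so no training target lies after the first validation origin.

```r
library(e1071)

set.seed(1030)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))  # stand-in series
h <- 2; p <- 3                                               # horizon, number of lags

# Direct design: predict y[t + h] from y[t], y[t-1], y[t-2]
t_idx  <- p:(length(y) - h)
X      <- sapply(seq_len(p), function(j) y[t_idx - j + 1])
target <- y[t_idx + h]

# Chronological split: first 80% of rows (minus an h-row gap) to fit, the rest to validate
n_fit    <- floor(0.8 * length(target))
fit_rows <- seq_len(n_fit - h)
val_rows <- (n_fit + 1):length(target)

# Small grid over the radial-kernel parameters, scored by validation MSE
grid <- expand.grid(cost = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1))
grid$val_mse <- NA_real_
for (k in seq_len(nrow(grid))) {
  m <- svm(X[fit_rows, , drop = FALSE], target[fit_rows], type = "eps-regression",
           kernel = "radial", cost = grid$cost[k], gamma = grid$gamma[k])
  pred <- predict(m, X[val_rows, , drop = FALSE])
  grid$val_mse[k] <- mean((target[val_rows] - pred)^2)
}
best <- grid[which.min(grid$val_mse), ]

# Refit on all rows with the chosen parameters and forecast y[n + h]
final <- svm(X, target, type = "eps-regression", kernel = "radial",
             cost = best$cost, gamma = best$gamma)
n <- length(y)
y_hat <- predict(final, matrix(y[n - seq_len(p) + 1], nrow = 1))
y_hat
```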

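And if you haven't considered it: once you have several forecasting models, you can use the mean of their forecasts, the median of their forecasts, or an optimized convex combination of their forecasts as your forecast. This tends to work best, and once you already have several models to compare, it is neither harder nor longer to do.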
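A small sketch of the three combinations. combine_forecasts is a hypothetical helper; fc holds one column of hold-out forecasts per model. The optimized convex combination is only worked out for two models here: minimizing the squared error of w*f1 + (1 - w)*f2 over w has the closed form w = sum((y - f2)(f1 - f2)) / sum((f1 - f2)^2), and because the objective is a convex quadratic in w, clipping to [0, 1] gives the constrained optimum. More than two models would need a quadratic program.

```r
# Hypothetical helper: combine forecasts from several models.
#   actual  realized values over the hold-out period
#   fc      matrix of forecasts, one column per model
combine_forecasts <- function(actual, fc) {
  out <- list(mean   = rowMeans(fc),
              median = apply(fc, 1, median))
  if (ncol(fc) == 2) {
    d <- fc[, 1] - fc[, 2]                    # must not be all zero
    w <- sum((actual - fc[, 2]) * d) / sum(d^2)
    w <- min(max(w, 0), 1)                    # clip so the weights stay convex
    out$weight_model1 <- w
    out$convex <- w * fc[, 1] + (1 - w) * fc[, 2]
  }
  out
}

# Toy check: model 1 is perfect, so it gets all the weight
res <- combine_forecasts(c(1, 2, 3), cbind(c(1, 2, 3), c(0, 0, 0)))
res$weight_model1
```

The weight should be estimated on the hold-out period and then applied to new forecasts; evaluating it on the same period it was fitted on flatters it.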

library(forecast)

set.seed(1030)
e <- rnorm(n=1000, sd=1, mean=0)  # Create errors for simulation
y <- array(data=0, dim=c(1000,1)) # Create vector to hold values
phi <- 0.8

# Simulate an AR(1) process
for (i in 2:length(y)){
  y[i,1] <- phi*y[i-1,1] + e[i]
}

# Now, we'll use only the last half of the sample. It doesn't matter that
# we started at 0 because an AR(1) process with abs(phi) < 1 is ergodic and
# stationary, so the effect of the starting value dies out.
y <- y[501:1000,1]

# Now we have data, we can estimate a model and produce an out-of-sample
# exercise:
poos <- 250:length(y)                          # Forecast targets: the last half
forecast_ar <- rep(NA_real_, length(y))        # Indexed by target period i
forecast_arma <- forecast_ar
error <- forecast_ar
error_arma <- error

for (i in poos){
  # AR model
  a <- Arima(y = y[1:(i-2)],          # Horizon = 2 periods
             order = c(1,0,0),
             seasonal = c(0,0,0),
             include.constant = TRUE) # We estimate an AR(1) model
  forecast_ar[i] <- forecast(a, h=2)$mean[2]
  error[i] <- y[i] - forecast_ar[i]

  # ARMA model
  a <- Arima(y = y[1:(i-2)],          # Horizon = 2 periods
             order = c(1,0,1),
             seasonal = c(0,0,0),
             include.constant = TRUE) # We estimate an ARMA(1,1) model
  forecast_arma[i] <- forecast(a, h=2)$mean[2]
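  error_arma[i] <- y[i] - forecast_arma[i]
}

# Out-of-sample mean squared 2-step-ahead errors; lower is better
mean(error[poos]^2)        # AR(1)
mean(error_arma[poos]^2)   # ARMA(1,1)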
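To answer the original question of looping over every product in every mall, here is a sketch that produces output in the requested layout. Assumptions: the data are simulated in the question's format (several prices per day per mall/product, dates as dd.mm.yyyy); the several rows per day are collapsed to a daily mean price, which is a choice you may want to change; and ets() is used because the question mentions it, but any model chosen with the procedure above can be swapped in.

```r
library(forecast)

set.seed(1030)
# Hypothetical data in the question's layout: 4 prices per day per mall/product
days <- format(seq(as.Date("2017-01-01"), by = "day", length.out = 60), "%d.%m.%Y")
sales <- expand.grid(obs = 1:4, product = paste0("prod", 1:3),
                     mall = paste0("mall", 1:2), date = days,
                     stringsAsFactors = FALSE)
sales$price <- round(runif(nrow(sales), 50, 100))
sales$obs <- NULL

# Collapse to one daily mean price per mall/product (assumption)
daily <- aggregate(price ~ date + mall + product, data = sales, FUN = mean)
daily$date <- as.Date(daily$date, format = "%d.%m.%Y")

# Fit one model per mall/product and forecast h = 2 days ahead
h <- 2
groups <- split(daily, list(daily$mall, daily$product), drop = TRUE)
preds <- lapply(groups, function(g) {
  g <- g[order(g$date), ]
  fc <- forecast(ets(ts(g$price)), h = h)
  data.frame(date = max(g$date) + seq_len(h),
             mall = g$mall[1], product = g$product[1],
             price = as.numeric(fc$mean))
})

# Stack and sort as date, mall, product, matching the requested output
result <- do.call(rbind, preds)
result <- result[order(result$date, result$mall, result$product), ]
result$date <- format(result$date, "%d.%m.%Y")
rownames(result) <- NULL
result
```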

【Discussion】: