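Time Series and Forecasting Answers

【Question title】: Time Series and Forecasting
【Posted】: 2022-02-23 03:00:57
【Question description】:

My raw data is Date, A:Z. I need to turn each column/vector A:Z into an independent ts() time series so that I can run auto.arima and the forecast function on each vector. Using seq_along, I can successfully create the separate data frames A:Z in my global environment. My trouble now is iterating over each data frame, converting it to a time series, and then running auto.arima and forecast on each one. The end result should be a single data frame containing point forecasts for each of A:Z over a period I specify (1 year or 5 years); I want the number of periods to forecast ahead to be set by a variable.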

    structure(list(YEAR = c(2001, 2002, 2003, 2004, 2005, 2006),
        A = c(0, 0, 0, 2003, 0, 0), B = c(0, 0, 0, 2004, 0, 0),
        C = c(0, 0, 0, 2005, 0, 0), D = c(0, 0, 0, 2006, 0, 0),
        E = c(0, 0, 0, 2007, 0, 0), F = c(0, 0, 0, 2008, 0, 0),
        G = c(0, 0, 0, 2009, 0, 2310593.63), H = c(0, 0, 0, 2010, 0, 949885.17),
        I = c(0, 0, 0, 2011, 51939.35, 755167.32), J = c(0, 0, 0, 2012, 200485.83, 0),
        K = c(0, 0, 0, 2013, 340741.25, 0), L = c(0, 0, 0, 2014, 692627.39, 0),
        M = c(0, 0, 0, 2015, 498738.38, 13228.06), N = c(0, 0, 0, 2016, 727855.33, 151441.77),
        O = c(0, 0, 0, 2017, 1197076.02, 108188.58), P = c(0, 0, 0, 2018, 558267.98, 0),
        Q = c(0, 0, 0, 2019, 631624.18, 0), R = c(0, 0, 0, 2020, 1348869.22, 0),
        S = c(0, 0, 0, 2021, 1206861.95, 0), T = c(0, 0, 0, 2022, 0, 0),
        U = c(0, 0, 0, 2023, 0, 0), V = c(0, 0, 0, 2024, 0, 0),
        W = c("0", "0", "0", "Grand Total", "7455086.88", "4288504.53"),
        X = c(0, 0, 0, 2011, 51939.35, 755167.32), Y = c(0, 0, 0, 2012, 200485.83, 0),
        Z = c(0, 0, 0, 2013, 340741.25, 0)),
        row.names = c(NA, 6L), class = "data.frame")

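【Question discussion】:

How do I upload the raw data here?
Can you paste the output of dput(head(raw_data))? See also stackoverflow.com/questions/5963269/…
I'll do that when I get home.
Updated with dput(head(Raw_Data)).
Sorry, I don't understand this structure or the question at all. Column W holds the string values "0", "0", "0", "Grand Total", "7455086.88", "4288504.53". What does that mean? Rows 1, 2 and 3 are all zeros. Row 4 looks like a mix of numbers (years?), then text ("Grand Total") and more numbers. Sorry, but your question needs to be more specific.

Tags: r time time-series arima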

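【Solution 1】:

Here are some options for producing a separate time series for each column.

If the structure above is stored in `data`, you can do the following.

Option 1: convert to a data.table and apply `ts()` to each of the columns A-Z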

library(data.table)
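setDT(data)  # convert the data.frame to a data.table in place
# apply ts() to columns 2:27 (A-Z) so that every column becomes a "ts" object
dt_as_ts = data[, lapply(.SD, ts, start=2001, end=2006), .SDcols=c(2:27)]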

This returns a data.table in which every column is an object of class "ts". Output:

      A    B    C    D    E    F       G        H         I        J        K        L         M        N         O      P        Q       R       S    T    U    V           W         X        Y        Z
   <ts> <ts> <ts> <ts> <ts> <ts>    <ts>     <ts>      <ts>     <ts>     <ts>     <ts>      <ts>     <ts>      <ts>   <ts>     <ts>    <ts>    <ts> <ts> <ts> <ts>        <ts>      <ts>     <ts>     <ts>
1:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
2:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
3:    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0           0      0.00      0.0      0.0
4: 2003 2004 2005 2006 2007 2008    2009   2010.0   2011.00   2012.0   2013.0   2014.0   2015.00   2016.0    2017.0   2018   2019.0    2020    2021 2022 2023 2024 Grand Total   2011.00   2012.0   2013.0
5:    0    0    0    0    0    0       0      0.0  51939.35 200485.8 340741.2 692627.4 498738.38 727855.3 1197076.0 558268 631624.2 1348869 1206862    0    0    0  7455086.88  51939.35 200485.8 340741.2
6:    0    0    0    0    0    0 2310594 949885.2 755167.32      0.0      0.0      0.0  13228.06 151441.8  108188.6      0      0.0       0       0    0    0    0  4288504.53 755167.32      0.0      0.0

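You can forecast the next value of each series (except W, which is not numeric) like this:

# .SDcols=-23 drops W, the 23rd of the 26 columns A-Z
t(dt_as_ts[,lapply(.SD, function(x) predict(arima(x))), .SDcols=-23])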

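Option 2: apply `ts()` directly to the columns of interest

Alternatively, you can pass the whole data set to ts() as a matrix, excluding the first column (the year), like this: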

data_as_ts=ts(data[,-1], start=2001, end=2006)
data_as_ts
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
        A    B    C    D    E    F       G        H         I        J        K        L         M        N         O      P        Q       R       S    T    U    V W         X        Y        Z
2001    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2002    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2003    0    0    0    0    0    0       0      0.0      0.00      0.0      0.0      0.0      0.00      0.0       0.0      0      0.0       0       0    0    0    0 1      0.00      0.0      0.0
2004 2003 2004 2005 2006 2007 2008    2009   2010.0   2011.00   2012.0   2013.0   2014.0   2015.00   2016.0    2017.0   2018   2019.0    2020    2021 2022 2023 2024 4   2011.00   2012.0   2013.0
2005    0    0    0    0    0    0       0      0.0  51939.35 200485.8 340741.2 692627.4 498738.38 727855.3 1197076.0 558268 631624.2 1348869 1206862    0    0    0 3  51939.35 200485.8 340741.2
2006    0    0    0    0    0    0 2310594 949885.2 755167.32      0.0      0.0      0.0  13228.06 151441.8  108188.6      0      0.0       0       0    0    0    0 2 755167.32      0.0      0.0

This returns an object of class "mts" "ts" "matrix", and each column is itself an object of class "ts"; for example, class(data_as_ts[,4]) returns "ts".
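Because each column is a univariate "ts", arima() has to be called on one column at a time; handing it the whole "mts" object fails, which is the error reported in the comments further down. A minimal sketch with a made-up two-column series (the values are hypothetical, not the question's data):

```r
# Hypothetical two-column annual series, standing in for data_as_ts
m <- ts(cbind(A = c(3, 5, 4, 6, 5, 7), B = c(10, 12, 11, 13, 12, 14)),
        start = 2001)

# arima() rejects a multi-column series; capture the error message
msg <- tryCatch(arima(m), error = function(e) conditionMessage(e))
print(msg)

# A single column of an "mts" is a univariate "ts", so arima() accepts it
print(class(m[, "A"]))
fit_A <- arima(m[, "A"], order = c(0, 0, 0))
print(fit_A)
```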

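Note carefully how column W has been converted to numbers: ts() passes a data frame through data.matrix(), which turns the character column into a factor and then into its integer level codes, so the values 1, 1, 1, 4, 3, 2 shown for W are codes, not the original amounts.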
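A minimal sketch reproducing those codes from W's original strings, and showing that as.numeric() keeps the real amounts but turns the "Grand Total" label into NA (so that row would need to be dropped or handled before modelling):

```r
# Column W exactly as it appears in the question's dput() output
W <- c("0", "0", "0", "Grand Total", "7455086.88", "4288504.53")

# data.matrix() converts the character column to factor level codes;
# the levels sort as "0", "4288504.53", "7455086.88", "Grand Total"
codes <- data.matrix(data.frame(W = W))[, "W"]
print(codes)

# as.numeric() keeps the real values but cannot parse the text label
vals <- suppressWarnings(as.numeric(W))
print(vals)
```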

You can use apply to get a forecast for each column:

apply(data_as_ts,2,function(x) predict(arima(x)))
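predict() forecasts one step ahead by default; its n.ahead argument sets the horizon, which is how the 1-year or 5-year requirement in the question could be met. A sketch, assuming a hypothetical horizon variable h, annual data, a mean-only order = c(0, 0, 0) model (what arima(x) fits by default) and made-up series with W already removed:

```r
h <- 5  # hypothetical forecast horizon in years (e.g. 1 or 5)

# Hypothetical stand-in for data_as_ts, with column W already removed
m <- ts(cbind(A = c(3, 5, 4, 6, 5, 7), B = c(10, 12, 11, 13, 12, 14)),
        start = 2001)

# apply() passes each column to the function as a plain numeric vector
raw <- apply(m, 2, function(x) {
  fit <- arima(x, order = c(0, 0, 0))  # mean-only model, as arima(x) above
  as.numeric(predict(fit, n.ahead = h)$pred)
})

# matrix() keeps one row per future year even when h = 1
# (in that case apply() returns a plain vector instead of a matrix)
years <- as.character(end(m)[1] + seq_len(h))  # assumes frequency 1
pts <- matrix(raw, nrow = h, dimnames = list(years, colnames(m)))
print(pts)
```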

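Option 3: split `data` into separate frames and use `lapply()` to return a list of `ts` objects

Finally, if you want to split the frame by column and keep a separate list of ts objects, you can do this: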

# melt to long format (without W), split by series name, and build one ts per series
list_of_ts = lapply(split(melt(setDT(data)[,!c("W")], id.vars="YEAR"), by="variable"), 
       function(x) ts(x$value, start=2001, end=2006)
)

Output (first three elements):

$A
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2003    0    0

$B
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2004    0    0

$C
Time Series:
Start = 2001 
End = 2006 
Frequency = 1 
[1]    0    0    0 2005    0    0
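If you would rather avoid data.table and melt(), roughly the same list can be built in base R by calling lapply() over the numeric columns. A sketch, assuming the data frame is named df (a hypothetical name) and using only an excerpt of the question's columns; YEAR supplies the start year and the character column W is skipped automatically:

```r
# Excerpt of the question's data (only YEAR, A, I and W kept for brevity)
df <- data.frame(
  YEAR = 2001:2006,
  A = c(0, 0, 0, 2003, 0, 0),
  I = c(0, 0, 0, 2011, 51939.35, 755167.32),
  W = c("0", "0", "0", "Grand Total", "7455086.88", "4288504.53")
)

# Keep numeric series columns only (drops YEAR and the character column W)
is_num <- vapply(df, is.numeric, logical(1))
series_cols <- setdiff(names(df)[is_num], "YEAR")

# One univariate annual "ts" per column, starting at the first year
list_of_ts <- lapply(df[series_cols], ts, start = min(df$YEAR), frequency = 1)
print(list_of_ts)
```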

Likewise, you can use lapply to get the next forecast for each item in the list:

lapply(list_of_ts,function(x) predict(arima(x)))
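To get the single data frame asked for in the question (point forecasts for every series over a chosen horizon), the per-series predictions can be collected into columns. A sketch under these assumptions: forecast_table() and h are hypothetical names, all series share the same annual time index, and a mean-only arima() stands in for forecast::auto.arima(), which the question actually wants; swapping it in would mainly change the fitting and prediction lines.

```r
# Hypothetical helper: point forecasts for every series in a named list
forecast_table <- function(series_list, h) {
  preds <- lapply(series_list, function(x) {
    fit <- arima(x, order = c(0, 0, 0))  # stand-in for forecast::auto.arima(x)
    predict(fit, n.ahead = h)$pred       # a "ts" covering the next h periods
  })
  # All series are assumed to share one time index, so take it from the first
  out <- data.frame(YEAR = as.numeric(time(preds[[1]])))
  for (nm in names(preds)) out[[nm]] <- as.numeric(preds[[nm]])
  out
}

# Hypothetical example series, annual from 2001 to 2006
list_of_ts <- list(
  A = ts(c(3, 5, 4, 6, 5, 7), start = 2001),
  B = ts(c(10, 12, 11, 13, 12, 14), start = 2001)
)

h <- 5  # forecast horizon in years, e.g. 1 or 5
fc <- forecast_table(list_of_ts, h)
print(fc)
```

Changing h is then the only thing needed to switch between a 1-year and a 5-year table.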

【Discussion】:

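This works for turning each column into a ts() time series, but how do I now apply auto.arima so that I can forecast the next value of each column/vector, which in this example would be the next value for each of A:Z?
You may want to ask a new question: how do you apply arima to a list of time series objects?
@KyleOverton, just use lapply(list_of_ts,function(x) predict(arima(x))). This returns the estimate for 2007 for each series in list_of_ts (25 of them, since W is dropped).
Error in arima(x) : only implemented for univariate time series
You have only shown univariate time series (the list_of_ts I defined in the answer is a list of univariate time series), which is why I don't get that error.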
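On the auto.arima part of the follow-up: auto.arima() from the forecast package searches over ARIMA orders and picks one by an information criterion. If that package is not available, a much cruder base-R stand-in is to fit a small grid of orders with arima() and keep the one with the lowest AIC. This is not auto.arima's algorithm; select_arima(), max_p, max_q and the simulated series below are all hypothetical:

```r
# Hypothetical helper: pick the ARMA(p, q) order with the lowest AIC
select_arima <- function(x, max_p = 2, max_q = 2) {
  best <- NULL
  for (p in 0:max_p) {
    for (q in 0:max_q) {
      # Some orders can fail to fit; skip them instead of stopping
      fit <- tryCatch(arima(x, order = c(p, 0, q)), error = function(e) NULL)
      if (!is.null(fit) && (is.null(best) || fit$aic < best$aic)) best <- fit
    }
  }
  best
}

set.seed(42)
# Hypothetical simulated series: 60 annual values from an AR(1) process
x <- ts(as.numeric(arima.sim(model = list(ar = 0.6), n = 60)), start = 1961)

best <- select_arima(x)
print(best$arma[c(1, 2)])               # chosen p and q
print(predict(best, n.ahead = 5)$pred)  # next 5 point forecasts
```

Applied per series, for example lapply(list_of_ts, select_arima), each column gets its own order; with only six annual points per series, though, there is very little data for any order search to work with.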
