[Question title]: Loop Auto Arima by Partition
[Posted]: 2021-02-28 02:32:55
[Question description]:

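I am collaborating on a project that requires me to use R, which I have no experience with so far. I am trying to apply auto ARIMA to partitions/windows in my dataset, but I don't even know how to begin.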

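Basically, I want to train a separate model for each partner_id using the rows where c_id = "none", and then forecast out to max(Date) for each partner_id. The number of months/rows differs per partner. For the sample data frame pasted below, partner_id = "1A9" has 12 months/rows with c_id = "none", while partner_id = "1B9" has 13 months/rows with c_id = "none". The number of months/rows extending to max(Date) also varies per partner_id. This is tricky because I assume I need to dynamically supply, for each partner_id, the number of months/rows to train on and the number of months/rows to forecast.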

I have included a sample dataset below.

x <- data.frame("c_id" = c("none","none","none","none","none",
"none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111"), "partner_id" = c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9"), "rev_month" = as.Date(c("2016-01-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2017-01-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", 
"2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01")), "rev" = c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45), stringsAsFactors=FALSE)

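I apologize for not having any starter code yet; I am still trying to think this through conceptually and do not have much experience with R. Ultimately, I want to add the forecast and confidence-interval columns back to my original data frame. I am open to any R and/or Python solution.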

[Question discussion]:

    Tags: python r for-loop arima


    [Solution 1]:

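    From the perspective of programming in R and of time series analysis, my answer is wrong on many levels. The main points are (there are other issues, but I understand your concern is to get it working as quickly as possible):

    1. Loops should be avoided in the first place, but my guess is that a vectorized solution would be harder for you to understand.

    2. If you hope to capture seasonal patterns, using ARIMA on a time series that does not cover at least two full cycles (in this case, years) is not very promising.

    If you are really interested in time series forecasting in R, read this book: https://otexts.com/fpp2/

    A related side issue is your test data: both partners' series have a duplicated date in the first and second positions, which does not fit time series forecasting with a fixed period/interval. I simply moved the first date back to get things working. So the new training data looks like this (we do not need stringsAsFactors=FALSE):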

     x <- data.frame(c_id = c("none","none","none","none","none","none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111"), "partner_id" = c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9"),
                    rev_month = as.Date(c("2015-12-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2016-12-31","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", "2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01")),
                    rev = c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45))
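
The duplicated-date problem described above can be detected before any modeling. The following is a minimal base R sketch, assuming a small made-up data frame `toy` with the same `partner_id` and `rev_month` columns; the values are illustrative and are not the data from the question.

```r
# Hypothetical toy data: partner "A" repeats 2016-01-01, partner "B" is clean.
toy <- data.frame(
  partner_id = c("A", "A", "A", "B", "B"),
  rev_month  = as.Date(c("2016-01-01", "2016-01-01", "2016-02-01",
                         "2017-01-01", "2017-02-01"))
)

# Flag rows whose (partner_id, rev_month) pair already appeared earlier.
is_dup <- duplicated(toy[, c("partner_id", "rev_month")])

# Rows that would break a regularly spaced monthly series.
dup_rows <- toy[is_dup, ]
print(dup_rows)
```

Running a check like this on the real data first makes it clear which rows need to be shifted or dropped before converting to a time series.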
    

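    Now we set up a data.frame to store the predictions. This is not correct in theory ("never grow a vector") and there are better solutions, but they would make things more complicated and would not help in understanding the implementation: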

    # empty data.frame to fill in predictions
    predictions_df <- data.frame(c_id=character(),
                                 partner=character(),
                                 rev_month = character(),
                                 rev=double())
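
One common alternative to growing a data frame is to collect one result per partner in a list and bind them once at the end. The sketch below uses only base R and assumes a placeholder per-group computation: a simple mean stands in for the model, and `toy` and `summarise_partner` are hypothetical names that do not appear in the answer.

```r
# Hypothetical toy data with two partners.
toy <- data.frame(
  partner_id = c("A", "A", "B", "B", "B"),
  rev        = c(1, 3, 10, 20, 30)
)

# Hypothetical stand-in for "fit a model and return predictions".
summarise_partner <- function(d) {
  data.frame(partner = d$partner_id[1], rev = mean(d$rev))
}

# split() gives one data frame per partner; lapply() builds one result each.
pieces <- lapply(split(toy, toy$partner_id), summarise_partner)

# Bind all pieces in a single step instead of growing a data frame row by row.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

The same shape should carry over to the forecasting case by replacing the body of the helper with the model-fitting steps.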
    

    Now we build a vector of the unique partners to loop over:

    # unique partners
    partners <- unique(x$partner_id)
    

    Let's load the libraries needed for this exercise:

    library(xts)
    library(dplyr)
    library(forecast)
    

    The main part is the loop itself:

    # loop to build predictions and store them
    for (i in seq_along(partners)) {
    
      partner <- partners[i] # get specific partner
      x1 <- x[x$partner_id == partner, ] # get data for specific partner
      x1_t <- x1[x1$c_id == "none", c("rev_month", "rev")] # training data
      x1_f <- x1[x1$c_id != "none", c("rev_month", "rev")] # rows to forecast
      c_id <- x1[x1$c_id != "none", "c_id"] # c_id labels of the rows to forecast
    
      # convert training data to a time-series object
      x1_t_ts <- xts(x1_t$rev, order.by = as.Date(x1_t$rev_month))
      # run auto arima on the time series
      tm <- forecast::auto.arima(x1_t_ts)
      # forecast as many steps ahead as there are rows to predict
      fc <- forecast::forecast(tm, h = nrow(x1_f))
    
      # append this partner's predictions to the result
      predictions_df <- rbind(predictions_df, data.frame(c_id, partner, rev_month = as.character(x1_f$rev_month), rev = as.double(fc$mean)))
    
    }
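
In the loop above, the training length and the forecast horizon are never hard-coded: they follow from how many rows each partner has with and without c_id == "none". The following base R sketch shows that counting step in isolation on made-up data; `toy` is a hypothetical data frame, not the one from the question.

```r
# Hypothetical toy data: partner "A" has 3 training rows and 2 rows to forecast,
# partner "B" has 2 training rows and 1 row to forecast.
toy <- data.frame(
  partner_id = c("A", "A", "A", "A", "A", "B", "B", "B"),
  c_id       = c("none", "none", "none", "c-1", "c-1", "none", "none", "c-2")
)

is_train <- toy$c_id == "none"

# Count training rows and forecast rows per partner.
n_train <- tapply(is_train, toy$partner_id, sum)
horizon <- tapply(!is_train, toy$partner_id, sum)

counts <- data.frame(
  partner_id = names(n_train),
  n_train    = as.integer(n_train),
  horizon    = as.integer(horizon)
)
print(counts)
```

A table like `counts` is also a quick sanity check that every partner has enough training rows before fitting a model.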
    

    Finally, let's look at the result:

    predictions_df
    
        c_id partner  rev_month      rev
    1  c-100     1A9 2016-12-01 106.5409
    2  c-100     1A9 2017-01-01 106.9818
    3  c-100     1A9 2017-02-01 107.4227
    4  c-100     1A9 2017-03-01 107.8636
    5  c-100     1A9 2017-04-01 108.3045
    6  c-100     1A9 2017-05-01 108.7455
    7  c-100     1A9 2017-06-01 109.1864
    8  c-100     1A9 2017-07-01 109.6273
    9  c-100     1A9 2017-08-01 110.0682
    10 c-100     1A9 2017-09-01 110.5091
    11 c-100     1A9 2017-10-01 110.9500
    12 c-100     1A9 2017-11-01 111.3909
    13 c-101     1A9 2017-12-01 111.8318
    14 c-101     1A9 2018-01-01 112.2727
    15 c-101     1A9 2018-02-01 112.7136
    16 c-101     1A9 2018-03-01 113.1545
    17 c-101     1A9 2018-04-01 113.5955
    18 c-101     1A9 2018-05-01 114.0364
    19 c-101     1A9 2018-06-01 114.4773
    20 c-101     1A9 2018-07-01 114.9182
    21 c-101     1A9 2018-08-01 115.3591
    22 c-101     1A9 2018-09-01 115.8000
    23 c-101     1A9 2018-10-01 116.2409
    24 c-101     1A9 2018-11-01 116.6818
    25 c-101     1A9 2018-12-01 117.1227
    26 c-110     1B9 2018-01-01 106.9375
    27 c-110     1B9 2018-02-01 107.3750
    28 c-110     1B9 2018-03-01 107.8125
    29 c-110     1B9 2018-04-01 108.2500
    30 c-110     1B9 2018-05-01 108.6875
    31 c-110     1B9 2018-06-01 109.1250
    32 c-110     1B9 2018-07-01 109.5625
    33 c-110     1B9 2018-08-01 110.0000
    34 c-110     1B9 2018-09-01 110.4375
    35 c-110     1B9 2018-10-01 110.8750
    36 c-110     1B9 2018-11-01 111.3125
    37 c-110     1B9 2018-12-01 111.7500
    38 c-111     1B9 2019-01-01 112.1875
    39 c-111     1B9 2019-02-01 112.6250
    40 c-111     1B9 2019-03-01 113.0625
    41 c-111     1B9 2019-04-01 113.5000
    42 c-111     1B9 2019-05-01 113.9375
    43 c-111     1B9 2019-06-01 114.3750
    44 c-111     1B9 2019-07-01 114.8125
    45 c-111     1B9 2019-08-01 115.2500
    46 c-111     1B9 2019-09-01 115.6875
    47 c-111     1B9 2019-10-01 116.1250
    48 c-111     1B9 2019-11-01 116.5625
    49 c-111     1B9 2019-12-01 117.0000
    50 c-111     1B9 2020-01-01 117.4375
    51 c-111     1B9 2020-02-01 117.8750
    52 c-111     1B9 2020-03-01 118.3125
    

    If you want to get confidence intervals etc., deconstruct the loop (using only "i
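
As a rough illustration of where interval bounds live in a forecast object, the sketch below fits one series with the `forecast` package and puts the point forecast and its bounds in a data frame. It assumes a plain `ts` built from the first 12 training values of partner 1A9, a horizon of 3 and a 95% level, all chosen only for illustration. Note that the package reports prediction intervals, which is what the question refers to as confidence intervals.

```r
library(forecast)

# First 12 training values of partner 1A9 from the data above.
y <- ts(c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25,
          104.30, 105.00, 105.20, 105.60, 106.00, 106.10))

fit <- auto.arima(y)
fc  <- forecast(fit, h = 3, level = 95)  # 3 steps ahead, 95% interval

# fc$lower and fc$upper have one column per requested level.
out <- data.frame(
  rev  = as.numeric(fc$mean),
  lo95 = as.numeric(fc$lower[, 1]),
  hi95 = as.numeric(fc$upper[, 1])
)
print(out)
```

Inside the loop, the same two extra columns could be added to the data frame that is bound onto predictions_df, provided the empty predictions_df is given matching columns.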

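    [Discussion]:

    • @DPM. This is very helpful. Thank you. For some reason, I am running into a problem
    • @bbal20 What is the problem (I guess your comment got cut off)
    • @DPM. This is very helpful. Thank you. For some reason, I am running into 'partners
    • @DPM x1
    • @DPM When running x1_t_ts I get the error "Error in as.Date.default(x, ...) : do not know how to convert 'x' to class "Date""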