【Posted】: 2021-02-28 02:32:55
【Question】:
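I'm collaborating on a project that requires me to use R, which I have no prior experience with. I'm trying to apply auto.arima to partitions/windows of my dataset, but I don't even know how to start.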
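As a starting point, here is a minimal sketch of fitting auto.arima to a single monthly series and forecasting ahead. It assumes the `forecast` package is installed and uses the 12 `c_id = "none"` revenue values of partner 1A9 from the sample below. The horizon `h = 6` is an arbitrary choice, and `seasonal = FALSE` is an assumption: 12 points are too few to estimate a yearly seasonal pattern.

```r
# Assumes the 'forecast' package is installed: install.packages("forecast")
library(forecast)

# The 12 c_id == "none" rev values of partner 1A9 from the sample data
rev_1a9 <- c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25,
             104.30, 105.00, 105.20, 105.60, 106.00, 106.10)

# Monthly time series; start = c(2016, 1) matches the first training month
y <- ts(rev_1a9, start = c(2016, 1), frequency = 12)

# seasonal = FALSE: too few points to estimate a yearly seasonal component
fit <- auto.arima(y, seasonal = FALSE)
fc  <- forecast(fit, h = 6, level = 95)  # h = 6 is an arbitrary example horizon

print(fit)  # chosen ARIMA(p,d,q) order and coefficients
print(fc)   # point forecasts with 95% lower/upper bounds
```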
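Basically, I want to train a separate model for each partner_id on the rows where c_id = "none", and then forecast each partner_id out to its latest month, max(rev_month). The number of months/rows differs by partner. In the sample data frame below, partner_id = "1A9" has 12 months/rows with c_id = "none", while partner_id = "1B9" has 13. The number of months/rows remaining up to max(rev_month) also varies by partner_id. This is tricky because I assume I need to supply, dynamically for each partner_id, both the number of months/rows to train on and the number of months/rows to forecast.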
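One way to get those per-partner numbers without hard-coding them is sketched below in base R: split the data by partner_id, take the c_id == "none" rows as the training set, and count the distinct months after the last training month as the forecast horizon h. The small data frame `df`, the partner names `P1`/`P2`, and their values are hypothetical, for illustration only; the column names match the sample data.

```r
# Hypothetical toy data: two partners with different training lengths and horizons
df <- data.frame(
  c_id       = c("none", "none", "none", "c-1", "c-1",
                 "none", "none", "c-2"),
  partner_id = c("P1", "P1", "P1", "P1", "P1",
                 "P2", "P2", "P2"),
  rev_month  = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                         "2020-04-01", "2020-05-01",
                         "2021-01-01", "2021-02-01", "2021-03-01")),
  rev        = c(10, 11, 12, 13, 14, 20, 21, 22),
  stringsAsFactors = FALSE
)

# For each partner: the training values and the number of months to forecast
plan <- lapply(split(df, df$partner_id), function(d) {
  d <- d[order(d$rev_month), ]
  train <- d[d$c_id == "none", ]
  last_train <- max(train$rev_month)
  future_months <- sort(unique(d$rev_month[d$rev_month > last_train]))
  list(train_rev = train$rev, h = length(future_months),
       future_months = future_months)
})

# One column per partner: number of training rows and forecast horizon
sapply(plan, function(p) c(n_train = length(p$train_rev), h = p$h))
```

Run on the sample data frame `x` from this question (replace `df` with `x`), the same code should report 12 and 13 training rows and horizons of 25 and 27 months for 1A9 and 1B9, because the horizon is derived from the dates rather than typed in.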
I've included a sample dataset below.
x <- data.frame("c_id" = c("none","none","none","none","none",
"none","none","none","none","none","none","none","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-100","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101","c-101", "none","none","none","none","none","none","none","none","none","none","none","none","none","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-110","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111","c-111"), "partner_id" = c("1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1A9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9","1B9"), "rev_month" = as.Date(c("2016-01-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01", "2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01", "2017-01-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01", 
"2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01", "2020-01-01", "2020-02-01", "2020-03-01")), "rev" = c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.3, 105.00, 105.20, 105.60, 106.00, 106.10, 106.50, 101.50, 100.30, 107.50, 108.30, 108.45, 109.10, 110.10, 112.15, 112.45, 114.65, 115.00, 116.00, 116.50, 117.25, 117.85, 119.25, 119.95, 120.20, 121.50, 122.30, 122.40, 123.25, 123.75, 124.00, 124.10, 125.35, 125.45), stringsAsFactors=FALSE)
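Apologies for not having any starter code yet; I'm still trying to think this through conceptually without much R experience. Ultimately, I want to add the forecasts and their confidence-interval columns back to my original data frame. I'm open to any R and/or Python solution.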
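Putting the pieces together, the sketch below fits one auto.arima model per partner_id on its c_id == "none" rows, forecasts every later month up to that partner's max(rev_month), and left-joins the point forecast and interval bounds back onto the original rows. It assumes the `forecast` package; the function name `add_arima_forecasts`, the columns `fc_mean`/`fc_lower`/`fc_upper`, and the toy data frame are hypothetical. `seasonal = FALSE` is a design assumption given only 12 to 13 training months. The sketch treats training rows as consecutive months in row order; in the sample, the first two "none" rows of each partner share the same rev_month (2016-01-01 and 2017-01-01), so you may want to resolve those first.

```r
# Assumes the 'forecast' package is installed: install.packages("forecast")
library(forecast)

# Hypothetical helper: per-partner auto.arima on c_id == "none" rows, forecasting
# each later month up to max(rev_month). Adds fc_mean, fc_lower, fc_upper
# (NA on rows that were not forecast).
add_arima_forecasts <- function(df, level = 95) {
  df <- df[order(df$partner_id, df$rev_month), ]
  pieces <- lapply(split(df, df$partner_id), function(d) {
    train <- d[d$c_id == "none", ]
    if (nrow(train) == 0) return(NULL)                # no training rows
    last_train <- max(train$rev_month)
    future_months <- sort(unique(d$rev_month[d$rev_month > last_train]))
    if (length(future_months) == 0) return(NULL)      # nothing to forecast

    # Training rows are treated as consecutive months, in row order
    y   <- ts(train$rev, frequency = 12)
    fit <- auto.arima(y, seasonal = FALSE)
    fc  <- forecast(fit, h = length(future_months), level = level)

    data.frame(partner_id = d$partner_id[1],
               rev_month  = future_months,
               fc_mean    = as.numeric(fc$mean),
               fc_lower   = as.numeric(fc$lower[, 1]),
               fc_upper   = as.numeric(fc$upper[, 1]),
               stringsAsFactors = FALSE)
  })
  fc_df <- do.call(rbind, pieces)
  if (is.null(fc_df)) return(df)
  # Left join: keep every original row, attach forecasts where they exist
  merge(df, fc_df, by = c("partner_id", "rev_month"), all.x = TRUE)
}

# Hypothetical toy data: P1 reuses 1A9's 12 training values from the sample,
# P2 reuses 1B9's 13; future rev values are left NA for illustration
base <- c(101.25, 102.25, 103.50, 103.75, 104.15, 104.25, 104.30,
          105.00, 105.20, 105.60, 106.00, 106.10, 106.50)
toy <- data.frame(
  c_id       = c(rep("none", 12), rep("c-1", 3), rep("none", 13), rep("c-2", 2)),
  partner_id = rep(c("P1", "P2"), each = 15),
  rev_month  = c(seq(as.Date("2016-01-01"), by = "month", length.out = 15),
                 seq(as.Date("2017-01-01"), by = "month", length.out = 15)),
  rev        = c(base[1:12], rep(NA, 3), base, rep(NA, 2)),
  stringsAsFactors = FALSE
)

out <- add_arima_forecasts(toy)
out[!is.na(out$fc_mean), ]  # only the forecast rows
```

On the question's own data this would be `add_arima_forecasts(x)`. Note that `merge()` reorders rows and moves the key columns `partner_id` and `rev_month` to the front.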
【Comments】: