Answers: Looking for an example of intervention variables in Forecasting: Principles and Practice

【Question title】: Looking for an example of intervention variables in Forecasting: Principles and Practice
【Posted】: 2022-01-09 18:56:24
【Question】:

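I'm reading the book Forecasting: Principles and Practice. Specifically, I'm working through the "Useful predictors" section, here: https://otexts.com/fpp3/useful-predictors.html.

The text mentions intervention variables, but I can't get either spike or step variables to run. I have searched, including online, but found no examples. Whether I use a spike or a step, the code below returns a NULL model. Any help getting intervention variables to run would be greatly appreciated.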

library(tidyverse)
library(fpp3)
fit_consBest <- us_change %>%
  model(
    lm = TSLM(Consumption ~ Income + Savings + Unemployment + trend() + season()),
    step = TSLM(formula = Consumption ~ Income + step(object = lm, scope = Income + Savings + Unemployment))
  )
# All of the reporting methods below return NULL models or errors:
report(fit_consBest)
fit_consBest %>% 
  select(step)
glance(fit_consBest)

【Comments】:

Tags: r time-series forecasting fable-r

【Answer 1】:

The step() function performs stepwise regression (stepwise model selection); it does not create a step predictor.
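To make the distinction concrete, here is a minimal sketch of what stats::step() actually does. It is not part of the original answer; the built-in mtcars data set and the chosen predictors are assumptions used purely for illustration.

```r
# Illustration only: mtcars and these predictors are assumptions,
# not taken from the question or the answer.
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# step() takes a fitted model and searches for a better subset of terms by AIC
# (backward elimination when no scope is supplied). trace = 0 silences the log.
reduced <- step(full, trace = 0)

# The result is another fitted model, not a 0/1 dummy variable.
class(reduced)
formula(reduced)
```

Because step() expects a fitted model as its first argument and returns a fitted model, it cannot serve as a predictor term inside a TSLM() formula.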

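Here is an example that uses a step predictor. In this case the step occurs at 1975 Q1 (that is, the predictor is 0 before that quarter and 1 from then on).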

library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✓ tibble      3.1.6          ✓ tsibble     1.1.1     
#> ✓ dplyr       1.0.7          ✓ tsibbledata 0.3.0.9000
#> ✓ tidyr       1.1.4          ✓ feasts      0.2.2.9000
#> ✓ lubridate   1.8.0          ✓ fable       0.3.1.9000
#> ✓ ggplot2     3.3.5
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()
fit_consBest <- us_change %>%
  model(
    lm = TSLM(Consumption ~ Income + Savings + Unemployment + trend() + season()),
    step = TSLM(Consumption ~ Income + (year(Quarter) >= 1975))
  )
glance(fit_consBest)
#> # A tibble: 2 × 15
#>   .model r_squared adj_r_squared sigma2 statistic  p_value    df log_lik   AIC
#>   <chr>      <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <int>   <dbl> <dbl>
#> 1 lm         0.776         0.768 0.0944      94.1 2.61e-58     8   -43.2 -457.
#> 2 step       0.148         0.139 0.350       17.0 1.61e- 7     3  -176.  -203.
#> # … with 6 more variables: AICc <dbl>, BIC <dbl>, CV <dbl>, deviance <dbl>,
#> #   df.residual <int>, rank <int>
tidy(fit_consBest)
#> # A tibble: 11 × 6
#>    .model term                      estimate std.error statistic  p.value
#>    <chr>  <chr>                        <dbl>     <dbl>     <dbl>    <dbl>
#>  1 lm     (Intercept)                0.441    0.0650       6.79  1.38e-10
#>  2 lm     Income                     0.741    0.0397      18.7   7.36e-45
#>  3 lm     Savings                   -0.0528   0.00293    -18.0   5.96e-43
#>  4 lm     Unemployment              -0.343    0.0680      -5.04  1.06e- 6
#>  5 lm     trend()                   -0.00113  0.000391    -2.89  4.34e- 3
#>  6 lm     season()year2             -0.0760   0.0617      -1.23  2.19e- 1
#>  7 lm     season()year3             -0.0478   0.0626      -0.763 4.46e- 1
#>  8 lm     season()year4             -0.0865   0.0619      -1.40  1.64e- 1
#>  9 step   (Intercept)                0.485    0.138        3.52  5.45e- 4
#> 10 step   Income                     0.273    0.0469       5.82  2.39e- 8
#> 11 step   year(Quarter) >= 1975TRUE  0.0658   0.140        0.471 6.38e- 1

Created on 2021-12-04 by the reprex package (v2.0.1)
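The question also asks about spike variables, which the example above does not cover. The following is a sketch rather than part of the original answer: it builds the dummies as explicit columns before fitting, and the column names step_1975 and spike_1975q1 and the object names us_change_iv and fit_iv are hypothetical. It assumes that 1975 Q1 falls inside the us_change sample.

```r
library(fpp3)

# Hypothetical column names; the intervention date 1975 Q1 follows the answer.
us_change_iv <- us_change %>%
  mutate(
    # Step: 0 before 1975 Q1, 1 from that quarter onwards
    step_1975 = as.integer(Quarter >= yearquarter("1975 Q1")),
    # Spike: 1 only in 1975 Q1, 0 in every other quarter
    spike_1975q1 = as.integer(Quarter == yearquarter("1975 Q1"))
  )

fit_iv <- us_change_iv %>%
  model(
    step  = TSLM(Consumption ~ Income + step_1975),
    spike = TSLM(Consumption ~ Income + spike_1975q1)
  )

tidy(fit_iv)
```

Storing the dummies as columns makes them easy to inspect before fitting. Note that forecasting from either model would then require the same columns to be supplied in the new data.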

【Comments】:

Great answer, very clear. Thank you very much!