[Question title]: Using group_by() from dplyr with predict.lm and do() in a pipe for year linear extrapolation
[Posted]: 2018-08-03 17:37:41
[Question]:

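I want to apply a linear extrapolation for one year inside a pipe. What I want to do is very similar to this simple example without grouping, but inside a pipe and using dplyr::group_by(). There are some examples, like this one, this one, and this one, but I could not get the desired output.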

Reproducible example:

library(dplyr)  # provides %>% and bind_rows()

# Observed values for 1990-1992, one row per Country/Entity/Year
test.frame <- data.frame(
  Country = rep(c("Austria", "Brazil", "Canada"), each = 3, times = 3),
  Entity = rep(c("CO2", "CH4", "N2O"), times = 9),
  Year = rep(1990:1992, each = 9),
  value = runif(27, 1, 5))

# Placeholder rows for 1993, to be filled by extrapolation
test.frame2 <- data.frame(
  Country = rep(c("Austria", "Brazil", "Canada"), each = 3),
  Entity = rep(c("CO2", "CH4", "N2O"), times = 3),
  Year = rep(1993, each = 3),
  value = 0)

results_frame <- test.frame %>%
  dplyr::bind_rows(test.frame2)

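I have two grouping categories (Country and Entity), and I want to use the values from 1990 to 1992 to fill in the 1993 values by linear extrapolation. Based on this, I can estimate the linear model: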

linear_model <- test.frame %>%  
dplyr::group_by(Country, Entity) %>% 
lm(value ~ Year, data=.)

results <- predict.lm(linear_model, test.frame2)

However, results does not show the desired output. So, following the solution proposed here, I tried the following:

results_frame <- test.frame %>%
  dplyr::group_by(Country, Entity) %>% 
  do(lm( value ~ Year , data = test.frame)) %>%
  predict.lm(linear_model, test.frame2) %>% 
  bind_rows(test.frame)

But it does not work. Instead, I get

Error: Results 1, 2, 3, 4, 5, ... must be data frames, not lm

Any help would be much appreciated!


    Tags: r dplyr


    [Answer 1]:

    You have to be careful to use the correct data when fitting and when predicting:

    library(dplyr)
    set.seed(42)
    test.frame <- tibble(Country = rep(c("Austria", "Brazil", "Canada"), each = 3, times = 3), 
                             Entity = rep(c("CO2","CH4","N2O"), times = 9),
                             Year = rep(c(1990:1992), each = 9),
                             value = runif(27, 1,5))
    
    test.frame %>%
      group_by(Country, Entity) %>% 
      do(lm( value ~ Year , data = .) %>% 
           predict(., tibble(Year = 1993)) %>%
           tibble(Year = 1993, value = .)) %>%
      bind_rows(test.frame)
    #> # A tibble: 36 x 4
    #> # Groups:   Country, Entity [9]
    #>    Country Entity  Year value
    #>    <fct>   <fct>  <dbl> <dbl>
    #>  1 Austria CH4     1993 2.10 
    #>  2 Austria CO2     1993 2.03 
    #>  3 Austria N2O     1993 6.02 
    #>  4 Brazil  CH4     1993 4.90 
    #>  5 Brazil  CO2     1993 0.771
    #>  6 Brazil  N2O     1993 5.28 
    #>  7 Canada  CH4     1993 4.69 
    #>  8 Canada  CO2     1993 0.729
    #>  9 Canada  N2O     1993 1.49 
    #> 10 Austria CO2     1990 4.66 
    #> # ... with 26 more rows
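
The same per-group fit-and-predict step can also be sketched without do(), which has been superseded in dplyr 1.0.0 and later. The version below is a minimal sketch, assuming a dplyr version that provides group_modify() (0.8.1 or later); the names extrapolated and result are introduced here for illustration and do not appear in the original answer.

```r
library(dplyr)

set.seed(42)
test.frame <- tibble(
  Country = rep(c("Austria", "Brazil", "Canada"), each = 3, times = 3),
  Entity  = rep(c("CO2", "CH4", "N2O"), times = 9),
  Year    = rep(1990:1992, each = 9),
  value   = runif(27, 1, 5)
)

# For each Country/Entity group, fit value ~ Year on that group's rows only
# (.x holds the non-grouping columns) and predict the value for 1993.
extrapolated <- test.frame %>%
  group_by(Country, Entity) %>%
  group_modify(~ {
    fit <- lm(value ~ Year, data = .x)
    tibble(Year = 1993, value = unname(predict(fit, tibble(Year = 1993))))
  }) %>%
  ungroup()

# Append the nine extrapolated rows to the 27 observed rows.
result <- bind_rows(test.frame, extrapolated)
```

As in the do() version, the function applied to each group has to return a data frame, which is why the prediction is wrapped in tibble() rather than returned as a bare lm object or numeric vector.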
    


      [Answer 2]:

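      You can do the following using nested data.frames. This solution is more general, because test.frame2 does not have to be recreated after prediction, and there can be more than one independent variable: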

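      library(tidyverse)
      test.frame %>%
        group_by(Country, Entity) %>%
        nest() %>%
        # attach each group's rows to predict for (data.y); the training rows become data.x
        inner_join(test.frame2 %>% select(-value) %>% group_by(Country, Entity) %>% nest(),
                   by = c("Country", "Entity")) %>%
        # fit one model per group, then predict on that group's new rows
        mutate(model = data.x %>% map(~lm(value ~ Year, data=.)),
               value = map2(model, data.y, predict)) %>%
        select(-data.x, -model) %>%
        # tidyr 1.0.0 and later expect the columns to unnest to be named explicitly
        unnest(cols = c(data.y, value)) %>%
        bind_rows(test.frame, .)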
      

      Result:

         Country Entity Year      value
      1  Austria    CO2 1990  3.6245955
      2  Austria    CH4 1990  3.3857752
      3  Austria    N2O 1990  1.4798741
      4   Brazil    CO2 1990  2.5865668
      5   Brazil    CH4 1990  1.3271481
      6   Brazil    N2O 1990  4.4537926
      7   Canada    CO2 1990  4.7295768
      8   Canada    CH4 1990  4.5255033
      9   Canada    N2O 1990  2.3129381
      10 Austria    CO2 1991  4.8810838
      11 Austria    CH4 1991  4.9950455
      12 Austria    N2O 1991  2.1288504
      13  Brazil    CO2 1991  4.7767443
      14  Brazil    CH4 1991  2.0315449
      15  Brazil    N2O 1991  1.9307966
      16  Canada    CO2 1991  4.6831029
      17  Canada    CH4 1991  2.2761538
      18  Canada    N2O 1991  3.0856428
      19 Austria    CO2 1992  3.1223000
      20 Austria    CH4 1992  4.7715588
      21 Austria    N2O 1992  1.5733608
      22  Brazil    CO2 1992  2.9463442
      23  Brazil    CH4 1992  1.9569259
      24  Brazil    N2O 1992  1.4428006
      25  Canada    CO2 1992  3.0750847
      26  Canada    CH4 1992  1.4635521
      27  Canada    N2O 1992  2.8061861
      28 Austria    CO2 1993  3.3736976
      29 Austria    CH4 1993  5.7699101
      30 Austria    N2O 1993  1.8208485
      31  Brazil    CO2 1993  3.7963291
      32  Brazil    CH4 1993  2.4016508
      33  Brazil    N2O 1993 -0.4018621
      34  Canada    CO2 1993  2.5080960
      35  Canada    CH4 1993 -0.3068815
      36  Canada    N2O 1993  3.2281704
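
As a quick sanity check on these numbers: with three equally spaced years, the least-squares slope is (y3 - y1) / 2, so the prediction one year past the last observation is mean(y) + (y3 - y1). The sketch below applies this to the Austria CO2 rows from the table above; the variable names are introduced here for illustration only.

```r
# Austria CO2 values for 1990, 1991, 1992, copied from the result table
y <- c(3.6245955, 4.8810838, 3.1223000)

# Fit the same model as in the answer and predict 1993
fit <- lm(value ~ Year, data = data.frame(Year = 1990:1992, value = y))
pred <- unname(predict(fit, data.frame(Year = 1993)))

# Closed form for three equally spaced points:
# slope = (y3 - y1) / 2, and 1993 lies two steps above the mean year 1991
closed_form <- mean(y) + (y[3] - y[1])

print(c(pred = pred, closed_form = closed_form))
```

Both values agree with row 28 of the table (3.3736976). The same formula also explains the negative predictions in rows 33 and 35: a steep drop between 1990 and 1992 is simply continued, since a linear model knows nothing about emissions being non-negative.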
      

