【问题标题】:Linear regression and model comparison with grouped data线性回归和模型与分组数据的比较
【发布时间】:2021-07-19 16:11:39
【问题描述】:

我正在测试可分组数据的不同线性回归模型(grp1grp2)。

library(rcompanion)
library(dplyr)
library(tidyr)
  
#some sample data
sampleDF <- data.frame(grp1=append(rep("foo",6),rep("bar",6)), 
                       grp2=append(rep("A",3),rep("B",3)), 
                       x=seq(3)-rnorm(3), 
                       y=seq(3)*3+1)

#some models
model1 <- y ~ x
model2 <- y ~ x - 1

#do the fitting
sampleDF %>%
  group_by(grp1, grp2) %>% 
  do(model1 = lm(model1, data = .),
     model2 = lm(model2, data = .)) %>% 
  ungroup() -> fittedModels

这会产生:

 fittedModels
# A tibble: 4 x 4
  grp1  grp2  model1 model2
* <fct> <fct> <list> <list>
1 bar   A     <lm>   <lm>  
2 bar   B     <lm>   <lm>  
3 foo   A     <lm>   <lm>  
4 foo   B     <lm>   <lm>  

使用broom 可以很好地将模型收集到数据帧中,例如:

fittedModels %>% rowwise() %>% broom::tidy(model2)

# A tibble: 4 x 7
# Groups:   grp1, grp2 [4]
  grp1  grp2  term  estimate std.error statistic p.value
  <fct> <fct> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 bar   A     x         3.43     0.124      27.7 0.00130
2 bar   B     x         3.43     0.124      27.7 0.00130
3 foo   A     x         3.43     0.124      27.7 0.00130
4 foo   B     x         3.43     0.124      27.7 0.00130

我想比较使用不同方法的模型。库rcompanion 提供了方法compareLM()pairwiseModelAnova()

尝试:

fittedModels %>% summarise(compareLM(model1, model2))
> Error in formula.default(fits[[i]]) : invalid formula

rowwise(fittedModels) %>% summarise(compareLM(model1, model2))
> Error: Column `compareLM(model1, model2)` must be length 1 (a summary value), not 2

仔细检查发现:

fittedModels$model1[[1]]

Call:
lm(formula = y ~ x, data = .)

Coefficients:
(Intercept)            x  
          0            3  

m.1 <- fittedModels$model1[1]
m.2 <- fittedModels$model2[1]
compareLM(m.1,m.2)
> Error in formula.default(fits[[i]]) : invalid formula

这可能表明管道存在问题?!

有没有办法解决这个问题?

【问题讨论】:

    标签: r dataframe dplyr regression tidyr


    【解决方案1】:

    我发现您的数据有错误,但是当将随机数添加到 sampleDF$x 时它会起作用。

    library(rcompanion)
    library(dplyr)
    library(tidyr)
    library(broom)
    library(purrr)
    
    #some sample data
    set.seed(1)
    sampleDF <- data.frame(grp1=append(rep("foo",6),rep("bar",6)), 
                           grp2=append(rep("A",3),rep("B",3)), 
                           x=seq(3) - rnorm(3), 
                           y=seq(3)*3+1)
    
    #some models
    model1 <- y ~ x
    model2 <- y ~ x - 1
    
    # Use purrr
    sampleDF %>%
      group_by(grp1, grp2) %>%
      nest() %>%
      mutate(model1 = purrr::map(data, ~lm(model1, data = .))) %>%
      mutate(model2 = purrr::map(data, ~lm(model2, data = .))) %>%
      mutate(tidy_model1 = purrr::map(model1, broom::tidy)) %>%
      mutate(tidy_model2 = purrr::map(model2, broom::tidy)) %>%
      mutate(compare_m1m2 = purrr::map2(model1, model2, ~compareLM(.x, .y)))
    
    > # A tibble: 4 x 8
    > # Groups:   grp1, grp2 [4]
    > grp1  grp2  data             model1 model2 tidy_model1      tidy_model2      > compare_m1m2    
    > <chr> <chr> <list>           <list> <list> <list>           <list>           > <list>          
    > 1 foo   A     <tibble [3 × 2]> <lm>   <lm>   <tibble [2 × 5]> <tibble [1 × 5]> <named list [2]>
    > 2 foo   B     <tibble [3 × 2]> <lm>   <lm>   <tibble [2 × 5]> <tibble [1 × 5]> <named list [2]>
    > 3 bar   A     <tibble [3 × 2]> <lm>   <lm>   <tibble [2 × 5]> <tibble [1 × 5]> <named list [2]>
    > 4 bar   B     <tibble [3 × 2]> <lm>   <lm>   <tibble [2 × 5]> <tibble [1 × 5]> <named list [2]>
    

    【讨论】:

    • 很好地使用purrr,这里的问题只是不能取消嵌套()model1model2列。
    • 是的,model1 和 model2 列不能合并到一个数据框。当然你可以选择只取消嵌套其他列
    猜你喜欢
    • 2012-10-28
    • 2020-11-01
    • 1970-01-01
    • 2016-02-27
    • 2019-10-15
    • 2018-08-10
    • 2018-09-03
    • 1970-01-01
    • 2019-03-27
    相关资源
    最近更新 更多