【Question title】: Ordering a list of dataframes by minimum column value in R
【Posted】: 2019-10-14 22:54:15
【Question】:

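As a preliminary to building a multivariate logistic regression, I am running univariate regressions and want to select predictors by their p-value. I can fit each predictor with glm and get the models' output, but I am struggling to order the models by p-value rank.

This is what I have so far: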

predictor1 <- c(0,1.1,2.4,3.1,4.0,5.9,4.2,3.3,2.2,1.1)
predictor2 <- as.factor(c("yes","no","no","yes","yes","no","no","yes","no","no"))
predictor3 <- as.factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c"))
outcome <- as.factor(c("alive","dead","alive","dead","alive","dead","alive","dead","alive","dead"))
df <- data.frame(pred1 = predictor1, pred2 = predictor2, pred3 = predictor3, outcome = outcome)
predictors <- c("pred1", "pred2", "pred3")
library(dplyr)  # %>% and select()
library(purrr)  # map()

df %>%
    select(all_of(predictors)) %>%
    map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
    # Extract odds ratio, confidence interval lower and upper bounds, and p value
    map(function(x) data.frame(OR = exp(coef(x)),
        lower = exp(confint(x)[, 1]),
        upper = exp(confint(x)[, 2]),
        Pval = coef(summary(x))[, 4]))

This code prints a summary of each model:

$pred1
                  OR      lower    upper      Pval
(Intercept) 0.711082 0.04841674 8.521697 0.7818212
.x          1.133085 0.52179227 2.653040 0.7465663
$pred2
            OR      lower    upper Pval
(Intercept)  1 0.18507173  5.40331    1
.xyes        1 0.07220425 13.84960    1
$pred3
                      OR     lower      upper      Pval
(Intercept)         0.25 0.0127798   1.689944 0.2149978
.xb         170179249.43 0.0000000         NA 0.9961777
.xc                12.00 0.6908931 542.678010 0.1220957

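My real dataset, however, has dozens of predictors, so I need a way to sort the output, preferably by the smallest (non-intercept) p-value in each model. Perhaps the data structure I chose for each model's summary is not the best, so any suggestions on how to get the same information in a more flexible data structure would also be welcome.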

    Tags: r glm


    【Answer 1】:

    Use map_dfr instead of map, filter out the intercept rows, and then arrange by p-value. Use tidy from the broom package instead of your custom function.

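    library(dplyr)
    library(purrr)
    library(broom)
    df %>%
       select(all_of(predictors)) %>%
       map(~ glm(df$outcome ~ .x, data = df, family = "binomial")) %>%
       map_dfr(tidy, .id = 'Model') %>%   # stack tidy() output, tagging rows by predictor
       filter(term != "(Intercept)") %>%  # drop the intercept rows
       arrange(p.value)                   # smallest p-value first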
    
    # A tibble: 4 x 6
      Model term   estimate std.error statistic p.value
      <chr> <chr>     <dbl>     <dbl>     <dbl>   <dbl>
    1 pred3 .xc    2.48e+ 0     1.61   1.55e+ 0   0.122
    2 pred1 .x     1.25e- 1     0.387  3.23e- 1   0.747
    3 pred3 .xb    1.90e+ 1  3956.     4.79e- 3   0.996
    4 pred2 .xyes -5.73e-16     1.29  -4.44e-16   1.000
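
    The table above ranks individual coefficients. If the goal is one rank per model, using the smallest non-intercept p-value as the question asks, a minimal sketch could look like the following. The names terms_tbl, model_rank and min_p are made up for illustration, and the models are fitted with reformulate() rather than df$outcome ~ .x, so the term names come out as pred1, pred2yes and so on.

```r
library(dplyr)
library(purrr)
library(broom)

# Example data from the question
df <- data.frame(
  pred1 = c(0, 1.1, 2.4, 3.1, 4.0, 5.9, 4.2, 3.3, 2.2, 1.1),
  pred2 = factor(c("yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "no")),
  pred3 = factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c")),
  outcome = factor(c("alive", "dead", "alive", "dead", "alive",
                     "dead", "alive", "dead", "alive", "dead"))
)
predictors <- c("pred1", "pred2", "pred3")

# One row per coefficient, intercepts dropped (hypothetical name: terms_tbl)
terms_tbl <- predictors %>%
  set_names() %>%   # name each element after itself so .id can pick it up
  map(~ glm(reformulate(.x, response = "outcome"), data = df, family = "binomial")) %>%
  map_dfr(tidy, .id = "Model") %>%
  filter(term != "(Intercept)")

# One row per model, ranked by its smallest non-intercept p-value
model_rank <- terms_tbl %>%
  group_by(Model) %>%
  summarise(min_p = min(p.value), .groups = "drop") %>%
  arrange(min_p)

model_rank$Model  # pred3, pred1, pred2 for this data
```

    Joining model_rank back onto terms_tbl by Model would keep all coefficient rows while sorting them in model order.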
    

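    【Comments】:

    • The tidy function is exactly what I was looking for. It gives me the information I need in a convenient format that includes the model and the term. Thanks.
    • @pgcudahy You're welcome. Take a look at broom; it has many other useful functions, such as glance.
    【Answer 2】: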

    You can simply use a do.call(rbind) approach and then order by p-value. The [-1, ] drops the intercept.

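    # simplify = FALSE keeps a named list, so do.call(rbind, ...) also works
    # when every predictor contributes a single coefficient row
    pl <- do.call(rbind, sapply(predictors, function(x) {
      fo <- reformulate(x, response="outcome")
      summary(glm(fo, data=df, family="binomial"))$coefficients[-1, ]
      }, simplify = FALSE))
    pl[order(pl[, 4]), ]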
    #             Estimate   Std. Error       z value  Pr(>|z|)
    # pred3c  2.484907e+00    1.6072751  1.546037e+00 0.1220957
    # pred1   1.249440e-01    0.3866195  3.231703e-01 0.7465663
    # pred3b  1.895236e+01 3956.1804861  4.790571e-03 0.9961777
    # pred2  -5.733167e-16    1.2909944 -4.440892e-16 1.0000000
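
    To keep one data frame per model and reorder the list itself, as in the question title, a base R sketch along the same lines might be the following. The names fits, min_p and fits_sorted are hypothetical, and column 4 is assumed to be the Pr(>|z|) column, as in the output above.

```r
# Example data from the question
df <- data.frame(
  pred1 = c(0, 1.1, 2.4, 3.1, 4.0, 5.9, 4.2, 3.3, 2.2, 1.1),
  pred2 = factor(c("yes", "no", "no", "yes", "yes", "no", "no", "yes", "no", "no")),
  pred3 = factor(c("a", "b", "c", "c", "a", "c", "a", "a", "a", "c")),
  outcome = factor(c("alive", "dead", "alive", "dead", "alive",
                     "dead", "alive", "dead", "alive", "dead"))
)
predictors <- c("pred1", "pred2", "pred3")

# Named list with one coefficient table (as a data frame) per predictor
fits <- lapply(setNames(predictors, predictors), function(x) {
  fo <- reformulate(x, response = "outcome")
  fit <- glm(fo, data = df, family = "binomial")
  as.data.frame(coef(summary(fit)))
})

# Smallest non-intercept p-value in each table (row 1 is the intercept)
min_p <- sapply(fits, function(tab) min(tab[-1, 4]))

# Reorder the list of data frames by that value
fits_sorted <- fits[order(min_p)]
names(fits_sorted)  # pred3, pred1, pred2 for this data
```

    Because fits_sorted is still a list, each element keeps its intercept row and can be inspected or filtered on its own.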
    

    Data

    df <- structure(list(pred1 = c(0, 1.1, 2.4, 3.1, 4, 5.9, 4.2, 3.3, 
    2.2, 1.1), pred2 = structure(c(2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 
    1L, 1L), .Label = c("no", "yes"), class = "factor"), pred3 = structure(c(1L, 
    2L, 3L, 3L, 1L, 3L, 1L, 1L, 1L, 3L), .Label = c("a", "b", "c"
    ), class = "factor"), outcome = structure(c(1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L), .Label = c("alive", "dead"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -10L))
    
    predictors <- c("pred1", "pred2", "pred3")
    
