【Question title】: Summarise multiple columns using weighted t-test
【Posted】: 2021-04-18 21:02:09
【Question】:

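I have the following data (Code 1) and want to compute weighted p-values. I looked at "dplyr summarise multiple columns using t.test", but my version needs to use weights. I can do this for a single column with Code 2, but there are more than 30 columns. How can I compute the weighted p-values efficiently?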

Code 1

# A tibble: 877 x 5
   cat     population farms farmland weight
   <chr>        <dbl> <dbl>    <dbl>  <dbl>
 1 Treated       9.89  8.00     12.3  1    
 2 Control      10.3   7.81     12.1  0.714
 3 Control      10.2   8.04     12.4  0.156
 4 Control      10.3   7.97     12.1  0.340
 5 Control      10.9   8.87     12.7  2.85 
 6 Control      10.4   8.35     12.5  0.934
 7 Control      10.5   8.58     12.9  0.193
 8 Control      10.6   8.57     12.6  0.276
 9 Control      10.2   8.54     12.5  0.344
10 Control      10.5   8.76     12.6  0.625
# … with 867 more rows

Code 2

wtd.t.test(
  x = df$population[df$cat == "Treated"],
  y = df$population[df$cat == "Control"],
  weight = df$weight[df$cat == "Treated"],
  weighty = df$weight[df$cat == "Control"])$coefficients[3]

【Comments】:

    Tags: r statistics tidyverse


    【Solution 1】:

    We can use summarise with across, which applies the same weighted test to every selected column:

    library(dplyr)
    df %>%
       summarise(across(c(population:farmland),
       ~ weights::wtd.t.test(x = .[cat == 'Treated'],
                             y = .[cat == 'Control'], 
                             weight = weight[cat == 'Treated'],
                             weighty= weight[cat == 'Control'])$coefficients[3]))
    

    Or use lapply/sapply over the columns in base R:

    sapply(df[2:4], function(v)
             weights::wtd.t.test(x = v[df$cat == "Treated"],
                                 y = v[df$cat == "Control"],
                                 weight = df$weight[df$cat == "Treated"],
                                 weighty = df$weight[df$cat == "Control"])$coefficients[3])
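
With 30+ columns, a long table with one row per variable may be easier to read than a single wide row. The sketch below is one way to get that. It runs on simulated data standing in for the question's `df` (the values are made up), and the helper name `wtd_p` is hypothetical. Passing `samedata = FALSE` is an assumption that the Treated and Control rows are independent samples.

```r
library(dplyr)
library(tidyr)

set.seed(1)
# Simulated stand-in for the question's data; values are invented for illustration
df <- tibble(
  cat        = rep(c("Treated", "Control"), times = c(40, 60)),
  population = rnorm(100, mean = 10, sd = 0.5),
  farms      = rnorm(100, mean = 8,  sd = 0.5),
  farmland   = rnorm(100, mean = 12, sd = 0.5),
  weight     = runif(100, min = 0.1, max = 3)
)

# Hypothetical helper: weighted t-test p-value for a single column
wtd_p <- function(v, group, w) {
  treated <- group == "Treated"
  res <- weights::wtd.t.test(
    x = v[treated],       y = v[!treated],
    weight = w[treated],  weighty = w[!treated],
    samedata = FALSE      # assumption: the two groups are independent samples
  )
  res$coefficients[[3]]   # third coefficient is the p-value, as in Code 2
}

# Apply the helper to every column except the grouping and weight columns,
# then reshape to one row per variable
p_values <- df %>%
  summarise(across(-c(cat, weight), ~ wtd_p(.x, cat, weight))) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "p_value")

print(p_values)
```

Selecting with `-c(cat, weight)` avoids listing all the measurement columns by name, so the call should not need to change as columns are added.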
    
