【Question title】: Pairwise analysis of two groups: transposing a data frame using pivot_wider
【Posted】: 2020-05-26 19:04:37
【Question description】:

I have the following data frame:

a <- 
structure(list(Sample_1 = structure(c(Bacteria_A = 1L, Bacteria_B = 2L, 
Bacteria_C = 3L, `4` = 1L, `5` = 2L, `6` = 2L, `7` = 3L, `8` = 1L
), .Label = c("12", "23", "25", "soil"), class = "factor"), Sample_2 = structure(c(Bacteria_A = 3L, 
Bacteria_B = 2L, Bacteria_C = 1L, `4` = 3L, `5` = 2L, `6` = 2L, 
`7` = 1L, `8` = 3L), .Label = c("10", "12", "23", "soil"), class = "factor"), 
    Sample_3 = structure(c(Bacteria_A = 2L, Bacteria_B = 1L, 
    Bacteria_C = 3L, `4` = 2L, `5` = 1L, `6` = 1L, `7` = 3L, 
    `8` = 2L), .Label = c("33", "45", "50", "soil"), class = "factor"), 
    Sample_4 = structure(c(Bacteria_A = 1L, Bacteria_B = 3L, 
    Bacteria_C = 2L, `4` = 1L, `5` = 3L, `6` = 3L, `7` = 2L, 
    `8` = 1L), .Label = c("32", "38", "44", "soil"), class = "factor"), 
    Sample_5 = structure(c(Bacteria_A = 2L, Bacteria_B = 3L, 
    Bacteria_C = 1L, `4` = 2L, `5` = 3L, `6` = 3L, `7` = 1L, 
    `8` = 2L), .Label = c(" 3", "34", "55", "soil"), class = "factor"), 
    Sample_6 = structure(c(Bacteria_A = 1L, Bacteria_B = 2L, 
    Bacteria_C = 3L, `4` = 1L, `5` = 2L, `6` = 2L, `7` = 3L, 
    `8` = 1L), .Label = c(" 0", " 3", "34", "soil"), class = "factor"), 
    Genus = c("Bacteria_A", "Bacteria_B", "Bacteria_C", "Bacteria_A", 
    "Bacteria_B", "Bacteria_B", "Bacteria_C", "Bacteria_A"), 
    Group = c("Soil", "Soil", "Soil", "Water", "Water", "Water", 
    "Water", "Water")), row.names = c(NA, 8L), class = "data.frame")


> a
  Sample_1 Sample_2 Sample_3 Sample_4 Sample_5 Sample_6      Genus Group
1       12       23       45       32       34        0 Bacteria_A  Soil
2       23       12       33       44       55        3 Bacteria_B  Soil
3       25       10       50       38        3       34 Bacteria_C  Soil
4       12       23       45       32       34        0 Bacteria_A Water
5       23       12       33       44       55        3 Bacteria_B Water
6       23       12       33       44       55        3 Bacteria_B Water
7       25       10       50       38        3       34 Bacteria_C Water
8       12       23       45       32       34        0 Bacteria_A Water

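I want to compare each bacterium between soil and water, for example by running wilcox.test on Bacteria_A in soil versus water. How can I do this?

So far I have tried to widen the data frame so that the bacteria become the column names: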

 nms <- colnames(a)[1:(ncol(a)-2)]
> nms
[1] "Sample_1" "Sample_2" "Sample_3" "Sample_4" "Sample_5" "Sample_6"


    d <- a %>% 
      pivot_wider(names_from = Genus, values_from=nms )
       group_by(name) %>% 
      summarise(mean_Soil = mean(value[Group == "Soil"]), 
                mean_Water= mean(value[Group == "Water"]), 
                pvalue = wilcox.test(value ~ Group)$p.value) 


    Error in group_by(name) : object 'name' not found

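The expected output would look like this (the values in this example are made up). It is only meant to illustrate the desired output.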

#> # A tibble: 3 x 4
#>   name       mean_soil mean_water pvalue
#>   <chr>          <dbl>      <dbl>  <dbl>
#> 1 Bacteria_A      24.3       24    0.936
#> 2 Bacteria_B      28.3       29    0.873
#> 3 Bacteria_C      26.7       23.8  0.748

【Comments】:

    Tags: r dplyr tidyr


    【Solution 1】:

    You need to use pivot_longer instead of pivot_wider, because summarise works on columns. Then convert all values to numeric (they are factors in your example):

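    library(dplyr)
    library(tidyr)

    # Stack the Sample_* columns into name/value pairs, keeping Genus and Group
    a_longer = 
      a %>%
      pivot_longer(c(-Genus,-Group)) %>% 
      mutate(value = as.numeric(as.character(value)))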
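
The as.character() step matters because calling as.numeric() directly on a factor returns its internal level codes, not the numbers that are printed. A minimal base R sketch, where `f` is a made-up factor that only mimics Sample_1 from the question:

```r
# `f` is a hypothetical factor; its levels mirror those of Sample_1 above.
f <- factor(c("12", "23", "25", "12"), levels = c("12", "23", "25", "soil"))

# Direct conversion yields the integer level codes (positions in the levels).
codes <- as.numeric(f)

# Going through character first recovers the values as they are printed.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here `codes` holds the level positions, while `values` holds 12, 23, 25, 12, which is what the mean and the test need.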
    

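    From here I suggest splitting the summarise into two parts, because you are actually using two different groupings for mean and for wilcox.test. You can then join the two tables together: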

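    full_join(
      # Means: one row per Genus, one mean_* column per Group
      a_longer %>% 
        group_by(Genus, Group) %>% 
        summarise(mean = mean(value)) %>% 
        pivot_wider(names_from = Group, names_prefix = "mean_", values_from = mean)
      ,
      # Test: one p-value per Genus, comparing the Groups within it
      a_longer %>% 
        group_by(Genus) %>% 
        summarise(pvalue = wilcox.test(value ~ Group)$p.value)
      ,
      by = "Genus"
    )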
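
As an alternative to the two-table join, the summarise from the question can also be run as-is once the data is in long format, grouping by Genus only. This is a sketch on made-up toy data (`toy` and its values are assumptions, not the question's data set), and it assumes dplyr is installed:

```r
library(dplyr)

# Toy long-format data with made-up values; the columns mimic a_longer.
toy <- tibble(
  Genus = rep(c("Bacteria_A", "Bacteria_B"), each = 6),
  Group = rep(rep(c("Soil", "Water"), each = 3), times = 2),
  value = c(12, 23, 45, 32, 34, 1, 23, 12, 33, 44, 55, 3)
)

# One row per Genus: subset the values by Group for the means, and let
# wilcox.test split the same values by Group through the formula.
result <- toy %>%
  group_by(Genus) %>%
  summarise(
    mean_Soil  = mean(value[Group == "Soil"]),
    mean_Water = mean(value[Group == "Water"]),
    pvalue     = wilcox.test(value ~ Group)$p.value
  )

print(result)
```

The trade-off is that the group names are hard-coded in the column definitions, whereas the join above picks up whatever groups are present in the data.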
    

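    【Discussion】:

    • This works well on the example above, thank you very much. However, in my original data set I get NA values in one of the mean columns. The first block of the full join, which returns the mean of each group, gives correct values for one group but NA for the other. Any ideas?
    • Sorry, the NA values are for bacteria that are found in only one group and not in the other.
    • Is there a way to still run wilcox.test when some of the means are NA because the bacteria are missing? @tspano
    • If you use full_join, the wilcox.test results will all be shown, independently of the NAs in the calculation of mean.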
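
One caveat on the last point: as far as I know, the formula interface of wilcox.test requires a grouping variable with exactly two levels, so a genus present in only one group would make the test stop with an error, not produce an NA. A minimal base R sketch of a guard follows; `safe_wilcox_p` is a hypothetical helper, not something from the thread or from any package:

```r
# Hypothetical helper: return NA instead of failing when the values
# come from fewer than two groups.
safe_wilcox_p <- function(value, group) {
  if (length(unique(group)) < 2) {
    return(NA_real_)
  }
  wilcox.test(value ~ group)$p.value
}

# Two groups present: an ordinary p-value (made-up values).
both <- safe_wilcox_p(c(12, 23, 45, 32, 34, 1), rep(c("Soil", "Water"), each = 3))

# Only one group present: NA, no error.
one <- safe_wilcox_p(c(12, 23, 45), rep("Soil", 3))
```

Inside the pipeline it could be used as summarise(pvalue = safe_wilcox_p(value, Group)) after group_by(Genus).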