[Question title]: Phyloseq: how to obtain relative abundance with merge_samples?
[Posted]: 2019-11-27 05:29:37
[Question]:

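I am trying to obtain relative abundances using the merge_samples function of the phyloseq package.

When I compute the mean of each Phylum across all samples (using GlobalPatterns as an example, which has 26 samples), I do something like this: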

    library(phyloseq)
    library(plyr)
    data(GlobalPatterns)
    TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
    PGroup <- transform_sample_counts(TGroup, function(x) 100 * x / sum(x))
    OTUg <- otu_table(PGroup)
    TAXg <- tax_table(PGroup)[, "Phylum"]
    AverageD <- as.data.frame(rowMeans(OTUg))
    names(AverageD) <- c("Mean")
    GTable <- merge(TAXg, AverageD, by = 0, all = TRUE)
    GTable$Row.names = NULL
    GTable <- GTable[order(desc(GTable$Mean)), ]
    head(GTable)

I get something like this:

      Phylum             Mean
    1 Proteobacteria 29.45550
    2 Firmicutes     18.87905
    3 Bacteroidetes  17.34374
    4 Cyanobacteria  13.70639
    5 Actinobacteria  8.93446
    6 ....... More .....

That looks fine to me.

But when I try to use merge_samples (grouping by SampleType):

    ps <- tax_glom(GlobalPatterns, "Phylum")
    ps0 <- transform_sample_counts(ps, function(x)100* x / sum(x))
    ps1 <- merge_samples(ps0, "SampleType")
    ps2 <- transform_sample_counts(ps1, function(x)100* x / sum(x))
    ps3 <- ps2
    otu_table(ps3) <- t(otu_table(ps3)) # transpose the matrix otus !!!
    OTUg <- otu_table(ps3)
    TAXg <- tax_table(ps3)[,"Phylum"]
    GTable <- merge(TAXg, OTUg, by=0, all=TRUE)
    GTable$Row.names = NULL
    GTable$Mean=rowMeans(GTable[,-c(1)], na.rm=TRUE)
    GTable <- GTable[order(desc(GTable$Mean)),]
    head(GTable)

I get the same taxa, but the percentages in the Mean column are different:

      Phylum          Feces Freshwater Freshwater  Mock Ocean Sediment  Skin  Soil Tongue  Mean
    1 Proteobacteria   1.58      16.71      18.61 20.10 38.00    71.03 31.98 32.66  44.49 30.57
    2 Firmicutes      54.82       0.12       0.65 41.42  0.08     2.53 30.67  0.64  21.67 16.96
    3 Bacteroidetes   35.23      11.92       5.07 24.97 31.17     7.01  9.09  9.90  12.28 16.29
    4 Cyanobacteria    2.63      30.17      62.57  0.16 19.18     3.24  4.65  0.97   6.61 14.46
    5 Actinobacteria   3.47      37.11       1.74  8.39  5.12     1.04 16.78  9.99   7.49 10.13

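I understand that merging by SampleType pools the samples, so each column (Feces, Freshwater, Freshwater, ...) gets its own percentage for each phylum. But the overall mean of each phylum should stay the same even after merging the samples, and here it does not (Proteobacteria 30.57, Firmicutes 16.96, Bacteroidetes 16.29, ...).

Any solutions or suggestions?

Thanks.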

[Comments]:

    Tags: r bioinformatics phyloseq


    [Answer 1]:

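    In the first part you take the mean across all samples. In the second you take the mean of the group means. The two are equal only when every group has the same number of observations.

    For example: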

    # equal n for each group
    abundance = seq(0.1,0.6,by=0.1)
    group = rep(letters[1:3],each=2)
    mean(tapply(abundance,group,mean)) == mean(abundance)
    [1] TRUE
    
    # unequal n
    abundance = seq(0.1,0.6,by=0.1)
    group = rep(letters[1:3],1:3)
    mean(tapply(abundance,group,mean)) == mean(abundance)
    [1] FALSE
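
    As a complement to the example above, here is a minimal base-R sketch showing that weighting each group mean by its group size recovers the overall mean in the unequal-n case. The variable names (group_means, group_sizes, weighted) are illustrative only, and all.equal is used instead of == to avoid floating-point surprises.

    ```r
    # Same unequal-n toy data as above: groups a, b, c have 1, 2 and 3 values
    abundance <- seq(0.1, 0.6, by = 0.1)
    group <- rep(letters[1:3], 1:3)

    # Per-group mean and per-group size (tapply returns both in the same group order)
    group_means <- tapply(abundance, group, mean)
    group_sizes <- tapply(abundance, group, length)

    # Unweighted mean of group means: differs from the overall mean
    mean(group_means)

    # Mean of group means weighted by group size: matches the overall mean
    weighted <- weighted.mean(group_means, w = group_sizes)
    isTRUE(all.equal(weighted, mean(abundance)))
    ```

    The unweighted version treats every group as equally important regardless of how many samples it holds, which is what averaging the merged SampleType columns does.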
    

    The n differs between SampleType levels:

    TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
    PGroup <- transform_sample_counts(TGroup, function(x)100* x / sum(x))
    SampleType = sample_data(PGroup)$SampleType
    table(SampleType)
    
    SampleType
                 Feces         Freshwater Freshwater (creek)               Mock 
                     4                  2                  3                  3 
                 Ocean Sediment (estuary)               Skin               Soil 
                     3                  3                  3                  3 
                Tongue 
                     2 
    

    To get the same mean abundance as with the merged samples, compute the mean abundance within each SampleType and then average those means:

    mean_PGroup = sapply(levels(SampleType),function(i){
      rowMeans(otu_table(PGroup)[,SampleType==i])
    })
    
    phy = tax_table(PGroup)[rownames(mean_PGroup ),"Phylum"]
    rownames(mean_PGroup) = phy
    head(sort(rowMeans(mean_PGroup),decreasing=TRUE))
    
     Proteobacteria      Firmicutes   Bacteroidetes   Cyanobacteria  Actinobacteria 
          30.572773       16.956254       16.293286       14.463643       10.126875 
    Verrucomicrobia 
           2.774216 
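
    Going the other way, the sketch below suggests how the per-SampleType means could be combined to recover the all-sample mean from the question, by weighting each SampleType by its number of samples. It assumes phyloseq and the GlobalPatterns data are available; the names pct, sample_type, type_means, n_per_type and weighted are illustrative, not part of the package.

    ```r
    library(phyloseq)
    data(GlobalPatterns)

    TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
    PGroup <- transform_sample_counts(TGroup, function(x) 100 * x / sum(x))

    # Plain matrix of percentages: phyla in rows, samples in columns
    pct <- as(otu_table(PGroup), "matrix")
    sample_type <- sample_data(PGroup)$SampleType

    # Mean percentage per phylum within each SampleType (one column per type)
    type_means <- sapply(levels(sample_type), function(i) {
      rowMeans(pct[, sample_type == i, drop = FALSE])
    })

    # Number of samples per SampleType, in the same order as levels(sample_type)
    n_per_type <- as.vector(table(sample_type))

    # Size-weighted average of the per-type means
    weighted <- as.vector(type_means %*% n_per_type) / sum(n_per_type)

    # Compare against the plain mean over all samples
    all.equal(weighted, as.vector(rowMeans(pct)))
    ```

    If the comparison holds, the two tables in the question differ only in how much weight each SampleType receives, not in the underlying percentages.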
    

    [Comments]:
