【Question title】: Calculate average of rows which fulfill certain condition [duplicate]
【Posted】: 2020-09-14 19:02:02
【Question description】:

I am working with a data frame similar to this one:

Ind Pos Sample  Ct      LogConc     RelConc
1   B1  wt1A    26.93   -2.0247878  0.009445223
2   B2  wt1A    27.14   -2.0960951  0.008015026
3   B3  wt1B    26.76   -1.9670628  0.010787907
4   B4  wt1B    26.94   -2.0281834  0.009371662
5   B5  wt1C    26.01   -1.7123939  0.019391264
6   B6  wt1C    26.08   -1.7361630  0.018358492
7   B7  wt1D    25.68   -1.6003396  0.025099232
8   B8  wt1D    25.75   -1.6241087  0.023762457
9   B9  wt1E    22.11   -0.3881154  0.409151879
10  B10 wt1E    22.21   -0.4220713  0.378380453
11  B11 dko1A   22.20   -0.4186757  0.381350463
12  B12 dko1A   22.10   -0.3847199  0.412363423

My goal is to calculate the per-sample mean of RelConc. The result should be a data frame that looks like this:

Ind Pos Sample  Ct      LogConc     RelConc     AverageRelConc
1   B1  wt1A    26.93   -2.0247878  0.009445223 0.008730124
2   B2  wt1A    27.14   -2.0960951  0.008015026 0.008730124
3   B3  wt1B    26.76   -1.9670628  0.010787907 0.010079785
4   B4  wt1B    26.94   -2.0281834  0.009371662 0.010079785
5   B5  wt1C    26.01   -1.7123939  0.019391264 0.018874878
6   B6  wt1C    26.08   -1.7361630  0.018358492 0.018874878
7   B7  wt1D    25.68   -1.6003396  0.025099232 0.024430845
8   B8  wt1D    25.75   -1.6241087  0.023762457 0.024430845
9   B9  wt1E    22.11   -0.3881154  0.409151879 0.393766166
10  B10 wt1E    22.21   -0.4220713  0.378380453 0.393766166
11  B11 dko1A   22.20   -0.4186757  0.381350463 0.396856943
12  B12 dko1A   22.10   -0.3847199  0.412363423 0.396856943

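I am fairly new to R and don't know how to accomplish such a seemingly simple task. In Python, I would probably loop over the rows and check whether a new sample name has been encountered. Each time it had, I would compute the mean over all the rows of the previous sample. However, that does not seem very "R-like". I would be glad if someone could point me to a solution!

Cheers!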
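The loop idea described in the question can also be written in R. Here is a minimal sketch. It assumes a trimmed `df1` that keeps only the `Sample` and `RelConc` columns from the question; this is a shortened copy for illustration, not the full data frame. The loop runs over the distinct sample names rather than over individual rows:

```r
# Trimmed copy of the question's data: only the columns the loop needs (assumption for brevity)
df1 <- data.frame(
  Sample  = c("wt1A", "wt1A", "wt1B", "wt1B", "wt1C", "wt1C",
              "wt1D", "wt1D", "wt1E", "wt1E", "dko1A", "dko1A"),
  RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662,
              0.019391264, 0.018358492, 0.025099232, 0.023762457,
              0.409151879, 0.378380453, 0.381350463, 0.412363423),
  stringsAsFactors = FALSE
)

# Pre-allocate the result, then fill it one sample at a time
avg <- numeric(nrow(df1))
for (s in unique(df1$Sample)) {
  rows <- df1$Sample == s               # logical mask selecting this sample's rows
  avg[rows] <- mean(df1$RelConc[rows])  # write the group mean into every matching row
}
df1$AverageRelConc <- avg
```

Unlike "detect when a new sample name starts", masking by name does not need the rows of a sample to be adjacent. The vectorised helpers in the answers do the same job with less code.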

【Question discussion】:

    Tags: r dataframe matrix average


    【Solution 1】:

    With base R, we can use `ave`, which returns the group mean for every row and is fast:

    df1$AverageRelConc <- with(df1, ave(RelConc, Sample))
    

    -output

    df1$AverageRelConc
    #[1] 0.008730125 0.008730125 0.010079784 0.010079784 0.018874878 0.018874878 0.024430844 0.024430844 0.393766166 0.393766166
    #[11] 0.396856943 0.396856943
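    `ave()` fits this task because it returns a vector the same length as its input. Each element is replaced by the summary of its group, so the result can be assigned straight into a new column. Internally, base R's `ave()` roughly does `split(x, g) <- lapply(split(x, g), FUN)`. The sketch below reproduces that by hand, reusing a trimmed two-column `df1` (an assumption for brevity). It also shows two details worth knowing. First, `FUN` must be passed by name because it comes after `...`. Second, the default `mean` does not drop `NA`s:

    ```r
    # Trimmed copy of the data (Sample and RelConc only; assumption for brevity)
    df1 <- data.frame(
      Sample  = c("wt1A", "wt1A", "wt1B", "wt1B", "wt1C", "wt1C",
                  "wt1D", "wt1D", "wt1E", "wt1E", "dko1A", "dko1A"),
      RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662,
                  0.019391264, 0.018358492, 0.025099232, 0.023762457,
                  0.409151879, 0.378380453, 0.381350463, 0.412363423),
      stringsAsFactors = FALSE
    )

    # Split RelConc by sample, take each group's mean, and write it back
    # into the original positions with the `split<-` replacement function
    x <- df1$RelConc
    split(x, df1$Sample) <- lapply(split(x, df1$Sample), mean)

    # Same result as ave(), whose default FUN is mean
    stopifnot(isTRUE(all.equal(x, ave(df1$RelConc, df1$Sample))))

    # Any other summary must be supplied as a *named* FUN argument
    group_max <- with(df1, ave(RelConc, Sample, FUN = max))

    # mean() propagates NA by default; wrap it to ignore missing values
    with_na <- c(NA, df1$RelConc[-1])
    na_safe <- ave(with_na, df1$Sample, FUN = function(v) mean(v, na.rm = TRUE))
    ```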
    

    Or, with the tidyverse, we group by 'Sample' and take the mean of 'RelConc':

    library(dplyr)
    df1 %>%
      group_by(Sample) %>%
      mutate(AverageRelConc = mean(RelConc, na.rm = TRUE))
    

    -output

    # A tibble: 12 x 7
    # Groups:   Sample [6]
    #     Ind Pos   Sample    Ct LogConc RelConc AverageRelConc
    #   <int> <chr> <chr>  <dbl>   <dbl>   <dbl>          <dbl>
    # 1     1 B1    wt1A    26.9  -2.02  0.00945        0.00873
    # 2     2 B2    wt1A    27.1  -2.10  0.00802        0.00873
    # 3     3 B3    wt1B    26.8  -1.97  0.0108         0.0101 
    # 4     4 B4    wt1B    26.9  -2.03  0.00937        0.0101 
    # 5     5 B5    wt1C    26.0  -1.71  0.0194         0.0189 
    # 6     6 B6    wt1C    26.1  -1.74  0.0184         0.0189 
    # 7     7 B7    wt1D    25.7  -1.60  0.0251         0.0244 
    # 8     8 B8    wt1D    25.8  -1.62  0.0238         0.0244 
    # 9     9 B9    wt1E    22.1  -0.388 0.409          0.394  
    #10    10 B10   wt1E    22.2  -0.422 0.378          0.394  
    #11    11 B11   dko1A   22.2  -0.419 0.381          0.397  
    #12    12 B12   dko1A   22.1  -0.385 0.412          0.397  
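    To stay in base R while keeping the same "summarise per group, then look up each row's group" pattern as `group_by()` + `mutate()`, you can use `tapply()`. It returns one mean per sample, labelled by sample name, and that result can be indexed by the `Sample` column. A sketch, again assuming a trimmed two-column `df1`:

    ```r
    # Trimmed copy of the data (Sample and RelConc only; assumption for brevity)
    df1 <- data.frame(
      Sample  = c("wt1A", "wt1A", "wt1B", "wt1B", "wt1C", "wt1C",
                  "wt1D", "wt1D", "wt1E", "wt1E", "dko1A", "dko1A"),
      RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662,
                  0.019391264, 0.018358492, 0.025099232, 0.023762457,
                  0.409151879, 0.378380453, 0.381350463, 0.412363423),
      stringsAsFactors = FALSE
    )

    # One mean per sample, labelled by sample name (in sorted level order)
    group_means <- tapply(df1$RelConc, df1$Sample, mean)

    # Look up each row's mean by name. as.character() guards against Sample
    # being a factor, where indexing would use the integer codes instead
    df1$AverageRelConc <- as.vector(group_means[as.character(df1$Sample)])
    ```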
    

    data

    df1 <- structure(list(Ind = 1:12, Pos = c("B1", "B2", "B3", "B4", "B5", 
    "B6", "B7", "B8", "B9", "B10", "B11", "B12"), Sample = c("wt1A", 
    "wt1A", "wt1B", "wt1B", "wt1C", "wt1C", "wt1D", "wt1D", "wt1E", 
    "wt1E", "dko1A", "dko1A"), Ct = c(26.93, 27.14, 26.76, 26.94, 
    26.01, 26.08, 25.68, 25.75, 22.11, 22.21, 22.2, 22.1), LogConc = c(-2.0247878, 
    -2.0960951, -1.9670628, -2.0281834, -1.7123939, -1.736163, -1.6003396, 
    -1.6241087, -0.3881154, -0.4220713, -0.4186757, -0.3847199), 
        RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662, 
        0.019391264, 0.018358492, 0.025099232, 0.023762457, 0.409151879, 
        0.378380453, 0.381350463, 0.412363423)), class = "data.frame",
        row.names = c(NA, 
    -12L))
    

    【Comments】:

      【Solution 2】:

      Try this tidyverse option:

      library(tidyverse)
      #Code
      df %>% group_by(Sample) %>%
        mutate(AvgRelConc = mean(RelConc, na.rm = TRUE))
      

      Output:

      # A tibble: 12 x 7
      # Groups:   Sample [6]
           Ind Pos   Sample    Ct LogConc RelConc AvgRelConc
         <int> <chr> <chr>  <dbl>   <dbl>   <dbl>      <dbl>
       1     1 B1    wt1A    26.9  -2.02  0.00945    0.00873
       2     2 B2    wt1A    27.1  -2.10  0.00802    0.00873
       3     3 B3    wt1B    26.8  -1.97  0.0108     0.0101 
       4     4 B4    wt1B    26.9  -2.03  0.00937    0.0101 
       5     5 B5    wt1C    26.0  -1.71  0.0194     0.0189 
       6     6 B6    wt1C    26.1  -1.74  0.0184     0.0189 
       7     7 B7    wt1D    25.7  -1.60  0.0251     0.0244 
       8     8 B8    wt1D    25.8  -1.62  0.0238     0.0244 
       9     9 B9    wt1E    22.1  -0.388 0.409      0.394  
      10    10 B10   wt1E    22.2  -0.422 0.378      0.394  
      11    11 B11   dko1A   22.2  -0.419 0.381      0.397  
      12    12 B12   dko1A   22.1  -0.385 0.412      0.397  
      

      Data used:

      #Data
      df <- structure(list(Ind = 1:12, Pos = c("B1", "B2", "B3", "B4", "B5", 
      "B6", "B7", "B8", "B9", "B10", "B11", "B12"), Sample = c("wt1A", 
      "wt1A", "wt1B", "wt1B", "wt1C", "wt1C", "wt1D", "wt1D", "wt1E", 
      "wt1E", "dko1A", "dko1A"), Ct = c(26.93, 27.14, 26.76, 26.94, 
      26.01, 26.08, 25.68, 25.75, 22.11, 22.21, 22.2, 22.1), LogConc = c(-2.0247878, 
      -2.0960951, -1.9670628, -2.0281834, -1.7123939, -1.736163, -1.6003396, 
      -1.6241087, -0.3881154, -0.4220713, -0.4186757, -0.3847199), 
          RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662, 
          0.019391264, 0.018358492, 0.025099232, 0.023762457, 0.409151879, 
          0.378380453, 0.381350463, 0.412363423)), class = "data.frame", row.names = c(NA, 
      -12L))
      

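      Alternatively, you can use aggregate() to store the per-sample means in a separate data frame, then join them back onto the original df:

      #Compute the mean of RelConc for each Sample
      dfmeans <- aggregate(RelConc ~ Sample, df, mean, na.rm = TRUE)
      #Look up each row's sample mean by matching on Sample
      df$AvgRelConc <- dfmeans[match(df$Sample, dfmeans$Sample), "RelConc"]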
      

      Output:

         Ind Pos Sample    Ct    LogConc     RelConc  AvgRelConc
      1    1  B1   wt1A 26.93 -2.0247878 0.009445223 0.008730125
      2    2  B2   wt1A 27.14 -2.0960951 0.008015026 0.008730125
      3    3  B3   wt1B 26.76 -1.9670628 0.010787907 0.010079784
      4    4  B4   wt1B 26.94 -2.0281834 0.009371662 0.010079784
      5    5  B5   wt1C 26.01 -1.7123939 0.019391264 0.018874878
      6    6  B6   wt1C 26.08 -1.7361630 0.018358492 0.018874878
      7    7  B7   wt1D 25.68 -1.6003396 0.025099232 0.024430844
      8    8  B8   wt1D 25.75 -1.6241087 0.023762457 0.024430844
      9    9  B9   wt1E 22.11 -0.3881154 0.409151879 0.393766166
      10  10 B10   wt1E 22.21 -0.4220713 0.378380453 0.393766166
      11  11 B11  dko1A 22.20 -0.4186757 0.381350463 0.396856943
      12  12 B12  dko1A 22.10 -0.3847199 0.412363423 0.396856943
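      The second step above uses `match()` as a lightweight lookup. An actual join with base R's `merge()` is an alternative. One catch is that `merge()` sorts the result by the join key by default. This sketch therefore keeps the question's `Ind` column along with `Sample` and `RelConc` (the other columns are dropped only to keep the example short) and sorts on `Ind` afterwards to restore the original order:

      ```r
      # Trimmed copy of the data (Ind, Sample and RelConc only; assumption for brevity)
      df <- data.frame(
        Ind     = 1:12,
        Sample  = c("wt1A", "wt1A", "wt1B", "wt1B", "wt1C", "wt1C",
                    "wt1D", "wt1D", "wt1E", "wt1E", "dko1A", "dko1A"),
        RelConc = c(0.009445223, 0.008015026, 0.010787907, 0.009371662,
                    0.019391264, 0.018358492, 0.025099232, 0.023762457,
                    0.409151879, 0.378380453, 0.381350463, 0.412363423),
        stringsAsFactors = FALSE
      )

      # Per-sample means, renamed so the join does not clash with RelConc
      dfmeans <- aggregate(RelConc ~ Sample, data = df, FUN = mean)
      names(dfmeans)[names(dfmeans) == "RelConc"] <- "AvgRelConc"

      # Join on Sample: every row of df finds exactly one match in dfmeans
      joined <- merge(df, dfmeans, by = "Sample")

      # merge() sorted the rows by Sample, so restore the original order via Ind
      joined <- joined[order(joined$Ind), ]
      rownames(joined) <- NULL
      ```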
      

      【Comments】: