【Question title】: Split a dataframe by groups of values in R
【Posted】: 2021-11-23 08:41:47
【Question】:

I have a dataset similar to the one below:

data1 <- data.frame(Symbol=c("APEX1","APOC3","CCNA2","CDC42","CDK1","BRCA2","BSCL2","BUB1B","EEF2","EFEMP1","EGF","ATP5O","ATR"), Total_read=c(32546,32426,31854,31745,25879,25465,24759,24574,8769,8458,2546,875,850))

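I am looking for a concise way to split this data frame into subsets (preferably in a list) by grouping values that are within 10% of each other. The dataset above would therefore be split into 5 subsets, as follows: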

[1]
Symbol Total_read
APEX1      32546
APOC3      32426
CCNA2      31854
CDC42      31745

[2]
Symbol Total_read
CDK1       25879
BRCA2      25465
BSCL2      24759
BUB1B      24574

[3]
Symbol Total_read
EEF2       8769
EFEMP1     8458

[4]
Symbol Total_read
EGF        2546

[5]
Symbol Total_read
ATP5O      875
ATR        850

Thanks for any suggestions.

    Tags: r dataframe split tidyverse


    【Solution 1】:
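    library(dplyr)
    
    # Assign a group id to each value. A new group starts whenever a value
    # differs from the previous one by more than 10% of the previous value.
    # Assumes x is already sorted, as Total_read is in data1.
    var10 <- function(x){
      n <- length(x)
      g <- 1
      out <- numeric(n)
      out[1] <- g
      
      # seq_len(n)[-1] is empty when n == 1, whereas 2:n would count down to 1
      for(i in seq_len(n)[-1]){
        pct_change <- abs(100*(x[i-1] - x[i])/x[i-1])
        
        if(pct_change > 10){
          g <- g + 1
        }
        out[i] <- g
      }
      
      return(out)
    }
    
    
    # Add the group id as a helper column, then split into a list of tibbles
    data1 %>% 
      mutate(aux = var10(Total_read)) %>% 
      group_split(aux)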
    
    
    [[1]]
    # A tibble: 4 x 3
      Symbol Total_read   aux
      <chr>       <dbl> <dbl>
    1 APEX1       32546     1
    2 APOC3       32426     1
    3 CCNA2       31854     1
    4 CDC42       31745     1
    
    [[2]]
    # A tibble: 4 x 3
      Symbol Total_read   aux
      <chr>       <dbl> <dbl>
    1 CDK1        25879     2
    2 BRCA2       25465     2
    3 BSCL2       24759     2
    4 BUB1B       24574     2
    
    [[3]]
    # A tibble: 2 x 3
      Symbol Total_read   aux
      <chr>       <dbl> <dbl>
    1 EEF2         8769     3
    2 EFEMP1       8458     3
    
    [[4]]
    # A tibble: 1 x 3
      Symbol Total_read   aux
      <chr>       <dbl> <dbl>
    1 EGF          2546     4
    
    [[5]]
    # A tibble: 2 x 3
      Symbol Total_read   aux
      <chr>       <dbl> <dbl>
    1 ATP5O         875     5
    2 ATR           850     5
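
The same consecutive-difference grouping can probably be sketched in base R without a loop. This is only an illustration, not part of the answer above: it assumes the rows are already sorted by Total_read and that the data frame has at least one row. The names rel_change, grp and subsets are made up for this example.

```r
data1 <- data.frame(
  Symbol = c("APEX1", "APOC3", "CCNA2", "CDC42", "CDK1", "BRCA2", "BSCL2",
             "BUB1B", "EEF2", "EFEMP1", "EGF", "ATP5O", "ATR"),
  Total_read = c(32546, 32426, 31854, 31745, 25879, 25465, 24759,
                 24574, 8769, 8458, 2546, 875, 850)
)

x <- data1$Total_read

# Percent change from each value to the next, relative to the earlier value
rel_change <- abs(diff(x)) / head(x, -1) * 100

# A change above 10% starts a new group; the leading TRUE opens group 1
grp <- cumsum(c(TRUE, rel_change > 10))

# Named list of data frames, one per group (no helper column is added)
subsets <- split(data1, grp)
```

Like var10, this compares each value only with its immediate predecessor, so a long run of small steps can end up in one group even if its first and last values differ by more than 10%.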
    
