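【Question title】: Subsetting a data frame using another data frame
【Posted】: 2018-03-20 14:21:20
【Question】:

I've run into a problem that shouldn't be this hard to solve. I want to subset a data.frame using another data.frame, or more precisely, using the value of one of its columns. Here is an example: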

df1<- t(data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("0.5","3","0","0","5","0","15"), C=c("0","0","3","15","15","0","0"), D=c("0.5","0.5","0.5","0","0","0","0"), E=c("37.5","37.5","0.5","62.5","0.5","0.5","1")))
df2<- data.frame(A=c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), B=c("vasc", "vasc","vasc","spha", "moss","moss","moss"), C=c("a", "a", "b", "a", "c","d","a"))

Now, suppose I want df1 to keep only the objects in A (here they are species) that are labelled "vasc" in df2. To do this, I tried something like:

df3 <- subset(df2, B=="vasc")
df4 <- df1[,c(df1, as.vector(df2))]

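But doing so gives me a type error:

Error in df1[, c(df1, as.vector(df2))] : invalid subscript type 'list'

So I tried unlisting my data frames, but nothing seemed to work. I have been stuck on this for a while, and I did search the forums to see whether anyone had an elegant solution, but it looks like nobody does. Another way to do this subset would be the following code, but even though it feels closer to a solution, it does not work either: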

 try11 <- list(df2, df1)%>% rbindlist(., fill=T)  # with df1 not transposed
 df11 <- try11[try11=="vasc",]

I hope the code is good enough and my explanation clear enough. Thanks!

【Comments】:

    Tags: r dataframe subset


    【Solution 1】:

    You can try:

    library(data.table)
    setDT(df1)
    setDT(df2)
    
    dtPruned <- df1[A %in% df2[B == "vasc", A]]
    

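    However, make sure to remove the t() call from the definition of df1 for this to work: t() returns a matrix, and setDT() expects a data frame or a list. Basically, the inner expression selects the A column of df2 where B == "vasc". The outer expression then keeps the rows of df1 whose A is among those values.

    【Comments】: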
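
The two steps described above can be spelled out with an intermediate variable. This is a minimal sketch, assuming the data.table package is installed and that df1 is built without the t() call; the name `vasc_species` is introduced here for illustration only and is not part of the original answer.

```r
library(data.table)

# df1 built WITHOUT t(), so it is a data frame rather than a character matrix
df1 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1")
)
df2 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("vasc", "vasc", "vasc", "spha", "moss", "moss", "moss"),
  C = c("a", "a", "b", "a", "c", "d", "a")
)

# Convert both data frames to data.tables by reference
setDT(df1)
setDT(df2)

# Step 1: the A values in df2 whose B is "vasc" (hypothetical variable name)
vasc_species <- df2[B == "vasc", A]

# Step 2: the rows of df1 whose A is among those values
dtPruned <- df1[A %in% vasc_species]
dtPruned
```

Splitting the lookup from the filter makes it easy to inspect `vasc_species` on its own before it is used to subset df1.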

      【Solution 2】:

      You can do this with dplyr:

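      library(dplyr)
      # A values in df2 whose B is "vasc"
      species <- as.character(df2[df2$B == "vasc", 1])
      
      # filter() keeps rows by a logical condition; slice() expects row positions,
      # so slice(A %in% species) would return row 1 repeatedly.
      # df1 must be built without the t() call.
      df1 %>% 
          filter(A %in% species)
      
      #     A   B C   D    E
      # 1 ABI 0.5 0 0.5 37.5
      # 2 BET   3 0 0.5 37.5
      # 3 ALN   0 3 0.5  0.5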
      

      PS

      Your data contain only factors. You may want to store the numbers as numeric instead.

      【Comments】:
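
One way to act on this note is to convert the measurement columns after building the data frame. The sketch below uses base R only and assumes df1 is built without t(); `value_cols` is a name introduced here for illustration. Going through as.character() first matters when the columns are factors (the default before R 4.0.0), because as.numeric() on a factor returns its internal integer codes rather than the printed values.

```r
# df1 built without t(); every value is quoted, so B..E are not numeric yet
df1 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1")
)

# Hypothetical name: the columns that hold measurements
value_cols <- c("B", "C", "D", "E")

# as.character() first, so a factor yields its labels, not its integer codes
df1[value_cols] <- lapply(
  df1[value_cols],
  function(col) as.numeric(as.character(col))
)

# Inspect the resulting column types
str(df1)
```

Alternatively, the values could simply be written without quotes in the data.frame() call, which avoids the conversion altogether.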

        【Solution 3】:

        This should do it. First, we create a character vector (x) of all A values in df2 where B == "vasc". Then we select the columns of df1 whose value in row A is in x.

        # Create a character vector of all A values when B == vasc
        x <- as.character(df2[df2$B == "vasc", 1])
        
        # Select columns where row A == x
        df1[, which(df1[1, ] %in% x)]
        
          [,1]   [,2]   [,3] 
        A "ABI"  "BET"  "ALN"
        B "0.5"  "3"    "0"  
        C "0"    "0"    "3"  
        D "0.5"  "0.5"  "0.5"
        E "37.5" "37.5" "0.5"
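
A caveat with the column selection above: when only one species matches, `[` drops the matrix down to a plain vector unless drop = FALSE is passed. The sketch below illustrates this with the "spha" group, which contains a single species; the names `dropped` and `kept` are introduced here for illustration only.

```r
df1 <- t(data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("0.5", "3", "0", "0", "5", "0", "15"),
  C = c("0", "0", "3", "15", "15", "0", "0"),
  D = c("0.5", "0.5", "0.5", "0", "0", "0", "0"),
  E = c("37.5", "37.5", "0.5", "62.5", "0.5", "0.5", "1")
))
df2 <- data.frame(
  A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"),
  B = c("vasc", "vasc", "vasc", "spha", "moss", "moss", "moss"),
  C = c("a", "a", "b", "a", "c", "d", "a")
)

# "spha" matches a single species, so only one column is selected
x <- as.character(df2[df2$B == "spha", 1])

# Default behaviour: the single column collapses to a named character vector
dropped <- df1[, which(df1[1, ] %in% x)]

# drop = FALSE keeps the result as a 5 x 1 matrix
kept <- df1[, which(df1[1, ] %in% x), drop = FALSE]

is.matrix(dropped)  # FALSE
dim(kept)           # 5 1
```

Passing drop = FALSE keeps the result a matrix regardless of how many species match, which is safer if later code expects two dimensions.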
        

        If we avoid the t() call, we can do this instead:

        df1[df1$A %in% df2[df2$B == "vasc", 1], ]
        
            A   B C   D    E
        1 ABI 0.5 0 0.5 37.5
        2 BET   3 0 0.5 37.5
        3 ALN   0 3 0.5  0.5
        

        We can then transpose the result to keep the same layout as above:

        t(df1[df1$A %in% df2[df2$B == "vasc", 1], ])
        
          1      2      3    
        A "ABI"  "BET"  "ALN"
        B "0.5"  "3"    "0"  
        C "0"    "0"    "3"  
        D "0.5"  "0.5"  "0.5"
        E "37.5" "37.5" "0.5"
        

        Data:

        df1 <- t(data.frame(
          A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), 
          B = c("0.5","3","0","0","5","0","15"), 
          C = c("0","0","3","15","15","0","0"), 
          D = c("0.5","0.5","0.5","0","0","0","0"), 
          E = c("37.5","37.5","0.5","62.5","0.5","0.5","1")
          )
        )
        
        df2 <- data.frame(
          A = c("ABI", "BET", "ALN", "SPH", "PTI", "DIC", "PTD"), 
          B = c("vasc", "vasc","vasc","spha", "moss","moss","moss"), 
          C = c("a", "a", "b", "a", "c","d","a")
        )
        

        【Comments】:
