[Title]: Grouping binary dataframe in R by category
[Posted]: 2018-03-01 16:48:12
[Question]:

My data frame df currently looks like this:

  cat 1 2 3 4
1 a   0 1 0 1
2 b   0 0 1 0 
3 b   1 0 1 1 
4 a   1 0 1 1
5 b   1 1 1 1
6 a   0 1 1 0

cat <- c("a", "b", "b", "a", "b", "a")
df = cbind(cat, data.frame(matrix(c(0, 1, 0, 1, 0, 
0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 
1, 0), nrow=6, byrow = T)))

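(That is, 2 categories in the first column and binary data in each subsequent column.) Ideally, I would like to group each column by category and also by binary value, ending up with something like this: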

1 a.0 2 1 1 1
2 a.1 1 2 2 2
3 b.0 1 2 0 1
4 b.1 2 1 3 2

My best attempt so far is:

aggregate(df[,-1], by=list(df[,1]), FUN = table)

but unfortunately this does not give me the result I want.

[Question discussion]:

    Tags: r dataframe aggregate


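    [Solution 1]:
    library(dplyr)
    library(tidyr)
    
    df %>%
      gather(key, value, -cat) %>%                        # long format: one row per (cat, column, value)
      mutate(new_cat = paste(cat, value, sep = "_")) %>%  # combined label such as "a_0"
      group_by(new_cat, key) %>%
      tally() %>%                                         # count rows per label and column
      spread(key, n) %>%                                  # back to wide: one column per original column
      replace(., is.na(.), 0)                             # label/column pairs that never occur count as 0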
    

    The output is:

      new_cat    X1    X2    X3    X4
    1     a_0     2     1     1     1
    2     a_1     1     2     2     2
    3     b_0     1     2     0     1
    4     b_1     2     1     3     2
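
The same reshape-then-count idea can be sketched in base R, for readers who prefer not to load dplyr and tidyr. This is only an illustration under stated assumptions: the names `long`, `counts` and `counts_df` are made up here, and the sample data is rebuilt inline so the block runs on its own.

```r
# Sample data from this answer, rebuilt with data.frame()
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0L, 0L, 1L, 1L, 1L, 0L),
  X2 = c(1L, 0L, 0L, 0L, 1L, 1L),
  X3 = c(0L, 1L, 1L, 1L, 1L, 1L),
  X4 = c(1L, 0L, 1L, 1L, 1L, 0L)
)

# Long format: stack() returns the columns `values` and `ind`
# (`ind` holds the name of the column each value came from)
long <- stack(df[-1])
long$cat <- rep(df$cat, times = ncol(df) - 1)  # repeat cat once per stacked column

# Cross-tabulate the combined label against the source column;
# combinations that never occur are reported as 0 by table()
counts <- table(new_cat = paste(long$cat, long$values, sep = "_"),
                key = long$ind)

# Convert the 2-d table to a plain data frame, one row per label
counts_df <- as.data.frame.matrix(counts)
print(counts_df)
```

Because `table()` fills in missing combinations itself, no separate step for replacing `NA` is needed in this variant.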
    

    Sample data:

    df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L, 
    0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L, 
    1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat", 
    "X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6"))
    


      [Solution 2]:

      You can pick out each binary value within a category of the data frame like this:

      df[df$cat == "a", -1]  == 1
      

      This example is for category a and value 1. The command returns the following table:

           X1    X2    X3    X4
      1 FALSE  TRUE FALSE  TRUE
      4  TRUE FALSE  TRUE  TRUE
      6 FALSE  TRUE  TRUE FALSE
      

      Now you can sum this result column by column to obtain one of the rows. In this case it returns the a.1 row of the desired data frame:

      apply(df[df$cat == "a", -1]  == 1, 2, sum)
      

      The remaining rows can be found in the same way.

      apply(df[df$cat == "a", -1]  == 0, 2, sum)
      apply(df[df$cat == "a", -1]  == 1, 2, sum)
      apply(df[df$cat == "b", -1]  == 0, 2, sum)
      apply(df[df$cat == "b", -1]  == 1, 2, sum)
      

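      If you need to repeat this, you can write a loop that, on each iteration, selects the rows for one value of cat, i.e.

      # unique() works whether cat is a character vector or a factor; print() is needed to show results inside a loop
      for (val in unique(df$cat)) print(apply(df[df$cat == val, -1]  == 1, 2, sum))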
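
The per-category sums above can also be collected into a single table. The following is a minimal sketch, assuming the same sample data; the helper `count_value()` and the name `result` are hypothetical and only serve the illustration.

```r
# Sample data, rebuilt inline so the block is self-contained
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0L, 0L, 1L, 1L, 1L, 0L),
  X2 = c(1L, 0L, 0L, 0L, 1L, 1L),
  X3 = c(0L, 1L, 1L, 1L, 1L, 1L),
  X4 = c(1L, 0L, 1L, 1L, 1L, 0L)
)

# Hypothetical helper: for each category, count the entries equal to `value`
# in every binary column, and label the rows "<category>.<value>"
count_value <- function(data, value) {
  parts <- split(data[-1], data$cat)                 # one data frame per category
  res <- t(sapply(parts, function(part) colSums(part == value)))
  rownames(res) <- paste(rownames(res), value, sep = ".")
  res
}

# Stack the counts of 0s and 1s, then sort rows as a.0, a.1, b.0, b.1
result <- rbind(count_value(df, 0), count_value(df, 1))
result <- result[order(rownames(result)), ]
print(result)
```

`colSums()` on the logical matrix does the same job as `apply(..., 2, sum)` in the calls above.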
      


        [Solution 3]:
        df <- structure(list(cat = c("a", "b", "b", "a", "b", "a"), X1 = c(0L, 
        0L, 1L, 1L, 1L, 0L), X2 = c(1L, 0L, 0L, 0L, 1L, 1L), X3 = c(0L, 
        1L, 1L, 1L, 1L, 1L), X4 = c(1L, 0L, 1L, 1L, 1L, 0L)), .Names = c("cat", 
        "X1", "X2", "X3", "X4"), class = "data.frame", row.names = c("1", 
        "2", "3", "4", "5", "6"))
        
        df <- split(df, df$cat) # Split by cat
        df <- lapply(seq_along(df), function(i)
              {
                # Count 0s and 1s in each binary column (cat itself is excluded);
                # fixed factor levels keep a count of 0 when a value never occurs
                kk <- sapply(df[[i]][-1], function(x) table(factor(x, levels = 0:1)))
                kk <- data.frame(kk) # Rows "0" and "1", one column per variable
                kk$cat <- paste(names(df)[i], rownames(kk), sep = ".") # Define name of cat column
                rownames(kk) <- NULL
                kk
              })
        n_df <- do.call(rbind, df) # Combine list by row
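
The split-and-combine steps above can also be written with `aggregate()`, the function the question started from. This is a rough sketch rather than the answerer's method: the helper `count_by()` is a hypothetical name, and the sample data is repeated so the block runs on its own.

```r
# Sample data, rebuilt inline so the block is self-contained
df <- data.frame(
  cat = c("a", "b", "b", "a", "b", "a"),
  X1 = c(0L, 0L, 1L, 1L, 1L, 0L),
  X2 = c(1L, 0L, 0L, 0L, 1L, 1L),
  X3 = c(0L, 1L, 1L, 1L, 1L, 1L),
  X4 = c(1L, 0L, 1L, 1L, 1L, 0L)
)

# Hypothetical helper: per category, count how often `value` occurs in each column.
# Returning a single number per group (instead of a whole table) keeps the
# aggregate() result a plain data frame.
count_by <- function(data, value) {
  out <- aggregate(data[-1], by = list(cat = data$cat),
                   FUN = function(x) sum(x == value))
  out$cat <- paste(out$cat, value, sep = ".")
  out
}

# One block of rows per binary value, combined and sorted by label
n_df <- do.call(rbind, lapply(0:1, function(v) count_by(df, v)))
n_df <- n_df[order(n_df$cat), ]
rownames(n_df) <- NULL
print(n_df)
```

Here the label column comes first and the counts follow, whereas the code above appends `cat` as the last column.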
        
