【Question title】: Summarize columns in R Data Frame independent of the order, (df$A,df$B) = (df$B,df$A) [duplicate]
【Posted】: 2018-07-10 21:36:08
【Question】:

I have the following data frame:

      destiny origin Count
    1 KJFK    SBBR       4
    2 KJFK    SAEZ    4683
    3 SBGL    KJFK       2
    4 SBBR    KJFK       2
    5 KJFK    SBGL    4987
    6 KJFK    SBGR   12911
    ...

Since I'm interested in the route, KJFK -> SBBR is the same as SBBR -> KJFK for me. So I want to sum their counts, as in the following table:

      destiny origin Count
    1 KJFK    SBBR       6
    2 KJFK    SAEZ    4683
    3 SBGL    KJFK    4989
    4 KJFK    SBGR   12911
    ...

I don't want to use a big for loop to go through all the values.

【Question discussion】:

    Tags: r dataframe merge summarization


    【Solution 1】:

    How about this?

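    library(tidyverse)
    df %>%
        # convert factor columns to character so they can be sorted and pasted
        mutate_if(is.factor, as.character) %>%
        # build an order-independent key per row, e.g. "KJFK_SBBR"
        rowwise() %>%
        mutate(grp = paste0(sort(c(destiny, origin)), collapse = "_")) %>%
        ungroup() %>%
        # sum Count per unordered pair
        group_by(grp) %>%
        summarise(Count = sum(Count)) %>%
        # split the key back into two columns
        separate(grp, into = c("destiny", "origin"))
    # # A tibble: 4 x 3
    #  destiny origin Count
    #  <chr>   <chr>  <int>
    #1 KJFK    SAEZ    4683
    #2 KJFK    SBBR       6
    #3 KJFK    SBGL    4989
    #4 KJFK    SBGR   12911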
    

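    Note that since you don't care about the order of destiny and origin, they are sorted alphabetically here. So in your example above, both KJFK -> SBBR and SBBR -> KJFK become destiny = KJFK, origin = SBBR. This is also why SBGL -> KJFK appears in the output as destiny = KJFK, origin = SBGL.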
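    The same alphabetical-ordering idea can be sketched in base R without rowwise(). This is an alternative to the answer's code, not the answer itself: pmin()/pmax() put the alphabetically smaller airport code first for all rows at once, and aggregate() then sums Count per unordered pair. It assumes destiny and origin are character columns (the question's sample is typed in with stringsAsFactors = FALSE); routes and totals are illustrative names.

```r
# Sample data from the question, with character columns
df <- data.frame(
  destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK", "KJFK"),
  origin  = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"),
  Count   = c(4L, 4683L, 2L, 2L, 4987L, 12911L),
  stringsAsFactors = FALSE
)

# Put the alphabetically smaller code first, vectorised over all rows,
# so KJFK -> SBBR and SBBR -> KJFK map to the same (destiny, origin) pair
routes <- data.frame(
  destiny = pmin(df$destiny, df$origin),
  origin  = pmax(df$destiny, df$origin),
  Count   = df$Count,
  stringsAsFactors = FALSE
)

# Sum Count per unordered pair
totals <- aggregate(Count ~ destiny + origin, data = routes, FUN = sum)
totals
```

    As with the tidyverse version, every pair comes out in alphabetical order, so SBGL -> KJFK is reported as destiny = KJFK, origin = SBGL.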


    Sample data

    df <- read.table(text =
        "  destiny origin Count
        1 KJFK    SBBR       4
        2 KJFK    SAEZ    4683
        3 SBGL    KJFK       2
        4 SBBR    KJFK       2
        5 KJFK    SBGL    4987
        6 KJFK    SBGR   12911", header = TRUE)
    

    【Discussion】:

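    • It works :) but mutate_if(is.factor, as.character) only converted the second column, so I converted the first column to character before that step.
    • @gustavoPacheco mutate_if applies to all factor columns. Maybe your first column is not a factor.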
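    To see which columns are actually factors before converting, sapply(df, class) lists the class of each column. The lapply() line below is a base R counterpart of mutate_if(is.factor, as.character): it converts every factor column and leaves the others alone. The data frame here is hypothetical, with only origin stored as a factor to mimic the situation in the comment above. Since R 4.0.0, data.frame() and read.table() default to stringsAsFactors = FALSE, so text columns are often character already and need no conversion.

```r
# Hypothetical data: 'destiny' is character, only 'origin' is a factor
df <- data.frame(
  destiny = c("KJFK", "SBBR"),
  origin  = factor(c("SBBR", "KJFK")),
  Count   = c(4L, 2L),
  stringsAsFactors = FALSE
)

# Inspect the class of every column
sapply(df, class)
# returns c(destiny = "character", origin = "factor", Count = "integer")

# Base R counterpart of mutate_if(is.factor, as.character):
# convert every factor column, leave all other columns unchanged
df[] <- lapply(df, function(x) if (is.factor(x)) as.character(x) else x)

sapply(df, class)
```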
    【Solution 2】:

    Here is an option using pmin/pmax:

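    library(tidyverse)
    df1 %>%
      # group on the alphabetically ordered pair so A -> B and B -> A fall together
      group_by(destinyN = pmin(destiny, origin), originN = pmax(destiny, origin)) %>%
      # keep the direction of the first row in each group and sum the counts
      summarise(destiny = first(destiny),
                origin = first(origin),
                Count = sum(Count)) %>%
      ungroup() %>%
      # drop the helper grouping columns
      select(-destinyN, -originN)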
    # A tibble: 4 x 3
    #  destiny origin Count
    #  <chr>   <chr>  <int>
    #1 KJFK    SAEZ    4683
    #2 KJFK    SBBR       6
    #3 SBGL    KJFK    4989
    #4 KJFK    SBGR   12911
    

    Data

    df1 <- structure(list(destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK", 
    "KJFK"), origin = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"
    ), Count = c(4L, 4683L, 2L, 2L, 4987L, 12911L)), .Names = c("destiny", 
    "origin", "Count"), row.names = c("1", "2", "3", "4", "5", "6"
    ), class = "data.frame")
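    If you also want the original row order, with each route in the direction of the first row seen for it (exactly the table in the question), a base R sketch with duplicated() and tapply() follows. It swaps in a different technique from the answer: paste() builds an order-independent key from pmin()/pmax(), tapply() sums Count per key, and the first occurrence of each key supplies the destiny/origin pair. It assumes character columns; key, totals, first and out are illustrative names.

```r
# Data from the answer above, with character columns
df1 <- data.frame(
  destiny = c("KJFK", "KJFK", "SBGL", "SBBR", "KJFK", "KJFK"),
  origin  = c("SBBR", "SAEZ", "KJFK", "KJFK", "SBGL", "SBGR"),
  Count   = c(4L, 4683L, 2L, 2L, 4987L, 12911L),
  stringsAsFactors = FALSE
)

# Order-independent route key: KJFK/SBBR and SBBR/KJFK both give "KJFK_SBBR"
key <- paste(pmin(df1$destiny, df1$origin), pmax(df1$destiny, df1$origin), sep = "_")

# Total Count per route, as a 1-d array named by key
totals <- tapply(df1$Count, key, sum)

# Keep the first row seen for each route so its original direction survives
first <- !duplicated(key)
out <- df1[first, c("destiny", "origin")]
out$Count <- as.vector(totals[key[first]])
rownames(out) <- NULL
out
#   destiny origin Count
# 1    KJFK   SBBR     6
# 2    KJFK   SAEZ  4683
# 3    SBGL   KJFK  4989
# 4    KJFK   SBGR 12911
```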
    

    【Discussion】:
