【Question title】: Summation of money amounts in character format by group
【Posted】: 2021-04-08 14:55:34
【Question】:

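I have a data frame containing monetary transactions between individuals. Transactions can go in both directions: A can transfer money to B, and B can transfer money to A. The data frame is structured as follows: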

From  To  Amount
A     B   $100
A     C   $40
A     D   $30
B     A   $25
B     C   $70
C     A   $190
C     D   $110

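I want to sum the total transaction amount for each pair of individuals who traded with each other, regardless of direction. The result should look like this: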

Individual_1  Individual_2    Sum
A             B               $125
A             C               $230
A             D               $30
B             C               $70
C             D               $110

I tried the grouping functions in the dplyr package, but I don't think they apply to my case.

【Comments】:

    Tags: r dataframe aggregate


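    【Answer 1】:

    A complete solution without packages, based on @RonakShah's great pmin/pmax approach. It uses the list notation of aggregate (as opposed to the formula notation), which allows names to be assigned to the output columns.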

    with(
      transform(d, a=as.numeric(gsub("\\D", "", Amount)), b=pmin(From, To), c=pmax(From, To)),
      aggregate(list(Sum=a), list(Individual_1=b, Individual_2=c), function(x) 
        paste0("$", sum(x))))
    #   Individual_1 Individual_2  Sum
    # 1            A            B $125
    # 2            A            C $230
    # 3            B            C  $70
    # 4            A            D  $30
    # 5            C            D $110
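
To make the pmin/pmax step concrete, here is a minimal base R sketch of what the two helper columns contain before aggregation. It rebuilds the same data as this answer's data section; the names `first`, `second`, and `pair_key` are only illustrative and do not appear in the answer.

```r
# Same data as in this answer (rebuilt here so the sketch is self-contained).
d <- data.frame(
  From   = c("A", "A", "A", "B", "B", "C", "C"),
  To     = c("B", "C", "D", "A", "C", "A", "D"),
  Amount = c("$100", "$40", "$30", "$25", "$70", "$190", "$110"),
  stringsAsFactors = FALSE
)

# pmin()/pmax() compare the two character vectors element by element,
# so each row gets the alphabetically smaller name first.
first  <- pmin(d$From, d$To)
second <- pmax(d$From, d$To)

# Illustrative key: rows 1 (A -> B) and 4 (B -> A) now share "A B".
pair_key <- paste(first, second)
print(pair_key)
# [1] "A B" "A C" "A D" "A B" "B C" "A C" "C D"
```

Because both directions of a transfer map to the same key, grouping on `first` and `second` sums them together.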
    

    Data:

    d <- structure(list(From = c("A", "A", "A", "B", "B", "C", "C"), To = c("B", 
    "C", "D", "A", "C", "A", "D"), Amount = c("$100", "$40", "$30", 
    "$25", "$70", "$190", "$110")), class = "data.frame", row.names = c(NA, 
    -7L))
    

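    【Comments】:

    • How is this different from my answer?
    • @RonakShah Hi, happy new year! The Sum values are prefixed with $, and the column names match the OP's expected output. I credited your idea in an edit.
    【Answer 2】:

    You can use pmin/pmax to sort the From and To columns within each row, then sum the Amount values.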

    library(dplyr)
    
    df %>%
      group_by(col1 = pmin(From, To), 
               col2 = pmax(From, To)) %>%
      summarise(Amount = sum(readr::parse_number(Amount)))
    
    #  col1  col2  Amount
    #  <chr> <chr>  <dbl>
    #1 A     B        125
    #2 A     C        230
    #3 A     D         30
    #4 B     C         70
    #5 C     D        110
    

    Using the same logic in base R, you can do:

    aggregate(Amount~col1 + col2, 
          transform(df, col1 = pmin(From, To), col2 = pmax(From, To), 
                    Amount = as.numeric(sub('$', '', Amount, fixed = TRUE))), sum)
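
The base R call above returns numeric sums in columns named col1, col2, and Amount. As a hedged sketch (not part of the original answer), the result could be post-processed to match the layout in the question; the variable `res` and the renaming step are assumptions made for illustration.

```r
# Same data as in this answer (rebuilt here so the sketch is self-contained).
df <- data.frame(
  From   = c("A", "A", "A", "B", "B", "C", "C"),
  To     = c("B", "C", "D", "A", "C", "A", "D"),
  Amount = c("$100", "$40", "$30", "$25", "$70", "$190", "$110"),
  stringsAsFactors = FALSE
)

# The aggregate() call from the answer, stored in a hypothetical `res`.
res <- aggregate(Amount ~ col1 + col2,
                 transform(df,
                           col1 = pmin(From, To),
                           col2 = pmax(From, To),
                           Amount = as.numeric(sub("$", "", Amount, fixed = TRUE))),
                 sum)

# Sort rows by pair, rename the columns, and restore the "$" prefix.
res <- res[order(res$col1, res$col2), ]
names(res) <- c("Individual_1", "Individual_2", "Sum")
res$Sum <- paste0("$", res$Sum)
rownames(res) <- NULL
print(res)
```

Note that `sub('$', '', ..., fixed = TRUE)` only strips the dollar sign, so amounts with thousands separators would need extra cleaning before `as.numeric()`.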
    

    Data

    df <- structure(list(From = c("A", "A", "A", "B", "B", "C", "C"), To = c("B", 
    "C", "D", "A", "C", "A", "D"), Amount = c("$100", "$40", "$30", 
    "$25", "$70", "$190", "$110")), class = "data.frame", row.names = c(NA, -7L))
    

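    【Comments】:

    • Awesome! Thank you very much!
    【Answer 3】:

    A solution using the tidyverse package. You need a way to create a common grouping column in which the two individuals always appear in the same order. dat2 is the final output.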

    library(tidyverse)
    
    dat2 <- dat %>%
      mutate(Amount = as.numeric(str_remove(Amount, "\\$"))) %>%
      mutate(Group = map2_chr(From, To, ~str_c(sort(c(.x, .y)), collapse = "_"))) %>%
      group_by(Group) %>%
      summarize(Sum = sum(Amount, na.rm = TRUE)) %>%
      separate(Group, into = c("Individual_1", "Individual_2"), sep = "_") %>%
      mutate(Sum = str_c("$", Sum))
    print(dat2)
    # # A tibble: 5 x 3
    #   Individual_1 Individual_2 Sum  
    #   <chr>        <chr>        <chr>
    # 1 A            B            $125 
    # 2 A            C            $230 
    # 3 A            D            $30  
    # 4 B            C            $70  
    # 5 C            D            $110 
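
The same sorted-pair idea can also be sketched without packages. This is a base R adaptation for illustration only, not the answer's method; `key`, `amount`, and `totals` are hypothetical names, and the sketch assumes no individual's name contains an underscore.

```r
# Same data as in this answer (rebuilt here so the sketch is self-contained).
dat <- data.frame(
  From   = c("A", "A", "A", "B", "B", "C", "C"),
  To     = c("B", "C", "D", "A", "C", "A", "D"),
  Amount = c("$100", "$40", "$30", "$25", "$70", "$190", "$110"),
  stringsAsFactors = FALSE
)

# Build one order-independent key per row, e.g. "A_B" for both A -> B and B -> A.
key <- mapply(function(x, y) paste(sort(c(x, y)), collapse = "_"),
              dat$From, dat$To, USE.NAMES = FALSE)

# Strip the "$" and sum the numeric amounts within each key.
amount <- as.numeric(sub("$", "", dat$Amount, fixed = TRUE))
totals <- tapply(amount, key, sum)
print(totals)
# A_B A_C A_D B_C C_D 
# 125 230  30  70 110 
```

Sorting each pair generalizes to keys built from more than two columns, whereas pmin/pmax is the shorter choice when there are exactly two.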
    

    Data

    dat <- read.table(text = "From  To  Amount
    A     B   $100
    A     C   $40
    A     D   $30
    B     A   $25
    B     C   $70
    C     A   $190
    C     D   $110",
                    header = TRUE)
    

    【Comments】:

    • Awesome! Thank you very much!