【Question title】: Remove reversal transactions
【Posted】: 2018-11-29 15:41:48
【Question description】:

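I have transaction-level data that includes some reversal transactions. A reversal is recorded as a negative amount that offsets a matching positive amount.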

trnx_df <- data.frame(Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03", "2018-01-03", "2018-01-05", "2018-02-01",
                            "2018-02-01", "2018-02-01"),
                   Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
                   Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000))

trnx_df

             Date Product Amount
    1  2018-01-01       A  -1000
    2  2018-01-01       A   1000
    3  2018-01-01       A   1000
    4  2018-01-01       A   1000
    5  2018-01-03       B  -1000
    6  2018-01-03       B   1000
    7  2018-01-05       B    500
    8  2018-02-01       A  -2000
    9  2018-02-01       A   1000
    10 2018-02-01       A   2000

I want to work out the total amount and the maximum amount the customer has spent on a particular product.

Using dplyr, I got this far:

library(dplyr)

trnx_summary <- trnx_df %>%
  group_by(Product) %>%
  summarize(Total_amount = sum(Amount),
            Max_amount = max(Amount))

trnx_summary
  Product Total_amount Max_amount
1       A         3000       2000
2       B          500       1000

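The total is not a problem, because the negative amounts cancel the positive ones, but the maximum amount spent comes out wrong.

The maximum amount for product A should be 1000 (the 2000 and the -2000 cancel each other out).

How can I fix this? Also, is there a way to delete these reversal transactions from the dataframe itself?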

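【Comments】:

  • "reversal transactions": does this mean that if there is a 1000 and a -1000, those rows should be ignored?
  • Yes. We should ignore those rows.
  • If a transaction is cancelled, how do you know which one it was?
  • The negative number is the cancelled amount, but both the positive and the negative entries are captured as transactions.

Tags: r dataframe dplyr data-cleaning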


【Solution 1】:
trnx_df %>% # keep the negative transactions and save them in dftemp
  filter(Amount < 0) %>%
  mutate(Amount = abs(Amount)) -> dftemp # flip the sign so reversals can be matched against positive rows

trnx_df %>% # positive transactions that have no matching reversal
  filter(Amount > 0) %>%
  anti_join(dftemp, by = c("Date", "Product", "Amount")) -> dfuniques

trnx_df %>%
  filter(Amount > 0) %>% # positive transactions
  inner_join(dftemp, by = c("Date", "Product", "Amount")) %>% # positives that do have a matching reversal in dftemp
  group_by(Date, Product, Amount) %>% # group by date, product and amount
  slice(-1) %>% # per date/product/amount combo, drop one row: the positive that the reversal cancels
  full_join(dfuniques, by = c("Date", "Product", "Amount")) %>% # add back the unmatched positives; from here on the reversals and the rows they cancel are gone
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))

  Product Total_Amount Max_Amount
   <fctr>        <dbl>      <dbl>
1       A         3000       1000
2       B          500        500
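
The question also asks how to drop the reversal rows from the data frame itself rather than only summarising. The following is a minimal sketch, not the answer author's method, under the assumption that each negative row cancels exactly one positive row with the same Date, Product and absolute Amount. The helper columns `AmountAbs`, `Negative`, `n_pairs` and `k` are names made up for this illustration.

```r
library(dplyr)

trnx_df <- data.frame(
  Date = c("2018-01-01", "2018-01-01", "2018-01-01", "2018-01-01", "2018-01-03",
           "2018-01-03", "2018-01-05", "2018-02-01", "2018-02-01", "2018-02-01"),
  Product = c("A", "A", "A", "A", "B", "B", "B", "A", "A", "A"),
  Amount = c(-1000, 1000, 1000, 1000, -1000, 1000, 500, -2000, 1000, 2000)
)

cleaned <- trnx_df %>%
  mutate(AmountAbs = abs(Amount), Negative = Amount < 0) %>%
  # Number of reversal pairs available for each Date/Product/|Amount| key
  group_by(Date, Product, AmountAbs) %>%
  mutate(n_pairs = min(sum(Negative), sum(!Negative))) %>%
  # Position of each row among the rows of the same sign within that key
  group_by(Date, Product, AmountAbs, Negative) %>%
  mutate(k = row_number()) %>%
  ungroup() %>%
  # The first n_pairs negatives and the first n_pairs positives cancel out
  filter(k > n_pairs) %>%
  select(Date, Product, Amount)

cleaned %>%
  group_by(Product) %>%
  summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))
```

On the sample data this keeps four rows: two of the 1000 purchases of A on 2018-01-01, the 500 purchase of B, and the 1000 purchase of A on 2018-02-01. Counting pairs per key means several reversals of the same amount on the same day are each matched to one positive, and a negative with no matching positive is left in place.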

【Comments】:

    【Solution 2】:

    Using the lead and lag functions:

    trnx_df %>% 
      group_by(Product, AmountAbs = abs(Amount)) %>% 
      arrange(Product, AmountAbs, Amount) %>% 
      mutate(
        remove =
          (sign(lag(Amount, default = 0)) == -1 &
               lag(AmountAbs, default = 0) == Amount) |
          ((sign(Amount)) == -1 &
             lead(AmountAbs) == AmountAbs)) %>% 
      ungroup() %>% 
      filter(!remove) %>%
      group_by(Product) %>% 
      summarise(Total_Amount = sum(Amount), Max_Amount = max(Amount))
    
    # # A tibble: 2 x 3
    # Product Total_Amount Max_Amount
    #   <fct>          <dbl>      <dbl>
    # 1 A               3000       1000
    # 2 B                500        500
    

    【Comments】:
