【Question title】: How to add additional columns using tidyr group_by function in R?
【Posted】: 2020-06-26 22:58:16
【Question】:

This question is a follow-up to an earlier post of mine that has already been answered.

Data

df1 <- structure(list(Date = c("6/24/2020", "6/24/2020", "6/24/2020", 
"6/24/2020", "6/25/2020", "6/25/2020"), Market = c("A", "A", 
"A", "A", "A", "A"), Salesman = c("MF", "RP", "RP", "FR", "MF", 
"MF"), Product = c("Apple", "Apple", "Banana", "Orange", "Apple", 
"Banana"), Quantity = c(20L, 15L, 20L, 20L, 10L, 15L), Price = c(1L, 
1L, 2L, 3L, 1L, 1L), Cost = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)), 
class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

Solution

library(dplyr) # 1.0.0
library(tidyr)
df1 %>%
    group_by(Date, Market) %>% 
    group_by(Revenue = c(Quantity %*% Price), 
             TotalCost = c(Quantity %*% Cost),
             Product, .add = TRUE) %>% 
    summarise(Sold = sum(Quantity)) %>% 
    pivot_wider(names_from = Product, values_from = Sold)
# A tibble: 2 x 7
# Groups:   Date, Market, Revenue, TotalCost [2]
#  Date      Market Revenue TotalCost Apple Banana Orange
#  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int>
#1 6/24/2020 A          135      37.5    35     20     20
#2 6/25/2020 A           25      15      10     15     NA

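@akrun's solution works well. Now I would like to know how to add three more columns to the existing result, one per salesman, holding the quantity each salesman sold, so that the final output looks like this: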

Date       Market  Revenue  Total Cost  Apples Sold  Bananas Sold  Oranges Sold  MF  RP  FR
6/24/2020  A       135      37.5        35           20            20            20  35  20
6/25/2020  A       25       15          10           15            NA            25  NA  NA

【Discussion】:

    Tags: r tidyr


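    【Solution 1】:

    One option is to do the two grouped operations separately, since they work on different columns (Product and Salesman), and then join the results on the common columns, i.e. 'Date' and 'Market'.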

    library(dplyr)
    library(tidyr)

    # Quantity sold per product, with Revenue and TotalCost per Date/Market group
    out1 <- df1 %>%
        group_by(Date, Market) %>%
        group_by(Revenue = c(Quantity %*% Price),
                 TotalCost = c(Quantity %*% Cost),
                 Product, .add = TRUE) %>%
        summarise(Sold = sum(Quantity)) %>%
        pivot_wider(names_from = Product, values_from = Sold)

    # Quantity sold per salesman
    out2 <- df1 %>%
        group_by(Date, Market, Salesman) %>%
        summarise(SalesSold = sum(Quantity)) %>%
        pivot_wider(names_from = Salesman, values_from = SalesSold)

    # Join on the shared key columns, named explicitly
    left_join(out1, out2, by = c("Date", "Market"))
    # A tibble: 2 x 10
    # Groups:   Date, Market, Revenue, TotalCost [2]
    #  Date      Market Revenue TotalCost Apple Banana Orange    FR    MF    RP
    #  <chr>     <chr>    <dbl>     <dbl> <int>  <int>  <int> <int> <int> <int>
    #1 6/24/2020 A          135      37.5    35     20     20    20    20    35
    #2 6/25/2020 A           25      15      10     15     NA    NA    25    NA
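
The same "summarise separately, then join" idea can also be sketched without the `%*%` grouping trick. This is only an illustrative variant, not the answerer's code: the helper `sold_wide()` and the names `keys`, `totals` and `result` are hypothetical, and it assumes dplyr >= 1.0.0 (for `across()` and the `.groups` argument) together with the `df1` from the question, rebuilt here so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as df1 in the question, rebuilt so the sketch is self-contained
df1 <- data.frame(
  Date     = c("6/24/2020", "6/24/2020", "6/24/2020",
               "6/24/2020", "6/25/2020", "6/25/2020"),
  Market   = "A",
  Salesman = c("MF", "RP", "RP", "FR", "MF", "MF"),
  Product  = c("Apple", "Apple", "Banana", "Orange", "Apple", "Banana"),
  Quantity = c(20L, 15L, 20L, 20L, 10L, 15L),
  Price    = c(1L, 1L, 2L, 3L, 1L, 1L),
  Cost     = c(0.5, 0.5, 0.5, 0.5, 0.6, 0.6)
)

# Columns that identify one output row
keys <- c("Date", "Market")

# Hypothetical helper: total Quantity per level of `col`, one column per level
sold_wide <- function(data, col) {
  data %>%
    group_by(across(all_of(c(keys, col)))) %>%
    summarise(Sold = sum(Quantity), .groups = "drop") %>%
    pivot_wider(names_from = all_of(col), values_from = Sold)
}

# Revenue and total cost per Date/Market
totals <- df1 %>%
  group_by(across(all_of(keys))) %>%
  summarise(Revenue   = sum(Quantity * Price),
            TotalCost = sum(Quantity * Cost),
            .groups = "drop")

# Attach the per-product and per-salesman columns
result <- totals %>%
  left_join(sold_wide(df1, "Product"),  by = keys) %>%
  left_join(sold_wide(df1, "Salesman"), by = keys)

print(result)
```

Because every piece is summarised with `.groups = "drop"`, `result` comes back ungrouped, which may be more convenient if further steps follow. Combinations that never occur in the data (for example RP on 6/25/2020) remain `NA`, as in the output above.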
    
