【Question title】: Is there a way to merge rows? [duplicate]
【Posted】: 2022-02-04 03:22:22
【Question description】:

I am using R.

I have a data frame of bookings for a transportation company:

Van  Route    Departure  Price  Customer ID
U21  LA - SF  8:00:00    30.00  467866578
U21  LA - SF  8:00:00    30.00  234656433
U21  LA - SF  8:00:00    30.00  654343554
U21  LA - SF  8:00:00    30.00  466534444
U21  LA - SF  8:00:00    30.00  354543433
U22  LA - SD  6:00:00    20.00  345464533
U22  LA - SD  6:00:00    20.00  345456777
U22  LA - SD  6:00:00    20.00  344565411
U22  LA - SD  6:00:00    20.00  119873566

I want to build a new data frame with one row per trip that shows:

Van  Route    Departure  Price  Tickets Sold  Revenue
U21  LA - SF  8:00:00    30.00  5             150.00
U22  LA - SD  6:00:00    20.00  4             80.00

Thanks in advance! Please help :)

【Comments】:

    Tags: r


    【Solution 1】:

    Or use dplyr:

    library(dplyr)
    
    # Group by the columns that identify a trip, then count bookings and sum prices
    df_new <- df %>%
      group_by(Van, Route, Departure, Price) %>%
      summarize(`Tickets Sold` = n(),
                Revenue = sum(Price)) %>%
      ungroup()
    
    df_new
    #> # A tibble: 2 × 6
    #>   Van   Route   Departure Price `Tickets Sold` Revenue
    #>   <chr> <chr>   <chr>     <int>          <int>   <int>
    #> 1 U21   LA - SF 8:00:00      30              5     150
    #> 2 U22   LA - SD 6:00:00      20              4      80
    

    Created on 2022-02-02 by the reprex package (v2.0.1)

    【Discussion】:

      【Solution 2】:
      # load package
      library(data.table)
      
      # set dataframe as datatable
      setDT(df)
      
      # calculate
      # count rows (.N) and sum Price within each group
      df[, .(tickets_sold = .N, revenue = sum(Price)),
         by = .(Van, Route, Departure, Price)]
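
The data.table call above assumes a `df` that already exists. As a minimal self-contained sketch, the same grouping can be run on data rebuilt from the question's table. The names `bookings` and `summary_dt` are illustrative and not from the original answer, and the `Customer.ID` column name is an assumption.

```r
library(data.table)

# Rebuild the question's bookings table (object name is illustrative)
bookings <- data.table(
  Van         = rep(c("U21", "U22"), times = c(5, 4)),
  Route       = rep(c("LA - SF", "LA - SD"), times = c(5, 4)),
  Departure   = rep(c("8:00:00", "6:00:00"), times = c(5, 4)),
  Price       = rep(c(30, 20), times = c(5, 4)),
  Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L,
                  345464533L, 345456777L, 344565411L, 119873566L)
)

# .N is the number of rows in each group; `by` lists the grouping columns
summary_dt <- bookings[,
  .(tickets_sold = .N, revenue = sum(Price)),
  by = .(Van, Route, Departure, Price)
]

print(summary_dt)
```

Note that `setDT()` converts an existing data frame by reference, while `as.data.table()` returns a converted copy and leaves the original object unchanged.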
      

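      【Discussion】:

      • @MateoGuajardo Sorry, I left out the group-by columns. Fixed.
      【Solution 3】:

      You can use dplyr::group_by() on the common variables, then use dplyr::summarize() to count the entries in each group with n() and compute the total Revenue with sum().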

      library(tidyverse)
      
      d <- structure(list(Van = c("U21", "U21", "U21", "U21", "U21", "U22", "U22", "U22", "U22"), Route = c("LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SD", "LA - SD", "LA - SD", "LA - SD"), Departure = c("8:00:00", "8:00:00", "8:00:00", "8:00:00", "8:00:00", "6:00:00", "6:00:00", "6:00:00", "6:00:00"), Price = c(30, 30, 30, 30, 30, 20, 20, 20, 20), Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L, 345464533L, 345456777L, 344565411L, 119873566L)), class = "data.frame", row.names = c(NA, -9L))
      
      d %>% 
        group_by(across(Van:Departure)) %>% 
        summarize(Tickets_Sold = n(), Revenue = sum(Price), .groups = "drop")
      #> # A tibble: 2 x 5
      #>   Van   Route   Departure Tickets_Sold Revenue
      #>   <chr> <chr>   <chr>            <int>   <dbl>
      #> 1 U21   LA - SF 8:00:00              5     150
      #> 2 U22   LA - SD 6:00:00              4      80
      

      Created on 2022-02-02 by the reprex package (v2.0.1)
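
For comparison, the same count-and-sum per group can be sketched in base R with `aggregate()`, with no extra packages. This is only one possible approach. The helper columns `Tickets_Sold` and `Revenue` added to the copy `d2` are illustrative assumptions, as is the shortened way of rebuilding `d`.

```r
# Rebuild the bookings data from the question (same values as `d` above)
d <- data.frame(
  Van         = rep(c("U21", "U22"), times = c(5, 4)),
  Route       = rep(c("LA - SF", "LA - SD"), times = c(5, 4)),
  Departure   = rep(c("8:00:00", "6:00:00"), times = c(5, 4)),
  Price       = rep(c(30, 20), times = c(5, 4)),
  Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L,
                  345464533L, 345456777L, 344565411L, 119873566L)
)

# Work on a copy: each booking counts as one ticket and contributes its price
d2 <- d
d2$Tickets_Sold <- 1L
d2$Revenue <- d2$Price

# Sum both helper columns within each Van/Route/Departure/Price combination
res <- aggregate(
  cbind(Tickets_Sold, Revenue) ~ Van + Route + Departure + Price,
  data = d2,
  FUN = sum
)

print(res)
```

With the question's data this yields one row per van: 5 tickets and 150 revenue for U21, 4 tickets and 80 revenue for U22.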

      【Discussion】:
