【Question title】: Grouping by one column and keeping other constant columns
【Posted】: 2020-04-21 21:33:28
【Question】:

How can I change the grouping in the code below so that the result also keeps the constant value of startdate?

library(dplyr)

# A reproducible example of the data I have:
employee <- c('John Doe','John Doe','Peter Gynn','Peter Gynn','Jolie Hope','Jolie Hope')
startdate <- as.Date(c('2010-11-1','2010-11-1','2008-3-25','2008-3-25','2007-3-14','2007-3-14'))
salary <- c(100,200,100,300,800,12)
employ.data <- data.frame(employee, startdate, salary)

# Grouping by employee and summing salary
grouped.file <- employ.data %>% group_by(employee) %>%
  summarize(salary = sum(salary, na.rm = TRUE))

# But I would like to get a data frame like this:
employee <- c('John Doe','Peter Gynn','Jolie Hope')
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
salary <- c(300,400,812)
desired.data <- data.frame(employee, startdate, salary)

【Question comments】:

    Tags: r dataframe dplyr grouping


    【Solution 1】:

    Here are two base R ways to do it:

    • Using aggregate()
    agg.data <- aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))
    

    which gives

    > agg.data
        employee  startdate salary
    1 Jolie Hope 2007-03-14    812
    2 Peter Gynn 2008-03-25    400
    3   John Doe 2010-11-01    300
    
    • Using ave() together with unique()
    dedup.data <- unique(within(employ.data, salary <- ave(salary, employee, startdate, FUN = function(x) sum(x, na.rm = TRUE))))
    

    which gives

    > dedup.data
        employee  startdate salary
    1   John Doe 2010-11-01    300
    3 Peter Gynn 2008-03-25    400
    5 Jolie Hope 2007-03-14    812
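
To see why the ave() variant needs unique(), it may help to look at what ave() returns on its own. The sketch below uses only the employee and salary vectors from the question; ave() returns a vector as long as its input, with each element replaced by its group's sum, so the rows of one employee end up identical and unique() can collapse them.

```r
employee <- c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope")
salary <- c(100, 200, 100, 300, 800, 12)

# ave() keeps the input length: every row receives the sum of its group
group.sums <- ave(salary, employee, FUN = function(x) sum(x, na.rm = TRUE))
print(group.sums)
# 300 300 400 400 812 812

# Rows of the same employee are now duplicates, so unique() leaves one per group
collapsed <- unique(data.frame(employee, salary = group.sums))
```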
    

    【Comments】:

      【Solution 2】:

      If startdate is constant within each employee, you can include it in group_by:

      library(dplyr)
      
      employ.data %>%  
          group_by(employee, startdate) %>% 
          summarize(salary = sum(salary, na.rm =TRUE))
      
      #  employee   startdate  salary
      #  <fct>      <date>      <dbl>
      #1 John Doe   2010-11-01    300
      #2 Jolie Hope 2007-03-14    812
      #3 Peter Gynn 2008-03-25    400
      

      Or take its first value inside summarize:

      employ.data %>%  
       group_by(employee) %>% 
       summarize(startdate = first(startdate), salary = sum(salary, na.rm =TRUE))
      

      Or use mutate and keep only the first (arbitrary) row of each group:

      employ.data %>% 
        group_by(employee) %>%
        mutate(salary = sum(salary, na.rm =TRUE)) %>%
        slice(1L)
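
All three variants rely on startdate really being constant within each employee; otherwise first() and slice(1L) silently keep whichever date comes first. A minimal sketch of how that assumption could be checked beforehand, using n_distinct() on the data from the question (the names date.check and all_constant are only illustrative):

```r
library(dplyr)

employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12)
)

# Count the distinct start dates per employee; a count above 1 means
# startdate is not constant for that employee.
date.check <- employ.data %>%
  group_by(employee) %>%
  summarise(n_dates = n_distinct(startdate))

# TRUE when every employee has exactly one start date
all_constant <- all(date.check$n_dates == 1)
```

If all_constant is FALSE, grouping by both employee and startdate would produce more than one row for the affected employees, which makes the inconsistency visible instead of hiding it.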
      

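      【Comments】:

      • Thanks, Ronak. How can I modify the first one to sum another column in addition to salary?
      • @Afke If you have several columns to sum, use summarise_at, something like: employ.data %>% group_by(employee, startdate) %>% summarize_at(vars(salary, another_col), sum, na.rm = TRUE)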
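
In dplyr 1.0.0 and later, the scoped verbs such as summarise_at are superseded by across(). A sketch of the same multi-column sum written that way, assuming a recent dplyr; the bonus column is made up here and stands in for another_col:

```r
library(dplyr)

# `bonus` is a hypothetical second numeric column, not part of the original data
employ.data <- data.frame(
  employee  = c("John Doe", "John Doe", "Peter Gynn", "Peter Gynn", "Jolie Hope", "Jolie Hope"),
  startdate = as.Date(c("2010-11-01", "2010-11-01", "2008-03-25", "2008-03-25", "2007-03-14", "2007-03-14")),
  salary    = c(100, 200, 100, 300, 800, 12),
  bonus     = c(1, 2, 3, 4, 5, NA)
)

# Sum every listed column within each employee/startdate group,
# ignoring missing values; .groups = "drop" returns an ungrouped result
totals <- employ.data %>%
  group_by(employee, startdate) %>%
  summarise(across(c(salary, bonus), ~ sum(.x, na.rm = TRUE)), .groups = "drop")
```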