Grouping one column and keeping the other constant columns

【Question title】: Grouping one column and leaving other constants in there
【Posted】: 2020-04-21 21:33:28
【Question】:

How can I change the grouping in the code below so that the result also includes the constant value of startdate?

library(dplyr)

# Reproducible example of the data I have:
employee <- c('John Doe','John Doe','Peter Gynn','Peter Gynn','Jolie Hope','Jolie Hope')
startdate <- as.Date(c('2010-11-1','2010-11-1','2008-3-25','2008-3-25','2007-3-14','2007-3-14'))
salary <- c(100,200,100,300,800,12)
employ.data <- data.frame(employee, startdate, salary)

# Grouping by employee and summing salary
grouped.file <- employ.data %>% group_by(employee) %>%
  summarize(salary = sum(salary, na.rm = TRUE))

# But I would like to end up with a data frame like this
# (stored under a separate name so the input above is not overwritten):
employee <- c('John Doe','Peter Gynn','Jolie Hope')
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
salary <- c(300,400,812)
desired.data <- data.frame(employee, startdate, salary)

【Comments】:

Tags: r dataframe dplyr grouping

【Solution 1】:

Here are two base R ways to do it:

Using aggregate()

employ.data <- aggregate(salary ~ employee + startdate, employ.data, FUN = function(x) sum(x, na.rm = TRUE))

which gives

> employ.data
    employee  startdate salary
1 Jolie Hope 2007-03-14    812
2 Peter Gynn 2008-03-25    400
3   John Doe 2010-11-01    300

Using ave() and unique()

unique(within(employ.data, salary <- ave(salary,employee,startdate,FUN = function(x) sum(x,na.rm = T))))

which, applied to the original employ.data, prints

    employee  startdate salary
1   John Doe 2010-11-01    300
3 Peter Gynn 2008-03-25    400
5 Jolie Hope 2007-03-14    812

【Comments】:

【Solution 2】:

If startdate is constant within each employee, you can include it in group_by

library(dplyr)

employ.data %>%  
    group_by(employee, startdate) %>% 
    summarize(salary = sum(salary, na.rm =TRUE))

#  employee   startdate  salary
#  <fct>      <date>      <dbl>
#1 John Doe   2010-11-01    300
#2 Jolie Hope 2007-03-14    812
#3 Peter Gynn 2008-03-25    400
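
Adding startdate to group_by only gives one row per employee if the date really is constant within each employee. A minimal sketch of how that could be checked first, using the data from the question (date.check and n_dates are names made up for this illustration):

```r
library(dplyr)

# Same data as in the question
employ.data <- data.frame(
  employee  = c('John Doe','John Doe','Peter Gynn','Peter Gynn','Jolie Hope','Jolie Hope'),
  startdate = as.Date(c('2010-11-1','2010-11-1','2008-3-25','2008-3-25','2007-3-14','2007-3-14')),
  salary    = c(100,200,100,300,800,12)
)

# Count the distinct startdate values per employee
date.check <- employ.data %>%
  group_by(employee) %>%
  summarize(n_dates = n_distinct(startdate))

# TRUE when every employee has exactly one startdate
all.constant <- all(date.check$n_dates == 1)
all.constant
```

If this comes back FALSE, grouping by both columns would produce more than one row for some employees.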

Or take its first value inside summarize

employ.data %>%  
 group_by(employee) %>% 
 summarize(startdate = first(startdate), salary = sum(salary, na.rm =TRUE))

Or use mutate and keep only the first (arbitrary) row of each group.

employ.data %>% 
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm =TRUE)) %>%
  slice(1L)
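
One practical difference with the mutate variant is that it keeps every column of the retained row, whereas summarize only returns the grouping columns and the ones you compute. A small sketch of that, assuming an extra department column that is not in the original data and is added here purely for illustration:

```r
library(dplyr)

# Data from the question plus a hypothetical 'department' column
employ.data <- data.frame(
  employee   = c('John Doe','John Doe','Peter Gynn','Peter Gynn','Jolie Hope','Jolie Hope'),
  startdate  = as.Date(c('2010-11-1','2010-11-1','2008-3-25','2008-3-25','2007-3-14','2007-3-14')),
  salary     = c(100,200,100,300,800,12),
  department = c('IT','IT','HR','HR','Sales','Sales')
)

# Replace salary with the group total, then keep the first row of each group
kept <- employ.data %>%
  group_by(employee) %>%
  mutate(salary = sum(salary, na.rm = TRUE)) %>%
  slice(1L) %>%
  ungroup()

# All original columns survive, including 'department'
names(kept)
```

Note that the values of any non-constant column come from whichever row happens to be first in the group.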

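【Comments】:

Thanks Ronak. How can I modify the first one to sum another column in addition to salary?
@Afke If you have several columns to sum, use summarize_at, for example: employ.data %>% group_by(employee, startdate) %>% summarize_at(vars(salary, another_col), sum, na.rm = TRUE)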
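
For what it's worth, in dplyr 1.0.0 and later summarize_at is superseded by across(), so the same idea could be written roughly as below. This is only a sketch: the bonus column is hypothetical and stands in for whatever second column needs summing.

```r
library(dplyr)

# Data from the question plus a hypothetical 'bonus' column
employ.data <- data.frame(
  employee  = c('John Doe','John Doe','Peter Gynn','Peter Gynn','Jolie Hope','Jolie Hope'),
  startdate = as.Date(c('2010-11-1','2010-11-1','2008-3-25','2008-3-25','2007-3-14','2007-3-14')),
  salary    = c(100,200,100,300,800,12),
  bonus     = c(10,20,30,40,50,60)
)

# Sum several columns at once; .groups = "drop" returns an ungrouped result
totals <- employ.data %>%
  group_by(employee, startdate) %>%
  summarize(across(c(salary, bonus), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

totals
```

Both across() and the .groups argument need dplyr 1.0.0 or newer; on older versions the summarize_at call from the comment above is the way to go.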