dplyr function: group_by several variables (answers)

【Question title】: dplyr function group_by several variables
【Posted】: 2019-12-20 19:34:36
【Question description】:

I have read the introduction to programming with dplyr in R (https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html), and it is very useful.

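I often build fairly complex functions that involve several sets of grouping variables. For example, given a dataset df, I may want the function to summarise by some variables (say, grouped by G1 and G2), then summarise by some other variables (say, G3), and then use these summaries to produce some final results.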

library(dplyr)
df <- data.frame(xV = 1:3, yV = 0:2, G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))
# Within my function I want to calculate
# (a)
df %>% group_by(G1, G2) %>% summarise(MEANS1 = mean(xV, na.rm = TRUE))
# as well as (b)
df %>% group_by(G3) %>% summarise(MEAN2 = mean(xV, na.rm = TRUE))

If I only needed the first grouping (i.e. (a)), I could build a function that uses ...:

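TAB2 <- function(data, x, ...) {
  require(dplyr)
  x <- enquo(x)             # capture the summary variable as a quosure
  groupSet1 <- enquos(...)  # capture the grouping variables as a list of quosures

  data %>%
    group_by(!!!groupSet1) %>%
    summarise(MEAN = mean(!!x, na.rm = TRUE))
}

# This gives me my results
TAB2(data = df, x = xV, G1, G2)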
# A tibble: 2 x 3
# Groups:   G1 [2]
     G1    G2  MEAN
  <dbl> <dbl> <dbl>
1     0     1   3  
2     1     0   1.5
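
For comparison, the same single-grouping function can be sketched with the embrace operator `{{ }}`, which combines `enquo()` and `!!` in one step, while `...` is forwarded to `group_by()` unchanged. This is only a sketch: it assumes rlang >= 0.4.0 and dplyr >= 1.0.0 (for `.groups`), and the name `tab2_embrace` is hypothetical, not part of the question.

```r
library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2,
                 G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical helper: same behaviour as TAB2, written with {{ }}.
tab2_embrace <- function(data, x, ...) {
  data %>%
    group_by(...) %>%                            # dots are passed straight through
    summarise(MEAN = mean({{ x }}, na.rm = TRUE), # {{ x }} quotes and unquotes x
              .groups = "drop")                   # return an ungrouped tibble
}

res_tab2 <- tab2_embrace(df, xV, G1, G2)
print(res_tab2)
```

The limitation described in the question still applies here: `...` can carry only one set of grouping variables.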

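But if I want to do both (a) and (b), I need some way to distinguish the first set of grouping variables (G1, G2) from the second (G3). I can't do this by simply putting the grouping variables after all the other inputs. Is there a way to specify the two sets in the inputs, something like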

# Pseudocode for the interface I would like (not valid R)
TAB3 <- function(data, x, y, GroupSet1 = c(G1, G2) and GroupSet2 = (G3)) {

  x <- enquo(x)
  y <- enquo(y)
  # (a)
  data %>% group_by(GroupSet1) %>% summarise(MEANS1 = mean(!!x, na.rm = TRUE))
  # as well as (b)
  data %>% group_by(GroupSet2) %>% summarise(MEAN2 = mean(!!y, na.rm = TRUE))

}

I tried, in the same way as with x, ...

【Question discussion】:

Tags: r dplyr

【Solution 1】:

You can try

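TAB3 <- function(data, y, grouping_list) {
  require(tidyverse)
  # For each character vector of grouping columns, group the data and
  # summarise the column(s) named in y
  map(grouping_list, ~ group_by_at(data, .) %>%
        summarise_at(y, list(Mean = mean), na.rm = TRUE))
}

TAB3(df, "xV", list(c("G1", "G2"), c("G3")))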
[[1]]
# A tibble: 2 x 3
# Groups:   G1 [2]
     G1    G2  Mean
  <dbl> <dbl> <dbl>
1     0     1   3  
2     1     0   1.5

[[2]]
# A tibble: 1 x 2
     G3  Mean
  <dbl> <dbl>
1     1     2
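
`group_by_at()` and `summarise_at()` are superseded in dplyr 1.0.0 and later in favour of `across()`. A minimal sketch of the same idea with the newer verbs follows. It assumes dplyr >= 1.0.0 and purrr are installed; the name `tab3_across` and the `"{.col}_mean"` naming pattern are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(purrr)

df <- data.frame(xV = 1:3, yV = 0:2,
                 G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical helper: one summary table per set of grouping columns.
# `vars` and each element of `grouping_list` are character vectors.
tab3_across <- function(data, vars, grouping_list) {
  map(grouping_list, function(groups) {
    data %>%
      group_by(across(all_of(groups))) %>%
      summarise(across(all_of(vars),
                       ~ mean(.x, na.rm = TRUE),
                       .names = "{.col}_mean"),
                .groups = "drop")
  })
}

res_across <- tab3_across(df, c("xV", "yV"), list(c("G1", "G2"), "G3"))
print(res_across)
```

Because `vars` is a character vector, several summary columns can be passed at once without hard-coding any column name.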

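【Discussion】:

This is great, I didn't know you could use map like this. But it does rely on my dataset having xV (although it might be called something else). Sorry, that was my fault. It would be great if the summarise step could be made as general as possible (for example, using summarise_at so that many variables can be passed).

【Solution 2】:

If you want to use the ellipsis as in your TAB2 example, you can try:

Updated based on the new information: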

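TAB3 <- function(df, x, ...) {
  # Capture the grouping sets unevaluated, e.g. list(GroupSet1 = c(G1, G2), GroupSet2 = G3)
  args <- substitute(list(...))
  # Map every column name to itself as a string, so that G1 evaluates to "G1"
  names_env <- setNames(as.list(names(df)), names(df))
  arg_list <- eval(args, names_env)

  out <- vector(mode = "list", length(arg_list))

  for (i in seq_along(arg_list)) {
    out[[i]] <- df %>% group_by(!!!syms(arg_list[[i]])) %>%
      summarise_at(vars(!!!enquos(x)), .funs = list(mean = mean), na.rm = TRUE)
  }
  out
}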
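
The step that turns bare column names into strings can be looked at in isolation. The sketch below uses base R only; the helper name `capture_groups` is hypothetical and simply repeats the first three lines of the function above so the intermediate result can be inspected.

```r
df <- data.frame(xV = 1:3, yV = 0:2,
                 G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical helper that isolates the substitute()/eval() step.
capture_groups <- function(df, ...) {
  # Unevaluated call: list(GroupSet1 = c(G1, G2), GroupSet2 = G3)
  args <- substitute(list(...))
  # Lookup table in which each column name evaluates to its own name
  names_env <- setNames(as.list(names(df)), names(df))
  # Evaluate the call with the lookup table as the data context
  eval(args, names_env)
}

groups <- capture_groups(df, GroupSet1 = c(G1, G2), GroupSet2 = G3)
str(groups)
```

Each element of the resulting list is a character vector of column names, which is why the main function can hand it to `syms()` before splicing it into `group_by()`.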

TAB3(df, x = c(xV,yV), GroupSet1=c(G1,G2), GroupSet2=G3)

#[[1]]
# A tibble: 2 x 4
# Groups:   G1 [2]
#     G1    G2 xV_mean yV_mean
#  <dbl> <dbl>   <dbl>   <dbl>
#1     0     1     3       2  
#2     1     0     1.5     0.5

#[[2]]
# A tibble: 1 x 3
#     G3 xV_mean yV_mean
#  <dbl>   <dbl>   <dbl>
#1     1       2       1
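
Another option, closer to the interface sketched in the question, is to give each grouping set its own argument and embrace it inside `across()`. This is a minimal sketch rather than part of the original answer: it assumes dplyr >= 1.0.0 and rlang >= 0.4.0, and the names `tab3_embrace`, `group_set1` and `group_set2` are hypothetical.

```r
library(dplyr)

df <- data.frame(xV = 1:3, yV = 0:2,
                 G1 = c(1, 1, 0), G2 = c(0, 0, 1), G3 = c(1, 1, 1))

# Hypothetical helper: two separately named sets of grouping variables,
# each given as bare names (a single name or c(...) of names).
tab3_embrace <- function(data, x, y, group_set1, group_set2) {
  list(
    a = data %>%
      group_by(across({{ group_set1 }})) %>%
      summarise(MEAN1 = mean({{ x }}, na.rm = TRUE), .groups = "drop"),
    b = data %>%
      group_by(across({{ group_set2 }})) %>%
      summarise(MEAN2 = mean({{ y }}, na.rm = TRUE), .groups = "drop")
  )
}

res_embrace <- tab3_embrace(df, x = xV, y = yV,
                            group_set1 = c(G1, G2), group_set2 = G3)
print(res_embrace)
```

Here `across()` treats the embraced argument as a tidyselect expression, so `c(G1, G2)` selects both columns without any manual quoting.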

【Discussion】: