dplyr: group mean centering (mutate + summarize): answers

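【Question title】: dplyr: group mean centering (mutate + summarize)
【Posted】: 2015-04-11 20:48:24
【Question】:

What is the efficient or preferred way to do group mean centering with dplyr, that is, to take each element of a group (mutate) and apply an operation that combines it with a summary statistic (summarize) for that group? Here is how one could group-mean-center mtcars with base R: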

do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(x){ 
    x[["cent"]] <- x$mpg - mean(x$mpg)
    x
}))

【Comments on the question】:

That works. I didn't even try it, because I didn't know you could use group_by with mutate. Thanks very much.

Tags: r dplyr

【Solution 1】:

You could try:

library(dplyr)
mtcars %>%
  # only if the row names are needed as a column;
  # add_rownames() is deprecated in favor of tibble::rownames_to_column()
  tibble::rownames_to_column() %>%
  group_by(cyl) %>%
  mutate(cent = mpg - mean(mpg))
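
As a quick sanity check on the grouped mutate, the sketch below recomputes the centered column and confirms that it averages to zero within every cyl group. The names `centered` and `check` are illustrative and not part of the original answer.

```r
library(dplyr)

# Center mpg on the mean of its own cyl group.
centered <- mtcars %>%
  group_by(cyl) %>%
  mutate(cent = mpg - mean(mpg)) %>%
  ungroup()

# The within-group means of the centered column should be zero
# (up to floating-point error).
check <- centered %>%
  group_by(cyl) %>%
  summarise(mean_cent = mean(cent))

all(abs(check$mean_cent) < 1e-8)
```

If the last expression is not TRUE, the grouping was probably not applied in the mutate step.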

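【Comments】:

【Solution 2】:

The code above seems to center mpg on the global mean. What should I do if I want to center on the within-group mean, i.e. a different mean for each level of cyl?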

> mtcars %>%
+   add_rownames()%>% #if the rownames are needed as a column
+   group_by(cyl) %>% 
+   mutate(cent= mpg-mean(mpg))%>%
+   dplyr ::select(cent)
Adding missing grouping variables: `cyl`
# A tibble: 32 x 2
# Groups:   cyl [3]
     cyl   cent
   <dbl>  <dbl>
 1     6  0.909
 2     6  0.909
 3     4  2.71 
 4     6  1.31 
 5     8 -1.39 
 6     6 -1.99 
 7     8 -5.79 
 8     4  4.31 
 9     4  2.71 
10     6 -0.891
# … with 22 more rows
Warning message:
Deprecated, use tibble::rownames_to_column() instead. 
> mtcars$mpg[1:5]-mean(mtcars$mpg)
[1]  0.909375  0.909375  2.709375  1.309375 -1.390625
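
The cent values in this session equal mpg minus the overall mean, so the grouping was ignored when the column was computed. One possible cause, which the output alone cannot confirm, is that a package loaded after dplyr masks mutate() with a version that does not respect group_by(); plyr is a frequent culprit. Under that assumption, a minimal way to rule masking out is to call the dplyr functions with an explicit namespace. The names `grp_centered` and `expected` are illustrative.

```r
library(dplyr)

# Namespace-qualified calls are unaffected by masking from other packages.
grp_centered <- mtcars %>%
  tibble::rownames_to_column() %>%   # replaces the deprecated add_rownames()
  dplyr::group_by(cyl) %>%
  dplyr::mutate(cent = mpg - mean(mpg)) %>%
  dplyr::ungroup() %>%
  dplyr::select(rowname, cyl, mpg, cent)

# Reference values from base R: ave() returns each row's group mean.
expected <- mtcars$mpg - ave(mtcars$mpg, mtcars$cyl)

isTRUE(all.equal(grp_centered$cent, expected))
```

Running `conflicts(detail = TRUE)` in the affected session lists masked functions by package, which shows whether mutate is being taken from somewhere other than dplyr.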

【Comments】:

【Solution 3】:

You could try this (although the new variable is displayed under a different name):

mtcars %>%
  group_by(cyl) %>%
  mutate(gpcent = scale(mpg, scale = FALSE))  # FALSE rather than F, which can be reassigned
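
scale() returns a one-column matrix with centering attributes rather than a plain vector, which is presumably why the new column is displayed under a different name. A minimal sketch, assuming a plain numeric column is preferred, wraps the call in as.numeric(); `gp` is an illustrative name.

```r
library(dplyr)

gp <- mtcars %>%
  group_by(cyl) %>%
  # as.numeric() drops the matrix dimensions and attributes added by scale()
  mutate(gpcent = as.numeric(scale(mpg, scale = FALSE))) %>%
  ungroup()

# The result is an ordinary numeric column with no dim attribute.
is.numeric(gp$gpcent) && is.null(dim(gp$gpcent))
```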

【Comments】: