Answers: producing the mean, standard deviation, and standard error of the mean from a large data frame

【Question title】: Producing Average, Standard Deviation and Standard Error of Mean from a big group of dataframe
【Posted】: 2021-04-16 14:33:54
【Question description】:

Suppose I have a data frame named "Data" that looks like this:

View(Data)
Ball Day Expansion
Red  1   5
Red  1   8
Red  1   3
Red  2   7
Red  2   9
Blue 1   5
Blue 1   3
Blue 2   7
Blue 2   5
Blue 2   4
...

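From this data I want the mean, the standard deviation (SD), and the standard error of the mean (SE) for each Ball and Day combination, so that the final result looks like this: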

# Note: the 'Expansion' value shown is the mean of the group; 'X' and 'Y' stand for the SE and SD results

Ball Day Expansion SE SD
Red  1    7        X  Y
Red  2    5        X  Y
Red  3    6        X  Y
Red  4    5        X  Y
Blue 1    4        X  Y
Blue 2    8        X  Y
Blue 3    6        X  Y
...

Does anyone know how to do this?

【Question comments】:

Tags: r

【Solution 1】:

I hope this is what you have in mind:

library(dplyr)

df %>%
  group_by(Ball, Day) %>%
  summarise(across(Expansion, list(Mean = mean, 
                                SD = sd, 
                                SE = function(x) sqrt(var(x)/length(x))), 
                   .names = "{.fn}.{.col}"))

# A tibble: 4 x 5
# Groups:   Ball [2]
  Ball    Day Mean.Expansion SD.Expansion SE.Expansion
  <chr> <dbl>          <dbl>        <dbl>        <dbl>
1 Blue      1           4            1.41        1    
2 Blue      2           5.33         1.53        0.882
3 Red       1           5.33         2.52        1.45 
4 Red       2           8            1.41        1
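
The SE function above, sqrt(var(x)/length(x)), is algebraically the same as the more familiar sd(x)/sqrt(n). As a quick sanity check, here is a minimal sketch that compares the two forms on the Red, Day 1 group. The values are copied from the question; the names x, se_var and se_sd are only illustrative.

```r
# Expansion values for Ball == "Red", Day == 1, copied from the question's data
x <- c(5, 8, 3)

se_var <- sqrt(var(x) / length(x))   # form used in the across() call
se_sd  <- sd(x) / sqrt(length(x))    # SD divided by the square root of n

# The two forms should agree up to floating-point tolerance
all.equal(se_var, se_sd)

# Mean, SD and SE for this group, rounded for comparison with the table
round(c(mean = mean(x), sd = sd(x), se = se_var), 3)
```

Neither form skips missing values, so data containing NA would need extra handling that this answer does not cover.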

The output of summarise is tidier, as dear @www suggested; however, the output of mutate is closer to what you showed in your question:

# A tibble: 10 x 6
# Groups:   Ball, Day [4]
   Ball    Day Expansion Mean.Expansion SD.Expansion SE.Expansion
   <chr> <dbl>     <dbl>          <dbl>        <dbl>        <dbl>
 1 Red       1         5           5.33         2.52        1.45 
 2 Red       1         8           5.33         2.52        1.45 
 3 Red       1         3           5.33         2.52        1.45 
 4 Red       2         7           8            1.41        1    
 5 Red       2         9           8            1.41        1    
 6 Blue      1         5           4            1.41        1    
 7 Blue      1         3           4            1.41        1    
 8 Blue      2         7           5.33         1.53        0.882
 9 Blue      2         5           5.33         1.53        0.882
10 Blue      2         4           5.33         1.53        0.882
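
The mutate call that produced the table above is not shown in the answer. A minimal sketch, assuming it is the same pipeline with summarise swapped for mutate (the name result is only illustrative, and the data is repeated so the snippet runs on its own):

```r
library(dplyr)

# Same sample data as in the question
df <- tribble(
  ~Ball, ~Day, ~Expansion,
  "Red",  1,   5,
  "Red",  1,   8,
  "Red",  1,   3,
  "Red",  2,   7,
  "Red",  2,   9,
  "Blue", 1,   5,
  "Blue", 1,   3,
  "Blue", 2,   7,
  "Blue", 2,   5,
  "Blue", 2,   4
)

# mutate() keeps every original row and appends the per-group statistics
result <- df %>%
  group_by(Ball, Day) %>%
  mutate(across(Expansion, list(Mean = mean,
                                SD = sd,
                                SE = function(x) sqrt(var(x) / length(x))),
                .names = "{.fn}.{.col}"))

result
```

Because mutate does not collapse the groups, each statistic is repeated on every row of its Ball and Day group.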

Data:

df <- tribble(
  ~Ball, ~Day, ~Expansion,
  "Red",  1,   5,
  "Red",  1,   8,
  "Red",  1,   3,
  "Red",  2,   7,
  "Red",  2,   9,
  "Blue", 1,   5,
  "Blue", 1,   3,
  "Blue", 2,   7,
  "Blue", 2,   5,
  "Blue", 2,   4
)

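【Comments】:

Judging from the output the OP provided, where each Ball and Day combination appears only once, I guess the OP wants to group_by Ball and Day and use the summarise function rather than mutate. But I may be wrong, because the OP did not provide a clear description or a reproducible example.
Yes, I think you are right! I first grouped by Ball only, then saw your code and modified mine, for which I thank you. I believe it is up to the OP to modify the code in whatever way she/he wants. I do think, though, that although summarise produces a very concise output, the desired output is closer to that of mutate.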

【Solution 2】:

Here is one way to do it. We can use the dplyr package for this kind of calculation:

library(dplyr)

Data2 <- Data %>%
  group_by(Ball, Day) %>%
  summarize(Mean = mean(Expansion),
            SE = sd(Expansion)/sqrt(n()),
            SD = sd(Expansion)) %>%
  rename(Expansion = Mean) %>%
  ungroup() 

Data2
# # A tibble: 4 x 5
#   Ball    Day Expansion    SE    SD
#   <chr> <int>     <dbl> <dbl> <dbl>
# 1 Blue      1      4    1      1.41
# 2 Blue      2      5.33 0.882  1.53
# 3 Red       1      5.33 1.45   2.52
# 4 Red       2      8    1      1.41

Data:

Data <- read.table(
  text = "Ball Day Expansion
Red  1   5
Red  1   8
Red  1   3
Red  2   7
Red  2   9
Blue 1   5
Blue 1   3
Blue 2   7
Blue 2   5
Blue 2   4", header = TRUE
)

【Comments】: