Calculating repeated means from columns using R: answers

【Question title】: Calculating repeated means from columns using R
【Posted】: 2020-07-08 09:40:21
【Question description】:

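Hopefully this is a simple question about loops in R. I have a dataset made up of simulation results. Each column holds the results for one cow, taken once a day for a month and then repeated 100 times, so each column has 3000 entries in total. I want to average the simulation results for each day, to get a single value per day for each cow. That means taking the mean of the 1st, 31st, 61st, ... entries, then the mean of the 2nd, 32nd, 62nd, ... entries, and so on. I want to end up with one column of 30 entries for each cow. I have been trying to do this with a loop in R but cannot work out how. Any suggestions would be appreciated.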

Here is some sample data:

a<-seq(from = 1, by = 1, length = 30)
b<-seq(from = 1, by = 0.5, length = 30)
c<-seq(from = 1, by = 2, length = 30)

cow1<-rep(a,100)
cow2<-rep(b,100)
cow3<-rep(c,100)

dat<-as.data.frame(cbind(cow1,cow2,cow3))

【Question comments】:

If you create a 30 x 100 matrix cow1 for each cow and call apply(cow1,1,mean), you will get the daily means.
@Xi'an, instead of apply, why not just rowMeans(cow1)?
Try: aggregate(dat, list(rep_len(1:30, nrow(dat))), mean)
@GKi, I always forget the non-formula version of aggregate. I was going to suggest something like dat$Day <- rep_len(1:30, nrow(dat)); aggregate(.~Day, dat, mean)
@DanielO Or: aggregate(.~Day, cbind(dat,Day=rep_len(1:30, nrow(dat))), mean)
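
The matrix idea from the comments can be sketched as follows. This is only an illustration built on the question's sample data; the name daily_means is made up here, and the sketch assumes every column is ordered as day 1 to day 30, repeated 100 times.

```r
# Sample data from the question: 30 daily values repeated 100 times per cow
dat <- data.frame(
  cow1 = rep(seq(from = 1, by = 1,   length.out = 30), 100),
  cow2 = rep(seq(from = 1, by = 0.5, length.out = 30), 100),
  cow3 = rep(seq(from = 1, by = 2,   length.out = 30), 100)
)

# matrix() fills column by column, so with nrow = 30 each column is one
# simulated month and row i holds all 100 values for day i.
# rowMeans() then averages each day over the 100 simulations.
# daily_means is a hypothetical name used only for this sketch.
daily_means <- sapply(dat, function(x) rowMeans(matrix(x, nrow = 30)))

dim(daily_means)      # one row per day, one column per cow
head(daily_means, 3)
```

Because the reshape relies purely on position, it gives wrong answers if any simulation has a missing or extra day; the explicit day column used in the answers avoids that assumption.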

Tags: r

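【Solution 1】:

I think it is best to construct a "day" column and then use it with tapply. As Xi'an said, no loop is needed; a loop would be slower and less clean. In code, this gives us: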

a <- seq(from = 1, by = 1, length = 30)
b <- seq(from = 1, by = 0.5, length = 30)
c <- seq(from = 1, by = 2, length = 30)

day <- seq(from = 1, by = 1, length = 30)
day <- rep(day,100)

cow1 <- rep(a,100)
cow2 <- rep(b,100)
cow3 <- rep(c,100)

# Construct a data frame; I find this way better because it gives names to the columns.
dat <- data.frame(day,cow1,cow2,cow3)

# Here are the results
tapply(dat$cow1, dat$day, mean)
tapply(dat$cow2, dat$day, mean)
tapply(dat$cow3, dat$day, mean)

【Comments】:

+1. Or use by(dat[,-1],dat$day,colMeans) to handle all cows at once. Or use do.call(rbind,by(dat[,-1],dat$day,colMeans)) to collect the results into a matrix.
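
For reference, here is a minimal runnable sketch of the by() suggestion in the comment above, using the same data frame as the answer. The variable name means_by_day is an assumption made for this example only.

```r
# Same data as in the answer: an explicit day column plus one column per cow
dat <- data.frame(
  day  = rep(1:30, 100),
  cow1 = rep(seq(from = 1, by = 1,   length.out = 30), 100),
  cow2 = rep(seq(from = 1, by = 0.5, length.out = 30), 100),
  cow3 = rep(seq(from = 1, by = 2,   length.out = 30), 100)
)

# by() splits the cow columns by day and applies colMeans to each piece;
# do.call(rbind, ...) stacks the 30 per-day results into one matrix.
# means_by_day is a hypothetical name used only for this sketch.
means_by_day <- do.call(rbind, by(dat[, -1], dat$day, colMeans))

dim(means_by_day)      # 30 days by 3 cows
head(means_by_day, 3)
```

Each column of the result should agree with the corresponding tapply call in the answer.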

【Solution 2】:

I agree with TMat: including a column with the day is useful.

Here is my worked example using the tidyverse:

library(tidyverse)

a <- seq(from = 1, by = 1, length = 30)
b <- seq(from = 1, by = 0.5, length = 30)
c <- seq(from = 1, by = 2, length = 30)

day <- seq(from = 1, by = 1, length = 30)
day <- rep(day,100)

cow1 <- rep(a,100)
cow2 <- rep(b,100)
cow3 <- rep(c,100)

dat <- data.frame(day,cow1,cow2,cow3) %>% 
  pivot_longer(cols = 2:4) %>% 
  group_by(day, name) %>% 
  summarize(mean = mean(value))
#> `summarise()` regrouping output by 'day' (override with `.groups` argument)
dat
#> # A tibble: 90 x 3
#> # Groups:   day [30]
#>      day name   mean
#>    <dbl> <chr> <dbl>
#>  1     1 cow1    1  
#>  2     1 cow2    1  
#>  3     1 cow3    1  
#>  4     2 cow1    2  
#>  5     2 cow2    1.5
#>  6     2 cow3    3  
#>  7     3 cow1    3  
#>  8     3 cow2    2  
#>  9     3 cow3    5  
#> 10     4 cow1    4  
#> # ... with 80 more rows

ggplot(dat, aes(x = day, y = mean, fill = name)) + 
  geom_col(position = "dodge")

Created on 2020-07-08 by the reprex package (v0.3.0)
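
The question asked for one 30-entry column per cow, whereas the result above is in long format. A possible variant that stays in wide format is sketched below. It assumes dplyr 1.0.0 or later (for across()), and the name wide_means is made up for this example; it is not part of the original answer.

```r
library(dplyr)

# Same data as in the answer
dat <- data.frame(
  day  = rep(1:30, 100),
  cow1 = rep(seq(from = 1, by = 1,   length.out = 30), 100),
  cow2 = rep(seq(from = 1, by = 0.5, length.out = 30), 100),
  cow3 = rep(seq(from = 1, by = 2,   length.out = 30), 100)
)

# Group by day and average every cow column in one call.
# wide_means is a hypothetical name used only for this sketch.
wide_means <- dat %>%
  group_by(day) %>%
  summarise(across(cow1:cow3, mean))

wide_means
```

The long format is more convenient for ggplot, as in the answer; the wide format is closer to what the question describes.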
