Calculate the mean of multiple data frame columns stored in a list: answers

【Question title】: Calculate mean of multiple dataframes columns stored in List
【Posted】: 2021-02-18 00:50:22
【Question】:

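I am running some simulations, and I have several data frames stored in a list. For each data frame, I want to create a new variable that holds, for each row, the mean over the previous 2 data frames (and the current one). I am having trouble writing the loop. Here is a reproducible example: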

#Create dataframe 
month <- 1:12
price <- 21:32
df <- data.frame(month, price)

#Separate each row and create a simulation of a new variable. Store new dataframes in a list
simulations <- 100
ints <- seq_len(12)
set.seed(96)
list <- lapply(setNames(ints, paste0("df", ints)), function(i) {
  cbind(
    df[rep(i, simulations),],
    q = as.numeric(runif(simulations, min = 5, max = 10)))
})

#for each df in list, calculate the mean of the last 3 values of q 
for (i in 3:length(list)) {
  list[[i]][["q_mean"]] <- mean(list[[(i-2):i]][["q"]]) #HERE IS THE PROBLEM
  list[[i]][["ben"]] <- list[[i]][["q_mean"]]*list[[i]][["price"]]
}

I get the error "Error in list[[(i - 2):i]][["q"]] : subscript out of bounds". Does anyone know what the problem might be? Thanks in advance!

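【Comments】:

It looks like @redarah has put you on the right track, but one more remark: I would avoid naming a new object list, because something with that name already exists in R, and it may cause unexpected behavior elsewhere.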

Tags: r function loops

【Solution 1】:

I noticed two things that are causing you trouble here:

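1 - When you subset a list like list[[1:3]], it is read as list[[c(1, 2, 3)]], which finds the third entry (21) of the second column (price) in the first element of the list (df1). That is why something like list[[1:2]] returns a vector (it extracts the whole variable) and why list[[1:4]] returns an error (the list doesn't go 4 levels deep). In your loop, list[[(i - 2):i]] therefore yields a single number, and asking that number for [["q"]] gives the "subscript out of bounds" error. (Based on @aaron-montgomery's comment.)

2 - In the last line, you reference a column mean that you never defined.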
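
To make point 1 concrete, here is a minimal sketch of how `[[` and `[` behave with a vector index. The list `dfs` and its contents are made up for illustration and are not the question's data.

```r
# Toy list of two data frames (hypothetical stand-in for the question's list)
dfs <- list(
  df1 = data.frame(month = 1:3, price = c(21, 22, 23)),
  df2 = data.frame(month = 4:6, price = c(24, 25, 26))
)

# [[ with a vector indexes recursively: element 1, then column 2, then entry 3
recursive <- dfs[[c(1, 2, 3)]]
stepwise  <- dfs[[1]][[2]][[3]]

# A length-2 index stops at the column, so a whole vector comes back
price_col <- dfs[[1:2]]

# Single brackets keep the list structure: a list of the selected data frames
sub_list <- dfs[1:2]

# Indexing deeper than the structure allows raises an error
too_deep <- tryCatch(dfs[[1:4]], error = function(e) conditionMessage(e))

# The question's pattern: a single number has no element named "q"
no_q <- tryCatch(dfs[[1:3]][["q"]], error = function(e) conditionMessage(e))
```

So to pick several data frames out of the list, use single brackets (`dfs[1:2]`) and then extract the column from each one.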

If you want a single value that is the mean over the previous elements, you can nest another loop:

#for each df in list, calculate the mean of the last 3 values of q 
for (i in 3:length(list)) {

  # add another loop to calculate the mean
  vals <- c()
  for (j in (i - 2):i) {
    vals <- c(vals, list[[j]]$q)
  }
  
  list[[i]][["q_mean"]] <- mean(vals)
  
}
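
As a possible alternative to the inner loop, the same pooled mean can be computed by selecting the window with single brackets and flattening the `q` columns. This is only a sketch: `dfs` is a small hypothetical stand-in for the question's list, with 4 rows per data frame instead of 100.

```r
# Small stand-in for the question's list (hypothetical name `dfs`)
set.seed(96)
dfs <- lapply(1:5, function(i) {
  data.frame(month = i, q = runif(4, min = 5, max = 10))
})

for (i in 3:length(dfs)) {
  # Pull q from data frames i-2, i-1 and i, flatten to one vector, then average
  window_q <- unlist(lapply(dfs[(i - 2):i], function(d) d$q))
  dfs[[i]][["q_mean"]] <- mean(window_q)
}
```

Every row of a given data frame receives the same `q_mean`, because the mean is taken over all rows of the three data frames in the window.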

If you want each row to have a different value (where row 1 is the mean of row 1 of the previous 2 data frames, and so on), you can do this:

for (i in 3:length(list)) {
  
  list[[i]][["q_mean"]] <- (list[[i - 1]]$q + list[[i - 2]]$q) /2
  
}
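
The question asks for the previous 2 data frames and the current one, whereas the loop above averages only the previous two. A row-wise version that includes the current data frame could look like the sketch below. It assumes every data frame has the same number of rows, in the same order; `dfs` is again a hypothetical stand-in for the question's list.

```r
# Small stand-in for the question's list (hypothetical name `dfs`)
set.seed(96)
dfs <- lapply(1:5, function(i) {
  data.frame(month = i, q = runif(4, min = 5, max = 10))
})

for (i in 3:length(dfs)) {
  # One column per data frame in the window; rows line up by position
  q_window <- sapply(dfs[(i - 2):i], function(d) d$q)
  # Row k of q_mean is the mean of row k across the three data frames
  dfs[[i]][["q_mean"]] <- rowMeans(q_window)
}
```

Changing the window to `dfs[(i - 2):(i - 1)]` would give the same result as the loop above that excludes the current data frame.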

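【Comments】:

One correction: list[[1:3]] does something a bit more subtle than that. Because 1:3 is equivalent to the vector c(1, 2, 3), this syntax accesses the third entry of the price variable (i.e., the second column) of df1, the first data frame in the list. That is why something like list[[1:2]] returns a vector (it extracts the whole variable) and why list[[1:4]] returns an error (the list doesn't go 4 levels deep).
Sorry, that was a typo on my part! Fixed in the answer.
Thanks @AaronMontgomery - I updated my answer to reflect your comment!
Oh - so you want row 1 to be the mean of row 1 of the previous 2 data frames, and so on? I updated my answer above with code ^!
If q is your only column, this will do it: sub_list <- list[(i - 1):(i - 6)]; rowSums(bind_cols(sub_list)) / length(sub_list). If you have other columns, you will need to select the ones you want.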