Combine and aggregate multiple data.frames: answers

[Question title]: Combine and aggregate multiple data.frames
[Posted]: 2014-03-03 21:09:22
[Question]:

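I have a set of .csv files, each with the same number of rows and columns. Each file contains observations (the "value" column) of test subjects identified by A, B and C, in a format like the following: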

A B C value
1 1 1 0.5
1 1 2 0.6
1 2 1 0.1
1 2 2 0.2
. . . .

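Suppose each file is read into a separate data frame. What is the most efficient way to combine these data frames into a single data frame whose "value" column contains the mean or, more generally, the result of applying some function to all "value" entries for a given test subject? Columns A, B and C are the same in every file and can be treated as the key for these observations.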

Thanks for your help.

[Question discussion]:

Tags: r dataframe

[Solution 1]:

This is straightforward, assuming all files list the rows in the same order:

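dflist <- lapply(dir(pattern = '\\.csv$'), read.csv)
# one column of "value" per file, one row per test subject
vals <- do.call(cbind, lapply(dflist, `[[`, 'value'))
# row means, attached to the A/B/C keys of the first file:
result <- dflist[[1]][c('A', 'B', 'C')]
result$value <- rowMeans(vals)
# or any other function `myfun` (your own summary function) applied to each row:
result$value <- apply(vals, 1, myfun)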
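Solution 1 silently gives wrong results if one file lists the keys in a different order, because the value columns are simply placed side by side. A minimal sketch that checks this assumption first, assuming every data frame has key columns A, B, C and a numeric "value" column; `combine_aligned` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: combine per-file "value" columns by row position,
# but only after confirming that every file lists the A/B/C keys identically.
combine_aligned <- function(dflist, FUN = mean, keys = c("A", "B", "C")) {
  ref <- dflist[[1]][keys]
  ref_id <- do.call(paste, ref)  # one string per row, e.g. "1 2 1"
  same_order <- vapply(dflist, function(d) {
    identical(do.call(paste, d[keys]), ref_id)
  }, logical(1))
  if (!all(same_order)) {
    stop("key columns differ in order or content between files")
  }
  # matrix: one row per test subject, one column per file
  vals <- do.call(cbind, lapply(dflist, `[[`, "value"))
  ref$value <- apply(vals, 1, FUN)
  ref
}
```

Usage would be along the lines of `combine_aligned(lapply(dir(pattern = "\\.csv$"), read.csv), FUN = median)`. If the check fails, a key-based approach such as Solution 2 is the safer choice.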

[Discussion]:

[Solution 2]:

If the keys may appear in any order, or may be missing from some files, here is another solution:

n <- 10  # number of csv files to create
obs <- 10  # number of observations per file
# create test files
for (i in 1:n){
    df <- data.frame(A = sample(1:3, obs, TRUE)
                , B = sample(1:3, obs, TRUE)
                , C = sample(1:3, obs, TRUE)
                , value = runif(obs)
                )
    write.csv(df, file = tempfile(fileext = '.csv'), row.names = FALSE)
}


# read in the data
# (list.files() takes a regular expression, not a glob such as "*.csv")
input <- lapply(list.files(tempdir(), pattern = "\\.csv$", full.names = TRUE)
    , function(file) read.csv(file)
    )

# stack the data frames, then compute the mean for each unique combination
# of A, B & C, since the rows may be in any order.
input <- do.call(rbind, input)
result <- lapply(split(input, list(input$A, input$B, input$C), drop = TRUE)
    , function(sect){
        sect$value[1L] <- mean(sect$value)
        sect[1L, ]
    }
)

# create output DF
result <- do.call(rbind, result)
result
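The split/apply/rbind steps above can also be written with base R's `aggregate()`, which groups the stacked rows by the key columns and applies `FUN` to "value" within each group. This is a swapped-in alternative, not the answerer's code; `combine_keyed` is a hypothetical helper name and the sketch assumes every data frame has columns A, B, C and value:

```r
# Hypothetical helper: stack all files, then apply FUN to "value" within
# each unique A/B/C combination. Grouping is by key, not by row position,
# so files may list the keys in any order or omit some of them.
combine_keyed <- function(dflist, FUN = mean) {
  stacked <- do.call(rbind, dflist)
  aggregate(value ~ A + B + C, data = stacked, FUN = FUN)
}
```

With the test files generated above this would be `combine_keyed(lapply(list.files(tempdir(), pattern = "\\.csv$", full.names = TRUE), read.csv))`. Note that the formula method of `aggregate()` drops rows with `NA` by default (`na.action = na.omit`).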

[Discussion]: