R - How can I summarize other columns based on the value in one column

【Question title】: R - how can I summarize other columns based on the value in one column
【Posted】: 2014-12-30 16:41:05
【Question description】:

I have a file whose first few rows are:

                  bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
13603   Yokenella regensburgei      0   0.000   0.000   0.000   0.000  76.192
15068   Yokenella regensburgei      0   0.000   0.000 399.583   0.000   0.000
11518 Zobellia galactanivorans      0  83.133 200.795  79.862  90.273  29.303
19706 Zobellia galactanivorans      0 327.694   0.000 605.251 214.366 453.391
608      Zunongwangia profunda      0   0.000   0.000   0.000   0.000  96.438
3159     Zunongwangia profunda      0  14.865  23.004  28.628  11.166  53.613

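How can I sum the other columns over rows that share the same value in the first column, so that I get one total per bacterial taxon? Any ideas? Thanks!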

【Question comments】:

Tags: r matrix sum

【Solution 1】:

As noted in the comments, this is an "aggregation" problem. One obvious choice is therefore the aggregate function in base R:

aggregate(. ~ bacttaxa, x, sum)
#                   bacttaxa LL8388  UL8388  LL8384  LL8381  UL8382  LL8385
# 1   Yokenella regensburgei      0   0.000   0.000 399.583   0.000  76.192
# 2 Zobellia galactanivorans      0 410.827 200.795 685.113 304.639 482.694
# 3    Zunongwangia profunda      0  14.865  23.004  28.628  11.166 150.051
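To try this without the original file, here is a minimal sketch that rebuilds the six rows shown in the question as a data frame. The name `x` is taken from the answer's code; building it with `data.frame()` is an assumption, since the question does not say how the file was read in.

```r
# Rebuild the six rows from the question. How the real file was loaded is
# not stated, so this data.frame() call is only an illustrative stand-in.
x <- data.frame(
  bacttaxa = c("Yokenella regensburgei", "Yokenella regensburgei",
               "Zobellia galactanivorans", "Zobellia galactanivorans",
               "Zunongwangia profunda", "Zunongwangia profunda"),
  LL8388 = c(0, 0, 0, 0, 0, 0),
  UL8388 = c(0, 0, 83.133, 327.694, 0, 14.865),
  LL8384 = c(0, 0, 200.795, 0, 0, 23.004),
  LL8381 = c(0, 399.583, 79.862, 605.251, 0, 28.628),
  UL8382 = c(0, 0, 90.273, 214.366, 0, 11.166),
  LL8385 = c(76.192, 0, 29.303, 453.391, 96.438, 53.613),
  row.names = c("13603", "15068", "11518", "19706", "608", "3159")
)

# `. ~ bacttaxa` reads as: sum every other column within each bacttaxa group.
totals <- aggregate(. ~ bacttaxa, data = x, FUN = sum)
print(totals)
```

One thing to keep in mind: the formula interface of aggregate uses `na.action = na.omit` by default, so rows containing an `NA` in any column are dropped before summing. The sample rows have no missing values, so it makes no difference here.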

You can also explore the "data.table" and "dplyr" packages.

## A data.table approach
library(data.table)
as.data.table(x)[, lapply(.SD, sum), by = bacttaxa]

## A dplyr approach
library(dplyr)
x %>% 
  group_by(bacttaxa) %>%
  summarise(across(everything(), sum))  # sums every non-grouping column; needs dplyr >= 1.0.0

【Discussion】:

Thanks! I just figured out how to use aggregate. I will look into how to use the dplyr package. Thank you very much!