Select the group with the highest mean in R: answers

【Question title】: Select group with highest mean in R
【Posted】: 2020-07-17 10:15:07
【Question】:

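I have been working on this problem for a while and still have not found a good solution. I hope you can help.

I have a dataset with several columns that I need to filter. For example, suppose I have the following: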

DB <- data.frame(A = c(rep("GeneA",6), rep("GeneB",6)), B = c("one", "one", "two", "two", "three", "three", "one", "one", "two", "two", "three", "three"), C = c(1,2,5,4,8,5,8,7,4,5,1,8))

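For each value in column A, I want to keep only the rows of the group in column B that has the highest mean of C.

In this case, the desired output would be: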

DB <- data.frame(A = c("GeneA","GeneA","GeneB","GeneB"), B = c("three", "three", "one", "one"), C = c(8,5,8,7))

Searching online, I only found ways to keep the single highest value in each group, but I need every row of the winning group.

With:

library(dplyr)

result <- DB %>%
  group_by(A, B) %>%
  summarize(c = mean(C))

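I only get the means. I also tried aggregate() and similar functions, without success. I am sure there is a simple way to do this, possibly with data.table.

【Comments】:

Tags: r group-by

【Solution 1】:

A base R option using subset + ave: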

DBout <- subset(DB,!ave(ave(C,A,B),A,FUN = function(x) x != max(x)))

which gives:

> DBout
      A     B C
5 GeneA three 8
6 GeneA three 5
7 GeneB   one 8
8 GeneB   one 7
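
The nested ave() call above is compact, so it may help to see it unpacked into separate steps. The following is a minimal sketch of the same logic in base R; the intermediate names grp_mean, best_mean and DBout2 are made up for illustration and do not appear in the answer.

```r
# Example data from the question
DB <- data.frame(
  A = c(rep("GeneA", 6), rep("GeneB", 6)),
  B = rep(rep(c("one", "two", "three"), each = 2), times = 2),
  C = c(1, 2, 5, 4, 8, 5, 8, 7, 4, 5, 1, 8)
)

# Step 1: mean of C within each (A, B) pair, repeated on every row of that pair
# (ave() uses mean as its default FUN)
grp_mean <- ave(DB$C, DB$A, DB$B)

# Step 2: the largest of those group means within each A, again one value per row
best_mean <- ave(grp_mean, DB$A, FUN = max)

# Step 3: keep the rows whose group mean equals the best mean for their A
DBout2 <- DB[grp_mean == best_mean, ]
print(DBout2)
```

Because step 3 compares with ==, two B groups that share the highest mean within the same A would both be kept.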

【Comments】:

Thanks Thomas, lovely answer. I had never used the ave() function before.

【Solution 2】:

Not sure whether this is what you are looking for. Using dplyr:

DB %>%
  group_by(A,B) %>%
  mutate(D = mean(C)) %>%
  group_by(A) %>%
  filter(D==max(D)) %>%
  select(-D)

# A tibble: 4 x 3
# Groups:   A [2]
  A     B         C
  <chr> <chr> <dbl>
1 GeneA three     8
2 GeneA three     5
3 GeneB one       8
4 GeneB one       7

【Comments】:

【Solution 3】:

Using a combination of base R and dplyr:

library(dplyr)

DB %>% group_by(A) %>% filter(B == names(which.max(tapply(C, B, mean))))

#    A     B       C
#  <chr> <chr> <dbl>
#1 GeneA three     8
#2 GeneA three     5
#3 GeneB one       8
#4 GeneB one       7

For each group in A, the filter line keeps the rows whose B value has the highest mean of C.

【Comments】:
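
To see what the filter condition evaluates inside one group, here is a small base R sketch that runs the same tapply/which.max expression on the GeneA rows alone and then applies it to every A group. The names gene_a, means, best, keep and result are illustrative only.

```r
# Example data from the question
DB <- data.frame(
  A = c(rep("GeneA", 6), rep("GeneB", 6)),
  B = rep(rep(c("one", "two", "three"), each = 2), times = 2),
  C = c(1, 2, 5, 4, 8, 5, 8, 7, 4, 5, 1, 8)
)

# Inspect a single A group
gene_a <- DB[DB$A == "GeneA", ]
means  <- tapply(gene_a$C, gene_a$B, mean)  # named vector: one mean per B value
best   <- names(which.max(means))           # name of the first B with the largest mean
print(means)
print(best)

# Same idea for every A group, without dplyr:
# collect the row names that belong to the winning B of each A
keep <- unlist(lapply(split(DB, DB$A), function(d) {
  b <- names(which.max(tapply(d$C, d$B, mean)))
  rownames(d)[d$B == b]
}))
result <- DB[rownames(DB) %in% keep, ]
print(result)
```

Note that which.max() returns only the first maximum, so if two B groups tie for the highest mean within one A, this approach keeps just one of them, whereas a comparison such as D == max(D) keeps both.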