Conditionally filter elements of a group based on max value of another column (dplyr::group_by): Answers

【Question title】: Conditionally filter elements of a group based on max value of another column (dplyr::group_by)
【Posted】: 2020-05-01 08:18:59
【Question】:

library(dplyr)
df <- read.csv("http://www.sharecsv.com/dl/da89d0f973c81ad8c0ff4bcb0e7293b0/testdata.csv")
df %>% dplyr::group_by(TOF)

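I want to look at duplicated TOF values. Whenever duplicates are found (in other words, TOF values that fall into the same dplyr group), I want to keep only the rows that satisfy the following condition: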

intFT > max(intFT) * 0.1 ### this condition is valid within-group, i.e. max(intFT) refers to the highest intFT in a certain TOF group grouped by dplyr::group_by

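In addition, within each TOF group only the three rows with the highest intFT should be kept.

Rows with NA values should not be removed.

This returns an incorrect result: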

df %>% dplyr::group_by(TOF) %>% filter(intFT > max(intFT) * 0.1)

【Comments】:

I can't reproduce this. Take a look at: mtcars %>% group_by(cyl) %>% filter(mpg > max(mpg) * .9)
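The mtcars example contains no missing values, which may be why it behaves as expected. Below is a minimal sketch of one possible cause, assuming the real intFT column contains NA. The toy tibble is invented for illustration and is not the asker's data. In a group that contains an NA, max() returns NA, every comparison becomes NA, and filter() drops rows whose condition is NA.

```r
library(dplyr)

# Hypothetical toy data; not the asker's testdata.csv
toy <- tibble(
  TOF   = c(1, 1, 1, 2, 2),
  intFT = c(100, 5, NA, 50, 40)
)

# The attempt from the question: group 1 contains an NA, so max(intFT) is NA
# there, every comparison is NA, and filter() drops the whole group
naive <- toy %>%
  group_by(TOF) %>%
  filter(intFT > max(intFT) * 0.1) %>%
  ungroup()

# Ignore NA when computing the maximum and keep NA rows explicitly
na_safe <- toy %>%
  group_by(TOF) %>%
  filter(is.na(intFT) | intFT > max(intFT, na.rm = TRUE) * 0.1) %>%
  ungroup()
```

With this toy input, naive keeps only the two rows of TOF 2, while na_safe keeps four rows, including the NA row in TOF 1.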

Tags: r filter dplyr conditional-statements subset

【Answer 1】:

I don't have your data, but something like this should work:

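library(dplyr)

df %>%
  dplyr::group_by(TOF) %>%
  add_tally() %>%  # n = number of rows in each TOF group
  # flag rows in duplicated groups (n > 1) at or below 10% of the group maximum
  mutate(remove_it = if_else(n > 1 & intFT <= max(intFT) * 0.1, "yes", "no")) %>%
  filter(remove_it == "no") %>%
  top_n(3, intFT)  # rank by intFT; rows with NA in intFT are dropped along the way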
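Since the question asks that NA rows be kept, here is a hedged sketch that combines all three requirements. It is not the answerer's method. It assumes intFT is positive (so single-row groups pass the 10% test automatically) and that no group consists only of NA. The toy tibble is invented for illustration.

```r
library(dplyr)

# Hypothetical toy data; not the asker's testdata.csv
toy <- tibble(
  TOF   = c(1, 1, 1, 1, 1, 1, 2),
  intFT = c(100, 80, 60, 40, 5, NA, 7)
)

result <- toy %>%
  group_by(TOF) %>%
  # keep NA rows, plus rows above 10% of the group maximum (NA ignored)
  filter(is.na(intFT) | intFT > max(intFT, na.rm = TRUE) * 0.1) %>%
  # among non-NA rows keep the three largest intFT; NA rows pass through
  filter(is.na(intFT) | min_rank(desc(intFT)) <= 3) %>%
  ungroup()
```

For TOF 1 this keeps 100, 80, 60 and the NA row; the single row of TOF 2 is kept unchanged. Because min_rank() gives tied values the same rank, a group with ties at the third position can return more than three non-NA rows.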

【Comments】: