Filter a column if another column contains a specific set of values using dplyr in R

【Question title】: Filter a column if another column contains specific set of values using dplyr in R
【Posted】: 2017-11-13 04:07:58
【Question】:

In the following data frame, I want to keep only the groups that contain the persons "a", "b", and "c":

df <- structure(list(
  group = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = structure(c(1L, 2L, 1L, 3L, 1L, 2L, 3L, 1L, 1L, 2L, 3L, 4L),
                     .Label = c("a", "b", "c", "e"), class = "factor")),
  .Names = c("group", "person"), row.names = c(NA, -12L), class = "data.frame")

【Question comments】:

Tags: r dplyr conditional

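【Solution 1】:

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)); then, grouped by 'group', subset the data.table (.SD) with a logical condition that checks whether all of the elements 'a', 'b', and 'c' are %in% 'person'.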

library(data.table)
setDT(df)[, .SD[all(c('a', 'b', 'c') %in% person)], group]

Or with dplyr, using the same approach after grouping by 'group':

library(dplyr)
df %>%
   group_by(group) %>%
   filter(all(c('a', 'b', 'c') %in% person))
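
To see why the condition above selects whole groups, it can help to evaluate it once per group by hand. The following is only an illustrative sketch in base R, not part of the original answer; the names keep and p are made up for the example.

```r
# Self-contained copy of the question's data
df <- data.frame(
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = factor(c("a", "b", "a", "c", "a", "b", "c", "a", "a", "b", "c", "e"),
                  levels = c("a", "b", "c", "e"))
)

# Split 'person' by 'group' and test, for each group, whether
# every one of "a", "b", "c" occurs among that group's persons
keep <- sapply(split(df$person, df$group),
               function(p) all(c("a", "b", "c") %in% p))

# One TRUE/FALSE per group, named by the group value
print(keep)
```

Because the condition yields a single TRUE or FALSE per group, filter() keeps or drops all rows of that group together. Here only groups 2 and 4 pass.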

Or with base R:

v1 <- rowSums(table(df)[, c('a', 'b', 'c')]>0)==3
subset(df, group %in% names(v1)[v1])
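
The base R one-liner is easier to follow when the intermediate objects are printed. This is just a step-by-step sketch of the same idea; the names tab and present are introduced here for illustration only.

```r
# Self-contained copy of the question's data
df <- data.frame(
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = factor(c("a", "b", "a", "c", "a", "b", "c", "a", "a", "b", "c", "e"),
                  levels = c("a", "b", "c", "e"))
)

# Contingency table: one row per group, one column per person
tab <- table(df)
print(tab)

# TRUE where a person occurs at least once in a group
present <- tab[, c("a", "b", "c")] > 0

# A group qualifies when all three columns are TRUE
v1 <- rowSums(present) == 3

# Names of the qualifying groups (as character strings)
print(names(v1)[v1])

# Keep the rows of those groups; %in% coerces the numeric 'group' for comparison
res <- subset(df, group %in% names(v1)[v1])
```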

Update

If we want to return only group 2 (the group that contains 'a', 'b', 'c' and nobody else) with dplyr:

df %>% 
    group_by(group) %>%
    filter(all(c('a', 'b', 'c') %in% person), all(person %in% c('a', 'b', 'c')))

Or with n_distinct:

df %>%
   group_by(group) %>%
   filter(all(c('a', 'b', 'c') %in% person), n_distinct(person)==3)

Or with data.table:

setDT(df)[, .SD[all(c('a', 'b', 'c') %in% person) & uniqueN(person)==3], group]
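
The two-sided check (every required person is present, and nobody else is) amounts to set equality, so base R's setequal() is one possible way to write it. This is an alternative that does not appear in the original answer, sketched here in base R; the name exact is made up for the example.

```r
# Self-contained copy of the question's data
df <- data.frame(
  group  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4),
  person = factor(c("a", "b", "a", "c", "a", "b", "c", "a", "a", "b", "c", "e"),
                  levels = c("a", "b", "c", "e"))
)

# TRUE only for groups whose set of persons is exactly {a, b, c}
exact <- sapply(split(as.character(df$person), df$group),
                function(p) setequal(p, c("a", "b", "c")))
print(exact)

# Keep the rows of the matching groups
res <- df[df$group %in% names(exact)[exact], ]
print(res)

# Presumably the same condition could go inside a grouped dplyr filter:
#   df %>% group_by(group) %>% filter(setequal(person, c('a', 'b', 'c')))
```

Duplicates within a group do not matter for setequal(), so a group such as a, b, c, a would still qualify.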

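【Comments】:

Thanks @akrun. I had never used all() inside filter like this, it's great. If I want to return only the groups with a, b, and c and no other persons (for example, group #4 is also returned by the code as it stands), I would filter for groups whose record count equals 3. Is there a more elegant way?
@John Did you want to return df %>% group_by(group) %>% filter(any(!person %in% c('a', 'b', 'c')))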