Filter individual that has two different criteria? Answers

【Question title】: Filter individual that has two different criteria?
【Posted】: 2021-02-23 20:38:06
【Question description】:

TLDR: I need to filter individuals by two different criteria

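Basically, given the example below, I need to know which individuals got both cheese and olives, and return the rows matching either of those foods for them. In the example, these are alibaba, mary and steve.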

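Multiple filter conditions in dplyr are usually very simple, but here the two conditions apply to different rows, so I found it quite difficult. I did come up with a long solution, but I am sure there is a more efficient way.

I am working with a large dataset, so speed is essential.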


library(dplyr)

set.seed(1111)
df <- data.frame(ID = sample(c("bob", "steve", "mary", "alibaba"), 20, replace = TRUE))

set.seed(1311)
df$food <- sample(c("cheese", "bread", "olives"), 20, replace = TRUE)

# finding which individuals have both cheese and olives:
# keep one row per ID/food pair, then count the remaining foods per ID
index <- df %>% distinct(ID, food, .keep_all = TRUE) %>%
  filter(food == "cheese" | food == "olives") %>%
  group_by(ID) %>%
  summarise(freq = n()) %>%
  filter(freq > 1) %>%
  pull(ID)

# returning the cheese and olives rows for the individuals that have both
df %>% filter(ID %in% index, food == "cheese" | food == "olives")

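【Question discussion】:

Tags: r filter dplyr subset

【Solution 1】:

After grouping by 'ID', filter keeps the groups that contain both 'cheese' and 'olives' by wrapping the membership test in all, which gives one TRUE/FALSE per group. The second expression (food %in% c('cheese', 'olives')) then filters element-wise, so only the cheese and olives rows of those groups remain.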

library(dplyr) 
df %>%
     group_by(ID) %>%
     filter(all(c('cheese', 'olives') %in% food), food %in% c('cheese', 'olives'))

-output

# A tibble: 13 x 2
# Groups:   ID [3]
#   ID      food  
#   <chr>   <chr> 
# 1 alibaba olives
# 2 steve   olives
# 3 steve   olives
# 4 steve   olives
# 5 alibaba cheese
# 6 steve   olives
# 7 steve   olives
# 8 mary    cheese
# 9 alibaba olives
#10 mary    olives
#11 steve   cheese
#12 alibaba olives
#13 steve   olives
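
To see what the two conditions inside filter do, here is a minimal sketch on a small hand-made data frame. The `toy` data and the names `group_flag` and `kept` are made up for illustration and are not part of the question's data. The first condition yields a single TRUE/FALSE per ID, which dplyr recycles across that group's rows; the second is evaluated row by row.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the question's sample)
toy <- data.frame(
  ID   = c("a", "a", "a", "b", "b", "c"),
  food = c("cheese", "olives", "bread", "cheese", "bread", "olives"),
  stringsAsFactors = FALSE
)

# Group-level test: does this ID have both foods? One logical value per group.
group_flag <- toy %>%
  group_by(ID) %>%
  summarise(has_both = all(c("cheese", "olives") %in% food))

# Same test inside filter(), combined with the row-level test.
# Only ID "a" has both foods, and its "bread" row is dropped by the second condition.
kept <- toy %>%
  group_by(ID) %>%
  filter(all(c("cheese", "olives") %in% food),
         food %in% c("cheese", "olives")) %>%
  ungroup()
```

In this toy data only "a" passes the group-level test, so `kept` holds its cheese row and its olives row.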

Another option, which may be faster, is to filter the rows first, then group and keep only the groups that have 2 distinct values in 'food'

df %>%
     filter(food %in% c('cheese', 'olives')) %>% 
     group_by(ID) %>%
     filter(n_distinct(food) == 2)
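
The reason the count works: once the rows are restricted to the two target foods, a group can only reach 2 distinct values if it contains both. Below is a small sketch that checks this against the all() version on made-up data. The `toy` data frame and the `targets` vector are assumptions for illustration; comparing against `length(targets)` instead of a hard-coded 2 is one possible way to generalize to more foods.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  ID   = c("a", "a", "a", "b", "b", "c"),
  food = c("cheese", "olives", "bread", "cheese", "bread", "olives"),
  stringsAsFactors = FALSE
)
targets <- c("cheese", "olives")

# Version 1: group first, test membership with all()
by_all <- toy %>%
  group_by(ID) %>%
  filter(all(targets %in% food), food %in% targets) %>%
  ungroup()

# Version 2: drop irrelevant rows first, then count distinct foods per ID
by_count <- toy %>%
  filter(food %in% targets) %>%
  group_by(ID) %>%
  filter(n_distinct(food) == length(targets)) %>%
  ungroup()

# Both versions should keep the same rows in the same order
same <- identical(by_all$ID, by_count$ID) &&
  identical(by_all$food, by_count$food)
```

Whether the second version is actually faster on a given dataset is worth measuring rather than assuming.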

Or another option with data.table

library(data.table)
i1 <- setDT(df)[, .I[all(c('cheese', 'olives') %in% food) & food %in% c('cheese', 'olives')], ID]$V1
df[i1]
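
A note on the data.table version: `.I` holds the row numbers of the current group, so the expression collects the row indices that pass both tests, and `$V1` extracts them from the default-named result column. Also, setDT converts `df` to a data.table by reference. If the original data frame should stay untouched, a sketch along these lines could be used instead (the `toy` data and the names `dt`, `targets`, `idx` and `res` are made up for illustration):

```r
library(data.table)

# Hypothetical toy data (an assumption for illustration)
toy <- data.frame(
  ID   = c("a", "a", "a", "b", "b", "c"),
  food = c("cheese", "olives", "bread", "cheese", "bread", "olives"),
  stringsAsFactors = FALSE
)
targets <- c("cheese", "olives")

# as.data.table() makes a copy, so `toy` remains a plain data.frame
dt <- as.data.table(toy)

# Per ID: the group-level all() result is recycled against the row-level test,
# and .I[...] returns the matching row numbers of the whole table
idx <- dt[, .I[all(targets %in% food) & food %in% targets], by = ID]$V1

# Subset the table by the collected row numbers
res <- dt[idx]
```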

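【Discussion】:

It looks good to me.... could you explain it?
@Dasr I updated the post. Hope it helps. If you can tell me which part is unclear, I can help you
Everything works. I was just trying to understand the first answer you gave. But it is working. Thanks.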