【Posted】: 2021-02-23 20:38:06
【Question】:
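TL;DR: I need to filter individuals by two different criteria that apply to different rows.
Basically, given the example below, I need to know which individuals got both cheese and bread, and return the rows for those foods. In the example, these individuals are alibaba, mary and steve.
Multiple filter conditions in dplyr are usually very simple, but here the conditions apply across different rows, so I am finding it quite difficult. I did come up with a rather long solution, but I am sure there is a more efficient way.
I am working with a large dataset, so speed is essential.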
library(dplyr)  # provides %>%, distinct(), filter(), group_by(), summarise()

set.seed(1111)
df = data.frame(ID = sample(c("bob","steve","mary","alibaba"),20,replace = TRUE))
set.seed(1311)
df$food = sample(c("cheese","bread","olives"),20, replace = TRUE)
# Find which individuals have both cheese and bread
index = df %>%
  distinct(ID, food, .keep_all = TRUE) %>%        # one row per ID/food pair
  filter(food == "cheese" | food == "bread") %>%
  group_by(ID) %>%
  summarise(freq = n()) %>%                       # number of wanted foods per ID
  filter(freq > 1) %>%                            # keep IDs that have both
  {as.vector(.$ID)}
# Return the rows for the individuals that have both cheese and bread
df %>% filter(ID %in% index, food == "cheese" | food == "bread")
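For comparison, the two-step approach above can probably be collapsed into a single grouped filter. What follows is only a minimal sketch, not a benchmarked solution: `toy`, `wanted` and `result` are hypothetical names introduced for illustration, and the toy data is hand-made so the outcome can be checked by eye rather than depending on the random sample above.

```r
library(dplyr)

# Hypothetical toy data (not the sampled df from the question)
toy <- data.frame(
  ID   = c("bob", "bob", "mary", "mary", "steve", "steve", "steve"),
  food = c("cheese", "olives", "cheese", "bread", "bread", "cheese", "olives")
)

# The foods every selected individual must have (assumed to be distinct values)
wanted <- c("cheese", "bread")

result <- toy %>%
  filter(food %in% wanted) %>%                       # drop rows for other foods
  group_by(ID) %>%
  filter(n_distinct(food) == length(wanted)) %>%     # keep IDs that have every wanted food
  ungroup()

print(result)
```

Here bob is dropped because he only has cheese, while the cheese and bread rows for mary and steve are kept. Whether this is actually faster than the index-based version on a large dataset is something that would need to be measured.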
【Discussion】: