Select groups with row containing specific value (with dplyr and pipes): answers

【Question title】: Select groups with row containing specific value (with dplyr and pipes)
【Posted】: 2020-12-22 17:43:40
【Question】:

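I am trying to select, from a grouped df, the groups that contain a specific string on a specific row within each group.

Consider the following df: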

df <- data.frame(id = c(rep("id_1", 4),
                        rep("id_2", 4),
                        rep("id_3", 4)),
                 string = c("here",
                            "is", 
                            "some",
                            "text",
                            "here",
                            "is",
                            "other",
                            "text",
                            "there",
                            "are",
                            "final",
                            "texts"))

I would like to create a data frame containing only the groups whose second row contains the word "is".

Here is some code that does not work:

desired_df <- df %>% group_by(id) %>% 
        filter(slice(select(., string), 2) %in% "is")

This is the desired output:

desired_df <- data.frame(id = c(rep("id_1", 4),
                                      rep("id_2", 4)),
                               string = c("here",
                                          "is", 
                                          "some",
                                          "text",
                                          "here",
                                          "is",
                                          "other",
                                          "text"))

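I looked here, but that does not solve my problem, because it finds the groups in which the specified string occurs anywhere.

I could also write some separate code to identify the ids and then use them to subset the original df, like this: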

ids <- df %>% group_by(id) %>% slice(2) %>% filter(string %in% "is") %>% select(id)
desired_df <- df %>% filter(id %in% ids$id)

But I am wondering whether something simpler can be done in a single chain of pipes.

Help appreciated!

【Comments on the question】:

Tags: r filter dplyr group-by

【Solution 1】:

After grouping by 'id', subset 'string' to its second element and apply %in% with 'is' on the lhs, which returns a single TRUE/FALSE per group

library(dplyr)
df %>%
    group_by(id) %>% 
    filter('is' %in% string[2]) %>%
    ungroup()

-output

# A tibble: 8 x 2
#  id    string
#  <chr> <chr> 
#1 id_1  here  
#2 id_1  is    
#3 id_1  some  
#4 id_1  text  
#5 id_2  here  
#6 id_2  is    
#7 id_2  other 
#8 id_2  text
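
A small sketch of why this works, using made-up data rather than the df from the question: inside a grouped filter(), string[2] is evaluated once per group, so the condition is a single logical value that gets recycled over all rows of that group. The toy group id_c below is an assumption added for illustration; it has only one row, so string[2] is NA, and %in% returns FALSE there instead of NA.

```r
library(dplyr)

# Toy data (hypothetical, not from the question): id_c has a single row
toy <- data.frame(id = c("id_a", "id_a", "id_b", "id_b", "id_c"),
                  string = c("here", "is", "there", "are", "is"))

# One logical value per group: is the second element exactly "is"?
flags <- toy %>%
    group_by(id) %>%
    summarise(keep = "is" %in% string[2])

# The same condition inside filter() keeps or drops whole groups
kept <- toy %>%
    group_by(id) %>%
    filter("is" %in% string[2]) %>%
    ungroup()
```

Here flags should mark only id_a as TRUE, and kept should hold both rows of id_a. Note that id_c is dropped even though its first row is "is", since only position 2 is inspected.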

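【Comments】:

It is not entirely clear whether the OP needs the second row to contain only the string "is", but if so, the constraint can be tightened slightly with df %>% group_by(id) %>% filter(str_detect(string[2], "^is$")).
@andrew_reece Based on the OP's code %in% "is", it looks like a fixed match to me
Wouldn't string == "is" correspond to the OP's string %in% 'is', whereas your 'is' %in% string[2] could also match "isn't"?
@andrew_reece Both %in% and == check for a fixed match, not a substring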
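
To illustrate the point made in the last comment, here is a minimal base R sketch comparing exact matching with substring matching. The vector x is made up for the example, and grepl() is used in place of str_detect() only to avoid depending on stringr.

```r
# Made-up strings: one exact "is" and two that merely contain it
x <- c("is", "isn't", "this")

exact_eq  <- x == "is"         # element-wise equality: whole string must match
exact_in  <- x %in% "is"       # set membership: whole string must match
substring <- grepl("is", x)    # regex search: matches anywhere in the string
anchored  <- grepl("^is$", x)  # anchored regex: behaves like an exact match here

# The two exact tests differ only for missing values
na_eq <- NA_character_ == "is"     # NA
na_in <- NA_character_ %in% "is"   # FALSE
```

Only substring is TRUE for all three strings; the other three tests are TRUE for "is" alone. The NA difference matters when a group has fewer than two rows, because string[2] is then NA.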