Filter out rows between two values: answers

【Question Title】: Filter out rows between two values
【Posted】: 2020-07-15 12:17:03
【Question】:

I'm having trouble filtering out certain rows.

Sample dataset:

df <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  # The original post listed only 7 timestamps for 8 rows; NA stands in for
  # the missing timestamp of the last ("Final") row so the columns line up.
  timestamp = c("2017-07-26 23:41:16", "2017-07-27 20:23:16", "2017-07-29 07:06:53",
                "2017-07-24 04:53:02", "2017-07-25 10:27:02", "2017-07-26 16:51:43",
                "2017-07-13 08:33:05", NA)
)

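Now I want to remove every group in which no other value occurs between description = "Start" and description = "Final". This should be done per id group. In this example, that is the group with ID 3.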

Any help would be greatly appreciated. Thanks in advance!

【Question Comments】:

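Could you use filter(df, description %in% c("Start", "Final"))?
Unfortunately not. I think I was a bit imprecise. I want to filter out the groups that have nothing between "Start" and "Final", and keep the groups that have other descriptions between "Start" and "Final". I have edited the description and the example. Sorry for the confusion!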
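The clarified requirement can be checked directly per group. Below is a minimal dplyr sketch, assuming each id has exactly one "Start" and one "Final" row, so that "something between them" reduces to "the group contains at least one description other than Start or Final". The name `events` is hypothetical, and the timestamp column is left out because this check does not depend on row order.

```r
library(dplyr)

# Hypothetical copy of the sample data (timestamps omitted; not needed here)
events <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  stringsAsFactors = FALSE
)

kept <- events %>%
  group_by(id) %>%
  # Keep a whole group only if some row is neither "Start" nor "Final"
  filter(any(!description %in% c("Start", "Final"))) %>%
  ungroup()

print(kept)
```

If a group could contain several Start/Final pairs, this check would no longer be enough, and an order-aware approach such as the cumsum tracker in Solution 1 is needed.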

Tags: r filter dplyr

【Solution 1】:

If we convert timestamp to a datetime, we can sort the data and use cumsum to do what you want (I think).

library(dplyr)
library(lubridate)

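df %>%
  mutate(timestamp = lubridate::as_datetime(timestamp)) %>%          # parse the timestamp strings
  group_by(id) %>%
  arrange(id, timestamp) %>%                                         # order events within each id
  mutate(tracker = cumsum(description %in% c("Start", "Final"))) %>% # +1 at every Start/Final
  filter((tracker %% 2 == 1) & description != "Start")              # rows after a Start, before its Final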
#> # A tibble: 2 x 4
#> # Groups:   id [2]
#>   id    description      timestamp           tracker
#>   <fct> <fct>            <dttm>                <int>
#> 1 1     Something        2017-07-27 20:23:16       1
#> 2 2     Some Other Thing 2017-07-25 10:27:02       1
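To see how the cumsum tracker behaves, the sketch below computes it without the final filter. It assumes the rows are already in timestamp order within each id (the arrange() step above takes care of that), so the timestamp column is left out; `events` and `tracked` are hypothetical names.

```r
library(dplyr)

# Hypothetical copy of the sample data, assumed already sorted by timestamp per id
events <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  stringsAsFactors = FALSE
)

tracked <- events %>%
  group_by(id) %>%
  # The counter restarts in each group; every "Start" or "Final" adds 1,
  # so rows between a Start and its Final carry the odd value set by the Start
  mutate(tracker = cumsum(description %in% c("Start", "Final"))) %>%
  ungroup()

print(tracked)
```

Rows with an odd tracker value that are not themselves "Start" are exactly the rows the filter above keeps; group 3 has none, because its tracker goes straight from 1 (Start) to 2 (Final).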

【Comments】:

My expected result would be all rows with ID 1 and 2 in the example df.
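One way to get whole groups back from the tracker idea is to turn the row-level condition into a group-level one with any(). A sketch under the same assumption as before (rows already sorted by timestamp within each id); `events` and `kept` are hypothetical names.

```r
library(dplyr)

# Hypothetical copy of the sample data, assumed already sorted by timestamp per id
events <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  stringsAsFactors = FALSE
)

kept <- events %>%
  group_by(id) %>%
  mutate(tracker = cumsum(description %in% c("Start", "Final"))) %>%
  # Keep every row of a group if at least one row lies inside a Start...Final span
  filter(any(tracker %% 2 == 1 & description != "Start")) %>%
  select(-tracker) %>%
  ungroup()

print(kept)
```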

【Solution 2】:

So the following might be one solution to your problem.

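library(dplyr)

# Count the rows (descriptions) in each id group
Test <- df %>% aggregate(description ~ id, data = ., FUN = function(x) c(count = length(x)))
Test$id <- as.factor(Test$id)
# Attach the per-id count to every row; it appears as description.y after the join
df <- inner_join(df, Test, by = "id")
# Keep only groups with more than two rows, i.e. more than just Start and Final
df <- df[df$description.y > 2, ]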

The idea is to use inner_join to filter out all groups that have only two descriptions (Start, Final). The output is:

> df
  id    description.x           timestamp description.y
1  1            Start 2017-07-26 23:41:16             3
2  1        Something 2017-07-27 20:23:16             3
3  1            Final 2017-07-29 07:06:53             3
4  2            Start 2017-07-24 04:53:02             3
5  2 Some Other Thing 2017-07-25 10:27:02             3
6  2            Final 2017-07-26 16:51:43             3

Is this what you had in mind?
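The same count-then-join idea can be written entirely in dplyr, which avoids the description.x/description.y renaming that the join above produces. A sketch; `events`, `counts` and `kept` are hypothetical names, and semi_join() is used so only the original columns are returned.

```r
library(dplyr)

# Hypothetical copy of the sample data (timestamps omitted; not needed here)
events <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  stringsAsFactors = FALSE
)

# One row per id with its row count in column n
counts <- events %>% count(id)

# Keep the rows of events whose id has more than two rows; no columns are added
kept <- events %>% semi_join(filter(counts, n > 2), by = "id")

print(kept)
```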

【Comments】:

【Solution 3】:

Another solution:

library(tidyverse)
df %>% 
  group_by(id) %>% 
  mutate(n = n()) %>%   # number of rows in each id group
  filter(n != 2)        # drop groups that consist of only the Start and Final rows
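For comparison, the group-size test from this answer can also be expressed in base R with ave(), which returns each row's group size. A sketch without dplyr; `events` and `group_size` are hypothetical names.

```r
# Hypothetical copy of the sample data (timestamps omitted; not needed here)
events <- data.frame(
  id = c("1", "1", "1", "2", "2", "2", "3", "3"),
  description = c("Start", "Something", "Final",
                  "Start", "Some Other Thing", "Final",
                  "Start", "Final"),
  stringsAsFactors = FALSE
)

# ave() applies length() within each id and recycles the result to every row
group_size <- ave(seq_along(events$id), events$id, FUN = length)

# Same rule as filter(n != 2): drop groups with exactly two rows
kept <- events[group_size != 2, ]

print(kept)
```

Like filter(n != 2), this keeps any group whose size is not exactly 2, so a group consisting of a lone "Start" row would also be kept.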

【Comments】: