【Question title】: Filter a data frame by multiple criteria from a different data frame
【Posted】: 2016-08-29 18:23:10
【Question】:

I want to filter one data frame ('data') based on multiple values in a different data frame ('key').

My 'key' looks like this:

library(dplyr)

exhibit.name  <- c("lions", "otters", "penguins")
exhibit.start <- c(as.Date("2016-04-01"), as.Date("2016-05-01"), as.Date("2016-06-01"))
exhibit.end   <- c(as.Date("2016-04-30"), as.Date("2016-05-31"), as.Date("2016-06-30"))
key           <- tibble(exhibit.name, exhibit.start, exhibit.end)  # tibble() replaces the deprecated data_frame()

And my 'data' looks like this:

exhibit.name <- c("lions", "lions", "otters", 
                  "otters", "penguins", "penguins")
exhibit.date <- c(as.Date("2016-04-15"), as.Date("2016-12-15"), as.Date("2016-05-15"),
                  as.Date("2016-02-15"), as.Date("2016-06-15"), as.Date("2016-10-15"))
data         <- tibble(exhibit.name, exhibit.date)  # tibble() replaces the deprecated data_frame()

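I need to filter 'data' to return the rows where data$exhibit.name matches key$exhibit.name and data$exhibit.date falls between the corresponding key$exhibit.start and key$exhibit.end dates. The resulting data frame would look like this: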

> valid.exhibits
1|lions   |2016-04-15
2|otters  |2016-05-15
3|penguins|2016-06-15
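For reference, the rule above can also be written directly in base R without any join. This is a minimal sketch that assumes each exhibit.name appears at most once in 'key' (true for the sample data); match() looks up each row's exhibit window and the date test is inclusive on both ends.

```r
# Base R sketch, assuming every exhibit.name occurs at most once in `key`
key <- data.frame(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
)
data <- data.frame(
  exhibit.name = c("lions", "lions", "otters", "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15"))
)

# Row of `key` matching each row of `data` (NA if the name is absent)
idx <- match(data$exhibit.name, key$exhibit.name)

# Inclusive window test: start <= date <= end
in_window <- data$exhibit.date >= key$exhibit.start[idx] &
             data$exhibit.date <= key$exhibit.end[idx]

# which() drops both FALSE and NA (names missing from `key`)
valid.exhibits <- data[which(in_window), ]
valid.exhibits
```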

Thanks!

【Comments】:

  • FYI, as.Date() is vectorized, so you can write as.Date(c(chars, chars, chars))
  • dplyr has a between() function
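Putting both comments together gives the sketch below. It assumes dplyr >= 1.1.0, where between() accepts vector bounds (older releases expected single values for left and right; there, write exhibit.date >= exhibit.start & exhibit.date <= exhibit.end instead).

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for vectorised between() bounds

# as.Date() is vectorised, so one call converts all the strings
key <- tibble(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
)
data <- tibble(
  exhibit.name = c("lions", "lions", "otters", "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15"))
)

valid.exhibits <- data %>%
  inner_join(key, by = "exhibit.name") %>%
  # between() is inclusive on both ends: start <= date <= end
  filter(between(exhibit.date, exhibit.start, exhibit.end)) %>%
  select(exhibit.name, exhibit.date)

valid.exhibits
```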

Tags: r dplyr


【Solution 1】:

We can first left_join and then filter:

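library(dplyr)

data %>% 
   left_join(key, by = "exhibit.name") %>%
   # keep rows whose date lies inside the exhibit window (inclusive)
   filter(exhibit.start <= exhibit.date, exhibit.date <= exhibit.end) %>% 
   select(exhibit.name, exhibit.date)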
#     exhibit.name exhibit.date
#         <chr>       <date>
#1        lions   2016-04-15
#2       otters   2016-05-15
#3     penguins   2016-06-15
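The same join-then-filter idea works in base R with merge() and subset(). This is a sketch, not part of the original answer; note that merge() defaults to an inner join, which gives the same result here because rows without a matching exhibit would be dropped by the filter anyway.

```r
key <- data.frame(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
)
data <- data.frame(
  exhibit.name = c("lions", "lions", "otters", "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15"))
)

# Join on the shared column (inner join by default, sorted by exhibit.name)
joined <- merge(data, key, by = "exhibit.name")

# Keep rows inside the window and only the original two columns
valid.exhibits <- subset(joined,
                         exhibit.date >= exhibit.start & exhibit.date <= exhibit.end,
                         select = c(exhibit.name, exhibit.date))
valid.exhibits
```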

We can also use a non-equi join (a conditional join), available in the development version of data.table, i.e. v1.9.7+:

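library(data.table)
setDT(key)
# non-equi join: set new = 1 on rows of `data` whose date lies inside
# the matching exhibit's window; all other rows get NA
setDT(data)[key, on = .(exhibit.name, exhibit.date >= exhibit.start, 
          exhibit.date <= exhibit.end), new := 1]
# drop the unflagged rows, then remove the helper column
na.omit(data)[, new := NULL][]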
#   exhibit.name exhibit.date
#1:        lions   2016-04-15
#2:       otters   2016-05-15
#3:     penguins   2016-06-15
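In more recent dplyr (1.1.0 and later, newer than this answer) the same non-equi join can be expressed with join_by(). A minimal sketch under that version assumption: semi_join() keeps only the rows of 'data' that have a match in 'key', so no columns from 'key' are added and no select() is needed.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

key <- tibble(
  exhibit.name  = c("lions", "otters", "penguins"),
  exhibit.start = as.Date(c("2016-04-01", "2016-05-01", "2016-06-01")),
  exhibit.end   = as.Date(c("2016-04-30", "2016-05-31", "2016-06-30"))
)
data <- tibble(
  exhibit.name = c("lions", "lions", "otters", "otters", "penguins", "penguins"),
  exhibit.date = as.Date(c("2016-04-15", "2016-12-15", "2016-05-15",
                           "2016-02-15", "2016-06-15", "2016-10-15"))
)

# Match on name AND on exhibit.start <= exhibit.date <= exhibit.end;
# semi_join() returns the matching rows of `data` in their original order
valid.exhibits <- data %>%
  semi_join(key, by = join_by(exhibit.name,
                              between(exhibit.date, exhibit.start, exhibit.end)))

valid.exhibits
```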

【Comments】:
