【Title】: How to merge two tables by date range and ID?
【Posted】: 2016-10-21 00:46:27
【Question】:

I have a table of person characteristics, for example:

person <- data.frame(group.id = c("N","N","P"), person.id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)

and a table of group characteristics, for example:

group <- data.frame(group.id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report.date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))

I would like to merge them by group.id and the applicable date range, so that the result looks like this:

group2 <- data.frame(group, person.id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))
  group.id report.date a b person.id       strt        end  c  d
1        N  2002-07-01 1 4         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6      <NA>       <NA>       <NA> NA NA

Does anyone have a suggestion for how to do this in R?

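【Comments】:

  • You could use data.table; see this post
  • Hi @Hack-R, I merge person and group on the common variable group.id and check whether group$report.date falls within the range from person$strt to person$end. In my example, rows 4 and below are left as NA because they have no corresponding values in the example person table.
  • @timothy.s.lau Thanks, I just updated my answer based on that explanation. Please see whether it solves your problem.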
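
The data.table suggestion in the first comment is not spelled out in the thread. One way it might look is a non-equi join, sketched below under the assumption that the data.table package is installed; the result name `res` is made up for illustration. After a non-equi join, the range columns are normally reported with the values of the joined column, so the sketch selects the original values explicitly through the `x.` and `i.` prefixes.

```r
library(data.table)

# Same example data as in the question, built as data.tables.
person <- data.table(
  group.id  = c("N", "N", "P"),
  person.id = c("A", "B", "C"),
  strt = as.Date(c("2002-07-01", "2003-08-01", "2004-06-23")),
  end  = as.Date(c("2003-08-01", "2004-09-01", "2006-07-01")),
  c = 1:3, d = 3:5
)
group <- data.table(
  group.id    = rep(c("N", "O", "P"), each = 3),
  report.date = rep(as.Date(c("2002-07-01", "2002-08-01", "2002-09-01")), times = 3),
  a = rep(1:3, times = 3),
  b = rep(4:6, times = 3)
)

# For every row of `group` (i), look up rows of `person` (x) with the same
# group.id and strt <= report.date <= end. Unmatched group rows are kept
# with NA in the person columns (default nomatch = NA).
res <- person[group,
  .(group.id = i.group.id, report.date = i.report.date, a = i.a, b = i.b,
    person.id = x.person.id, strt = x.strt, end = x.end, c = x.c, d = x.d),
  on = .(group.id, strt <= report.date, end >= report.date)]

print(res)
```

If a report date falls inside more than one person's range, this returns one row per match, which may or may not be what is wanted.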

Tags: r date merge


【Solution 1】:
person <- data.frame(group_id = c("N","N","P"), person_id = c("A", "B", "C"), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2003-08-01"), as.Date(x = "2004-06-23")), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2004-09-01"), as.Date(x = "2006-07-01")), c = 1:3, d = 3:5)

group <- data.frame(group_id = c("N", "N", "N", "O", "O", "O", "P", "P", "P"), report_date = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-08-01"), as.Date(x = "2002-09-01")), a = c(1:3), b = c(4:6))

group2 <- data.frame(group, person_id = c("A", "A", "A", NA, NA, NA, NA, NA, NA), strt = c(as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), as.Date(x = "2002-07-01"), NA, NA, NA, NA, NA, NA), end = c(as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), as.Date(x = "2003-08-01"), NA, NA, NA, NA, NA, NA), c = c(1, 1, 1, NA, NA, NA, NA, NA, NA), d = c(3, 3, 3, NA, NA, NA, NA, NA, NA))


library(sqldf)


sqldf("select a.*, b.* from 'group' a left join person b on a.group_id = b.group_id and (a.report_date >= b.strt and a.report_date <= b.end)")
  group_id report_date a b group_id person_id       strt        end  c  d
1        N  2002-07-01 1 4        N         A 2002-07-01 2003-08-01  1  3
2        N  2002-08-01 2 5        N         A 2002-07-01 2003-08-01  1  3
3        N  2002-09-01 3 6        N         A 2002-07-01 2003-08-01  1  3
4        O  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
5        O  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
6        O  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA
7        P  2002-07-01 1 4     <NA>      <NA>       <NA>       <NA> NA NA
8        P  2002-08-01 2 5     <NA>      <NA>       <NA>       <NA> NA NA
9        P  2002-09-01 3 6     <NA>      <NA>       <NA>       <NA> NA NA

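Note that group is a reserved word in SQL, so I had to put it in single quotes to use it as a table name. I also changed the . in the column names to _ to avoid problems, but you could keep the . and quote all the column names instead.

【Comments】:

  • The report dates for group id P do not fall between strt and end
  • @timothy.s.lau Please hit refresh in your browser
  • Hi, I'm on mobile. I'll run your code when I'm back at the office, though. Thanks
  • This looks good. I've been trying to figure out how to do this in base R. Is that possible?
  • Note that SQL also accepts this: a.report_date between b.strt and b.end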
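
The base R question above goes unanswered in the thread. A minimal sketch of one possible approach, using only `merge()` and logical subsetting, is shown below. The helper column `row` and the names `cand`, `hit` and `res` are made up for illustration; the data uses the original dotted column names from the question.

```r
person <- data.frame(
  group.id  = c("N", "N", "P"),
  person.id = c("A", "B", "C"),
  strt = as.Date(c("2002-07-01", "2003-08-01", "2004-06-23")),
  end  = as.Date(c("2003-08-01", "2004-09-01", "2006-07-01")),
  c = 1:3, d = 3:5
)
group <- data.frame(
  group.id    = rep(c("N", "O", "P"), each = 3),
  report.date = rep(as.Date(c("2002-07-01", "2002-08-01", "2002-09-01")), times = 3),
  a = rep(1:3, times = 3),
  b = rep(4:6, times = 3)
)

# Tag each group row so that unmatched rows can be restored afterwards.
group$row <- seq_len(nrow(group))

# Step 1: equi-join on group.id to get every candidate group/person pair.
cand <- merge(group, person, by = "group.id")

# Step 2: keep only pairs whose report.date lies within [strt, end].
hit <- cand[cand$report.date >= cand$strt & cand$report.date <= cand$end, ]

# Step 3: left-join the hits back onto group; rows without a hit get NA.
person_cols <- c("row", "person.id", "strt", "end", "c", "d")
res <- merge(group, hit[, person_cols], by = "row", all.x = TRUE)
res <- res[order(res$row), names(res) != "row"]

print(res)
```

Step 1 builds every group/person combination within a group.id before filtering, so this can get large on big tables; the SQL join in the answer avoids materialising that intermediate table in R.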