【Posted】: 2017-01-14 11:13:41
【Question】:
I am trying to run a (simplified!) query on color and date difference against the following two databases (excerpt):
      A                         B
   A.COL  A.TIME             B.COL  B.TIME
1  blue   2009-01-31      1  blue   2007-01-31
2  blue   2009-02-28      2  blue   2008-12-31
3  blue   2009-03-31      3  blue   2009-02-28
4  blue   2009-04-30      4  blue   2009-04-30
5  blue   2009-05-31      5  blue   2009-06-30
6  blue   2009-06-30      6  blue   2016-08-31
7  blue   2016-03-31
8  blue   2016-04-30
9  red    ...
10 red    ...
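What I want to do: merge the tables on COL and on the difference in TIME, i.e. the two times must be no more than 2 months apart in either direction (in other words, the difference must lie between -2 and +2, with the sign depending on which date you start from).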
# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28 (diff = -1)
# for obs 2 from A, that would imply
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)
and so on.
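The matching rule above can be sketched in base R as a join on color followed by a filter on the calendar-month difference. This is only one possible reading of the rule: it assumes the day of the month is ignored, and `month_index` is a hypothetical helper name, not a function from lubridate, zoo, or sqldf. The data frames are rebuilt inline so the sketch runs on its own.

```r
# Same values as the excerpt above, rebuilt so the sketch is self-contained
A <- data.frame(
  A.COL  = "blue",
  A.TIME = c("2009-01-31", "2009-02-28", "2009-03-31", "2009-04-30",
             "2009-05-31", "2009-06-30", "2016-03-31", "2016-04-30"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  B.COL  = "blue",
  B.TIME = c("2007-01-31", "2008-12-31", "2009-02-28", "2009-04-30",
             "2009-06-30", "2016-08-31"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: running month count, ignoring the day of the month
# (assumption: "difference in months" means calendar months)
month_index <- function(x) {
  lt <- as.POSIXlt(as.Date(x))
  (lt$year + 1900) * 12 + lt$mon   # POSIXlt months are 0-11
}

# Pair every A row with every B row of the same color ...
pairs <- merge(A, B, by.x = "A.COL", by.y = "B.COL")

# ... then keep only the pairs at most 2 calendar months apart
pairs$diff <- month_index(pairs$A.TIME) - month_index(pairs$B.TIME)
matched <- pairs[abs(pairs$diff) <= 2, ]
matched <- matched[order(matched$A.TIME, matched$B.TIME), ]

matched
```

Because `merge` first builds every same-color pair, this approach may become slow or memory-hungry on large tables; it is meant to illustrate the rule, not to be the most efficient way to apply it.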
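I was thinking of some kind of date-difference function, either from lubridate, which is problematic for months with fewer than 30 days and sometimes returns NA, or as.yearmon from zoo, which at least computes the difference correctly. However, I could not get it to work inside sqldf (Error: error in statement: near "as": syntax error). The reason seems to be that not every R function can be used within sqldf.
Any ideas on how to do this in R? I am also looking for an elegant way to subtract months from one another. lubridate has this problem:
Add/subtract 6 months (bond time) in R using lubridate, but here is a suggested way to do it with zoo: Get the difference between dates in terms of weeks, months, quarters, and years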
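For subtracting months from one another, one option that avoids end-of-month problems is to count calendar months directly and ignore the day. A minimal base R sketch follows; `month_diff` is a hypothetical helper name introduced here for illustration, and whether ignoring the day is acceptable depends on the use case.

```r
# Hypothetical helper: calendar months between two dates, day of month ignored.
# Positive when `from` is later than `to`, matching the sign used in the
# examples above.
month_diff <- function(from, to) {
  from <- as.POSIXlt(as.Date(from))
  to   <- as.POSIXlt(as.Date(to))
  # POSIXlt stores years since 1900 and months as 0-11
  (from$year * 12 + from$mon) - (to$year * 12 + to$mon)
}

month_diff("2009-01-31", "2008-12-31")   # 1
month_diff("2009-01-31", "2009-02-28")   # -1

# Vectorized over `to`: observation 2 from A against three rows of B
month_diff("2009-02-28", c("2008-12-31", "2009-02-28", "2009-04-30"))   # 2, 0, -2
```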
To get the data (thanks to @bouncyball for the code below):
# Table A: color plus date (dates are read in as character strings)
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = TRUE, stringsAsFactors = FALSE)
# Table B: same layout, different dates
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", header = TRUE, stringsAsFactors = FALSE)
【Discussion】:
Tags: r date merge difference sqldf