【Question title】: sqldf query with date difference
【Posted】: 2017-01-14 11:13:41
【Question description】:

I am trying to run a (simplified!) query on color and date difference against the following two tables (excerpts):

A                           B   
    A.COL   A.TIME              B.COL   B.TIME
1   blue    2009-01-31      1   blue    2007-01-31
2   blue    2009-02-28      2   blue    2008-12-31
3   blue    2009-03-31      3   blue    2009-02-28
4   blue    2009-04-30      4   blue    2009-04-30
5   blue    2009-05-31      5   blue    2009-06-30
6   blue    2009-06-30      6   blue    2016-08-31
7   blue    2016-03-31
8   blue    2016-04-30
9   red ...
10  red ...

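What I want to do: merge the tables on COL and on the difference in TIME, i.e. the two times must be no more than 2 months apart in either direction (in other words, the difference must lie between -2 and +2, with the sign depending on which date you start from).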

# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28  (diff = -1)

# for obs 2 from A, that would imply 
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)

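And so on. I was thinking of some kind of date-difference function, either from lubridate, which is problematic for months with fewer than 30 days and sometimes returns NA, or as.yearmon from zoo, which at least computes the difference correctly. However, I could not get it to work inside sqldf (error: error in statement: near "as": syntax error). The reason seems to be that not every R function can be used inside sqldf.

Any ideas how this can be done in R? I am also looking for an elegant way to subtract months from one another. lubridate has this problem: "Add/subtract 6 months (bond time) in R using lubridate", but here is a suggested way to do it with zoo: "Get the difference between dates in terms of weeks, months, quarters, and years".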

To get the data (thanks to @bouncyball for the code below):

A <- read.table(
  text = "
  A.COL   A.TIME          
  blue    2009-01-31     
  blue    2009-02-28      
  blue    2009-03-31      
  blue    2009-04-30      
  blue    2009-05-31      
  blue    2009-06-30
  blue    2016-03-31
  blue    2016-04-30
  ", header = TRUE, stringsAsFactors = FALSE)


B <- read.table(
  text = "
  B.COL   B.TIME
  blue    2007-01-31
  blue    2008-12-31
  blue    2009-02-28
  blue    2009-04-30
  blue    2009-06-30
  blue    2016-08-31
  ", stringsAsFactors = FALSE, header = TRUE)

【Question discussion】:

    Tags: r date merge difference sqldf


    【Solution 1】:

    Here is a solution using the functions from this SO post and the plyr package:

    library(plyr)
    
    # turn a date into a 'monthnumber' relative to an origin
    monnb <- function(d) { 
      lt <- as.POSIXlt(as.Date(d, origin="1900-01-01"))
      lt$year*12 + lt$mon 
      } 
    
    # compute a month difference as a difference between two monnb's
    mondf <- function(d1, d2) { monnb(d2) - monnb(d1) }
    
    # iterate over rows of A looking for matches in B
    adply(A, 1, function(x)
      B[x$A.COL == B$B.COL & 
          abs(mondf(as.Date(x$A.TIME), as.Date(B$B.TIME))) <= 2,]
    )
    
    #     A.COL    A.TIME  B.COL    B.TIME
    # 1   blue 2009-01-31  blue 2008-12-31
    # 2   blue 2009-01-31  blue 2009-02-28
    # 3   blue 2009-02-28  blue 2008-12-31
    # 4   blue 2009-02-28  blue 2009-02-28
    # 5   blue 2009-02-28  blue 2009-04-30
    #  ....
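For readers without plyr, the same month-number idea can be sketched in base R. This is an illustrative variant rather than part of the original answer: the names `cand`, `res`, and the `diff` column are chosen here, and the data frames are rebuilt inline so the block runs on its own. Note that the comparison only looks at calendar months, so 2009-01-31 and 2009-02-01 count as one month apart.

```r
# Same sample data as in the question, built directly as data frames
A <- data.frame(
  A.COL  = "blue",
  A.TIME = c("2009-01-31", "2009-02-28", "2009-03-31", "2009-04-30",
             "2009-05-31", "2009-06-30", "2016-03-31", "2016-04-30"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  B.COL  = "blue",
  B.TIME = c("2007-01-31", "2008-12-31", "2009-02-28",
             "2009-04-30", "2009-06-30", "2016-08-31"),
  stringsAsFactors = FALSE
)

# Month number: (years since 1900) * 12 + zero-based month
monnb <- function(d) {
  lt <- as.POSIXlt(as.Date(d))
  lt$year * 12 + lt$mon
}

# Whole-month difference, d2 minus d1
mondf <- function(d1, d2) monnb(d2) - monnb(d1)

mondf("2009-01-31", "2008-12-31")  # -1
mondf("2009-02-28", "2009-04-30")  #  2

# All A/B pairs sharing a colour, then keep pairs at most 2 months apart
cand <- merge(A, B, by.x = "A.COL", by.y = "B.COL")
cand$diff <- mondf(cand$B.TIME, cand$A.TIME)  # A minus B, as in the question
res <- cand[abs(cand$diff) <= 2, ]
res <- res[order(res$A.TIME, res$B.TIME), ]
res
```

Because `merge` first builds every A/B pair within a colour, this is fine for small tables but can get memory-hungry when each colour has many rows.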
    

    Edit: data.table implementation

    library(data.table)
    merge_AB <- data.table(merge(A,B, by.x = 'A.COL', by.y = 'B.COL'))
    
    merge_AB[,DateDiff := abs(mondf(A.TIME, B.TIME))
           ][DateDiff <= 2]
    
     #     A.COL     A.TIME     B.TIME DateDiff
     # 1:  blue 2009-01-31 2008-12-31        1
     # 2:  blue 2009-01-31 2009-02-28        1
     # 3:  blue 2009-02-28 2008-12-31        2
     # 4:  blue 2009-02-28 2009-02-28        0
     # 5:  blue 2009-02-28 2009-04-30        2
     # ...
    

    Data

    A <- read.table(
    text = "
    A.COL   A.TIME          
    blue    2009-01-31     
    blue    2009-02-28      
    blue    2009-03-31      
    blue    2009-04-30      
    blue    2009-05-31      
    blue    2009-06-30
    blue    2016-03-31
    blue    2016-04-30
    ", header = TRUE, stringsAsFactors = FALSE)
    
    
    B <- read.table(
      text = "
    B.COL   B.TIME
    blue    2007-01-31
    blue    2008-12-31
    blue    2009-02-28
    blue    2009-04-30
    blue    2009-06-30
    blue    2016-08-31
    ", stringsAsFactors = FALSE, header = TRUE)
    

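    【Discussion】:

    • Nice work! Just one question: can data.table do this too? I'm not familiar with plyr and I've only just started using data.table, so I'd like to know whether it can be used here to achieve the same thing. As a side note, you didn't use sqldf to solve this. I wonder: is it not possible in this case?
    • @user3032689 Yes, we can; see my edit for a data.table implementation.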
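On the sqldf question raised in the comments: sqldf hands the SQL text to its database backend (SQLite by default), so R functions such as `as.yearmon` are not available inside the statement, which is a likely cause of the syntax error near "as". A minimal sketch, assuming the default SQLite backend and dates stored as ISO `YYYY-MM-DD` strings, is to compute the month numbers with SQLite's own `strftime`. The table and column names (`tab_a`, `tab_b`, `a_col`, `a_time`, `b_col`, `b_time`, `mdiff`) are chosen here for illustration; underscores are used so that no quoting of dotted names is needed.

```r
library(sqldf)

# Same sample data as in the question, with SQL-friendly column names
tab_a <- data.frame(
  a_col  = "blue",
  a_time = c("2009-01-31", "2009-02-28", "2009-03-31", "2009-04-30",
             "2009-05-31", "2009-06-30", "2016-03-31", "2016-04-30"),
  stringsAsFactors = FALSE
)
tab_b <- data.frame(
  b_col  = "blue",
  b_time = c("2007-01-31", "2008-12-31", "2009-02-28",
             "2009-04-30", "2009-06-30", "2016-08-31"),
  stringsAsFactors = FALSE
)

# Month number = year * 12 + month, computed by SQLite; mdiff is A minus B
res <- sqldf("
  SELECT a_col, a_time, b_time, mdiff
  FROM (
    SELECT tab_a.a_col, tab_a.a_time, tab_b.b_time,
           (CAST(strftime('%Y', tab_a.a_time) AS INTEGER) * 12
              + CAST(strftime('%m', tab_a.a_time) AS INTEGER))
         - (CAST(strftime('%Y', tab_b.b_time) AS INTEGER) * 12
              + CAST(strftime('%m', tab_b.b_time) AS INTEGER)) AS mdiff
    FROM tab_a
    JOIN tab_b ON tab_a.a_col = tab_b.b_col
  )
  WHERE mdiff BETWEEN -2 AND 2
  ORDER BY a_time, b_time
")
res
```

As with the R versions above, this counts calendar months only and ignores the day of the month.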