【Question title】: Combine consecutive date ranges
【Posted】: 2013-03-24 21:08:34
【Question】:

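Using SQL Server 2008 R2,

I am trying to combine date ranges into the largest possible date ranges, where one range's end date is immediately followed by the next range's start date.

The data is about different employments. Some employees may have ended their employment and rejoined later; those should count as two separate employments (e.g. ID 5). Some people have had different types of employment running one right after the other (end date and start date back to back); in that case it should count as one employment in total (e.g. ID 30).

An employment period that has not ended has a NULL end date.

Some examples will probably make this clearer: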

declare @t as table  (employmentid int, startdate datetime, enddate datetime)

insert into @t values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null)

/*
-- expected outcome
EmploymentId StartDate   EndDate
5            2007-12-03  2011-08-26
5            2013-05-02  NULL
30           2006-10-02  NULL
66           2007-09-24  NULL
*/

I've been trying different "gaps and islands" techniques but haven't been able to crack this one.
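The merge rule the question describes can be pinned down with a small model. The sketch below is plain Python rather than T-SQL (nothing like it appears in the original post) and assumes rows are `(employmentid, startdate, enddate)` tuples with `None` for an open end date; `merge_consecutive` is a hypothetical name. It sorts each employment's periods by start date and folds a period into the previous one only when it starts exactly one day after the previous one ends.

```python
from datetime import date, timedelta
from itertools import groupby

def merge_consecutive(rows):
    """Merge periods of the same employment when one starts the day after the previous ends.

    rows: iterable of (employmentid, startdate, enddate); enddate None means still open.
    """
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # never compares enddate, so None is safe
    for emp, periods in groupby(ordered, key=lambda r: r[0]):
        cur_start = cur_end = None
        for _, start, end in periods:
            if cur_start is not None and cur_end is not None and start == cur_end + timedelta(days=1):
                cur_end = end  # back-to-back: extend the current period
            else:
                if cur_start is not None:
                    merged.append((emp, cur_start, cur_end))  # gap: close the current period
                cur_start, cur_end = start, end
        merged.append((emp, cur_start, cur_end))
    return merged

rows = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]

for row in merge_consecutive(rows):
    print(row)
```

Running it prints the same four rows as the expected outcome in the question.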

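【Question discussion】:

  • Shouldn't startDate == endDate for a proper overlap? Otherwise 24 hours go unaccounted for.
  • This would be a stored procedure, right? Or are you restricted to a single query?
  • @MaxH: Actually, the datetime columns are used as plain dates, so this kind of overlap is fine.
  • @JonasLincoln: Yes, I understand that, but if you count the days an employee has been employed you get different results. In the example above, employmentid 30 has worked 1567 + 573 + 234 = 2374 days (NULL = 2013-04-04 = today). That differs from the summarized row for employmentid 30 (2376 days from 2006-10-02 to 2013-04-04). Every change of employment type loses you 1 day.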
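The day counts in the comment above can be checked with a quick Python sketch (an illustration, not part of the original thread). It assumes, as the comment does, that the open end date is 2013-04-04, and measures each period as end minus start, which is what DATEDIFF(DAY, start, end) returns for date-only values.

```python
from datetime import date

today = date(2013, 4, 4)  # the comment treats the open (NULL) end date as "today"

# employmentid 30, with the open end date replaced by today
periods = [
    (date(2006, 10, 2), date(2011, 1, 16)),
    (date(2011, 1, 17), date(2012, 8, 12)),
    (date(2012, 8, 13), today),
]

per_period = [(end - start).days for start, end in periods]  # end - start, like DATEDIFF(DAY, ...)
summed = sum(per_period)
merged = (today - periods[0][0]).days  # the single combined range

print(per_period, summed, merged)  # [1567, 573, 234] 2374 2376
```

The two missing days are the two boundaries (2011-01-16/17 and 2012-08-12/13) that an end-minus-start count never covers.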

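Tags: sql tsql sql-server-2008-r2


【Solution 1】:

The odd-looking date '32121231' is just a very large date used to handle the "no end date" case. I'm assuming you won't actually have many date ranges per employee, so I used a simple recursive common table expression to combine the ranges.

To make it run faster, the anchor query keeps only those ranges that do not link up to a previous range (per employee). The rest just walks through the date ranges and grows each range. The final GROUP BY keeps only the largest date range built up for each starting anchor (employmentid, startdate) combination. Note that a recursive CTE stops with an error after 100 recursion levels unless OPTION (MAXRECURSION n) is specified.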
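To see what the anchor, the recursive member and the final GROUP BY each contribute, here is a rough Python model of the same steps (a sketch for illustration, not the answer's code; `merge_like_recursive_cte` and the `OPEN` sentinel are hypothetical names, with `OPEN` playing the role of '32121231'). Each pass of the `while` loop corresponds to one recursion level.

```python
from datetime import date, timedelta

OPEN = date(3212, 12, 31)  # hypothetical stand-in for the '32121231' sentinel
ONE_DAY = timedelta(days=1)

def merge_like_recursive_cte(rows):
    """Model of the recursive CTE: rows are (employmentid, startdate, enddate), None = open."""
    # Anchor: rows with no range of the same employment ending the day before they start.
    level = [
        (emp, start, end)
        for emp, start, end in rows
        if not any(e2 == emp and end2 is not None and end2 == start - ONE_DAY
                   for e2, _, end2 in rows)
    ]
    cte = list(level)
    # Recursive member: keep the anchor's start, take the end of the range starting the next day.
    while level:
        level = [
            (emp, start, end2)
            for emp, start, end in level
            for e2, start2, end2 in rows
            if e2 == emp and end is not None and start2 - ONE_DAY == end
        ]
        cte.extend(level)
    # GROUP BY employmentid, startdate with MAX(ISNULL(enddate, sentinel)), then NULLIF.
    longest = {}
    for emp, start, end in cte:
        key = (emp, start)
        longest[key] = max(longest.get(key, date.min), end if end is not None else OPEN)
    return sorted(
        ((emp, start, None if end == OPEN else end) for (emp, start), end in longest.items()),
        key=lambda r: (r[0], r[1]),
    )

rows = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]

for row in merge_like_recursive_cte(rows):
    print(row)
```

For employmentid 30 the anchor produces one row, and two recursion levels extend it to 2012-08-12 and then to the open end; MAX over the sentinel then picks the open end.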


SQL Fiddle

MS SQL Server 2008 Schema Setup

create table Tbl (
  employmentid int,
  startdate datetime,
  enddate datetime);

insert Tbl values
(5, '2007-12-03', '2011-08-26'),
(5, '2013-05-02', null),
(30, '2006-10-02', '2011-01-16'),
(30, '2011-01-17', '2012-08-12'),
(30, '2012-08-13', null),
(66, '2007-09-24', null);

/*
-- expected outcome
EmploymentId StartDate   EndDate
5            2007-12-03  2011-08-26
5            2013-05-02  NULL
30           2006-10-02  NULL
66           2007-09-24  NULL
*/

Query 1

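;with cte as (
   -- anchor: ranges not preceded by a range of the same employee ending the day before
   select a.employmentid, a.startdate, a.enddate
     from Tbl a
left join Tbl b on a.employmentid=b.employmentid and a.startdate-1=b.enddate
    where b.employmentid is null
    union all
   -- recursive member: keep the chain's start date, take the end date of the range starting the next day
   select a.employmentid, a.startdate, b.enddate
     from cte a
     join Tbl b on a.employmentid=b.employmentid and b.startdate-1=a.enddate
)
   -- keep the longest chain per (employmentid, startdate);
   -- NULL (open) end dates are mapped to a far-future sentinel for MAX and back to NULL
   select employmentid,
          startdate,
          nullif(max(isnull(enddate,'32121231')),'32121231') enddate
     from cte
 group by employmentid, startdate
 order by employmentid, startdate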

Results

| EMPLOYMENTID |                        STARTDATE |                       ENDDATE |
-----------------------------------------------------------------------------------
|            5 |  December, 03 2007 00:00:00+0000 | August, 26 2011 00:00:00+0000 |
|            5 |       May, 02 2013 00:00:00+0000 |                        (null) |
|           30 |   October, 02 2006 00:00:00+0000 |                        (null) |
|           66 | September, 24 2007 00:00:00+0000 |                        (null) |

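【Discussion】:

  • Six years later, this is still a great solution for small groups of dates. Thanks!
  • Shouldn't the first projection in the cte be ;with cte as ( select a.employmentid, b.startdate, a.enddate, i.e. b.startdate instead of a.startdate?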
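【Solution 2】:

A modified script that merges all overlapping periods.
For example,
01.01.2001-01.01.2010
05.05.2005-05.05.2015

gives a single period:
01.01.2001-05.05.2015

tbl.enddate must be filled in (no NULLs): give open-ended periods a far-future end date such as 2099-12-31, which the final NULLIF turns back into NULL.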

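;WITH cte
  AS(
-- anchor: periods whose start is not inside (or the day after) an earlier-starting period
SELECT
  a.employmentid
  ,a.startdate
  ,a.enddate
from tbl a
left join tbl c on a.employmentid=c.employmentid
    and a.startdate > c.startdate
    and a.startdate <= dateadd(day, 1, c.enddate)
WHERE c.employmentid IS NULL

UNION all

-- recursive member: extend with any period of the same employee that overlaps
-- or starts the day after the current end, and ends later
SELECT
  a.employmentid
  ,a.startdate
  ,c.enddate
from cte a
inner join tbl c on a.employmentid=c.employmentid
    and c.startdate <= dateadd(day, 1, a.enddate)
    and c.enddate > a.enddate
)
-- keep the furthest end per anchor; the sentinel becomes NULL again
select employmentid,
          startdate,
          nullif(max(enddate),'20991231') enddate
from cte
group by employmentid, startdate
order by employmentid, startdate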
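For comparison, the usual non-recursive way to merge overlapping or adjacent periods is a sort-and-sweep pass. This is a different technique from the recursive CTE above, shown in Python as a hypothetical `merge_overlapping` helper; like the script, it assumes every end date is filled in, and it handles a single employment at a time. It reproduces the example from this answer:

```python
from datetime import date, timedelta

def merge_overlapping(periods):
    """Merge (start, end) periods that overlap or touch (next start <= current end + 1 day)."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1] + timedelta(days=1):
            # overlapping or adjacent: extend the last merged period if this one ends later
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))  # gap: start a new period
    return merged

example = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2005, 5, 5), date(2015, 5, 5))]
print(merge_overlapping(example))  # [(datetime.date(2001, 1, 1), datetime.date(2015, 5, 5))]
```

Sorting makes the pass O(n log n) and avoids recursion limits entirely, at the cost of leaving SQL.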

【Discussion】:

    【Solution 3】:
    SET NOCOUNT ON
    
    DECLARE @T TABLE(ID INT,FromDate DATETIME, ToDate DATETIME)
    
    INSERT INTO @T(ID,FromDate,ToDate)
    SELECT 1,'20090801','20090803' UNION ALL
    SELECT 2,'20090802','20090809' UNION ALL
    SELECT 3,'20090805','20090806' UNION ALL
    SELECT 4,'20090812','20090813' UNION ALL
    SELECT 5,'20090811','20090812' UNION ALL
    SELECT 6,'20090802','20090802'
    
    
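    -- s1: ranges whose FromDate is not inside any other range (first range of an island)
    -- t1: ranges whose ToDate is not inside any other range (last range of an island)
    SELECT ROW_NUMBER() OVER(ORDER BY s1.FromDate) AS ID,
           s1.FromDate, 
           MIN(t1.ToDate) AS ToDate  -- earliest island end on or after this island's start
    FROM @T s1 
    INNER JOIN @T t1 ON s1.FromDate <= t1.ToDate 
      AND NOT EXISTS(SELECT * FROM @T t2 
                     WHERE t1.ToDate >= t2.FromDate
                       AND t1.ToDate < t2.ToDate) 
    WHERE NOT EXISTS(SELECT * FROM @T s2 
                     WHERE s1.FromDate > s2.FromDate
                       AND s1.FromDate <= s2.ToDate) 
    GROUP BY s1.FromDate 
    ORDER BY s1.FromDate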
    

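    【Discussion】:

    • Rather than posting bare code, try to explain the thought process so that everyone looking for an answer benefits.
    • It looks like the logic is this: once all ranges are merged, the first range in each group of merged ranges has a start date that does not fall inside any other range, and the last range in the group has an end date that does not fall inside any other range. The query finds all first ranges (s1) and the matching last range (MIN(t1.ToDate) is the earliest last range that ends on or after s1's start). The NOT EXISTS conditions restrict s1 to first ranges and t1 to last ranges.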
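The start/end-point logic described in the comment above can be mirrored in Python to check it against the sample data (a model of the query, not part of the original answer; `island_bounds` is a hypothetical name). Note what the predicates imply: ranges are merged only when they overlap or share an endpoint, so back-to-back ranges such as 2011-01-16 / 2011-01-17 from the question stay separate, and the query does not partition by employee.

```python
from datetime import date

# sample data from Solution 3: (ID, FromDate, ToDate)
T = [
    (1, date(2009, 8, 1), date(2009, 8, 3)),
    (2, date(2009, 8, 2), date(2009, 8, 9)),
    (3, date(2009, 8, 5), date(2009, 8, 6)),
    (4, date(2009, 8, 12), date(2009, 8, 13)),
    (5, date(2009, 8, 11), date(2009, 8, 12)),
    (6, date(2009, 8, 2), date(2009, 8, 2)),
]

def island_bounds(rows):
    # s1 filter: FromDate not in (s2.FromDate, s2.ToDate] of any range
    starts = {f for _, f, _ in rows if not any(f2 < f <= t2 for _, f2, t2 in rows)}
    # t1 filter: ToDate not in [t2.FromDate, t2.ToDate) of any range
    ends = {t for _, _, t in rows if not any(f2 <= t < t2 for _, f2, t2 in rows)}
    # pair each start with the earliest end on or after it (MIN(t1.ToDate)), numbered like ROW_NUMBER()
    return [(i, s, min(e for e in ends if e >= s)) for i, s in enumerate(sorted(starts), start=1)]

for row in island_bounds(T):
    print(row)
```

With the sample data this yields two islands: 2009-08-01 to 2009-08-09 and 2009-08-11 to 2009-08-13.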
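    【Solution 4】:

    An alternative solution that uses window functions instead of a recursive CTE. Note that SUM() OVER with ORDER BY and a ROWS frame requires SQL Server 2012 or later, so this will not run on the 2008 R2 instance from the question.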

    SELECT 
        employmentid, 
        MIN(startdate) as startdate, 
        NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
    FROM (
        SELECT 
            employmentid, 
            startdate, 
            enddate,
            DATEADD(
                DAY, 
                -COALESCE(
                    SUM(DATEDIFF(DAY, startdate, enddate)+1) OVER (PARTITION BY employmentid ORDER BY startdate ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 
                    0
                ),
                startdate
        ) as grp
        FROM @t
    ) withGroup
    GROUP BY employmentid, grp
    ORDER BY employmentid, startdate
    

    This works by computing a grp value that is the same for all consecutive rows. It is built up as follows:

    1. Determine the total number of days each span occupies (+1, because the end date is inclusive)
    SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
    
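    2. Compute a running total of those days per employment, ordered by start date. This gives the total number of days spanned by all previous employment periods
      • We COALESCE with 0 to make sure there is no NULL in the cumulative day count
      • We do not include the current row in the running total, because we will apply the value to startdate rather than enddate (we can't use enddate because of the NULLs)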
    SELECT *, COALESCE(
        SUM(daysSpanned) OVER (
            PARTITION BY employmentid 
            ORDER BY startdate 
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        )
        ,0
    )  as cumulativeDaysSpanned
    FROM (
        SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
    ) inner1
    
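    3. Subtract the cumulative days from startdate to get our grp. This is the crux of the solution.
      • If the start date advances at the same rate as the days spanned, the days are consecutive, and subtracting one from the other gives the same value.
      • If startdate advances faster than the days spanned, there is a gap, and we get a new grp value greater than the previous one.
      • Although grp is a date, the date itself is meaningless; we only use it as a grouping value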
    SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
    FROM (
        SELECT *, COALESCE(
            SUM(daysSpanned) OVER (
                PARTITION BY employmentid 
                ORDER BY startdate 
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            )
            ,0
        )  as cumulativeDaysSpanned
        FROM (
            SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
        ) inner1
    ) inner2
    

    With the result

    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | employmentid | startdate               | enddate                 | daysSpanned | cumulativeDaysSpanned | grp                     |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 | 1363        | 0                     | 2007-12-03 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 5            | 2013-05-02 00:00:00.000 | NULL                    | NULL        | 1363                  | 2009-08-08 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 30           | 2006-10-02 00:00:00.000 | 2011-01-16 00:00:00.000 | 1568        | 0                     | 2006-10-02 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 30           | 2011-01-17 00:00:00.000 | 2012-08-12 00:00:00.000 | 574         | 1568                  | 2006-10-02 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 30           | 2012-08-13 00:00:00.000 | NULL                    | NULL        | 2142                  | 2006-10-02 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    | 66           | 2007-09-24 00:00:00.000 | NULL                    | NULL        | 0                     | 2007-09-24 00:00:00.000 |
    +--------------+-------------------------+-------------------------+-------------+-----------------------+-------------------------+
    
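    4. Finally we can GROUP BY grp to collapse the consecutive rows.
      • Use MIN and MAX to get the new startdate and enddate
      • To handle NULL enddates, we give them a very large value so that MAX picks them up, and then convert that value back to NULL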
    SELECT 
        employmentid, 
        MIN(startdate) as startdate, 
        NULLIF(MAX(COALESCE(enddate,'9999-01-01')), '9999-01-01') as enddate
    FROM (
        SELECT *, DATEADD(DAY, -cumulativeDaysSpanned, startdate) as grp
        FROM (
            SELECT *, COALESCE(
                SUM(daysSpanned) OVER (
                    PARTITION BY employmentid 
                    ORDER BY startdate 
                    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                )
                ,0
            )  as cumulativeDaysSpanned
            FROM (
                SELECT *, DATEDIFF(DAY, startdate, enddate)+1 as daysSpanned FROM @t
            ) inner1
        ) inner2
    ) inner3
    GROUP BY employmentid, grp
    ORDER BY employmentid, startdate
    

    to get the desired result

    +--------------+-------------------------+-------------------------+
    | employmentid | startdate               | enddate                 |
    +--------------+-------------------------+-------------------------+
    | 5            | 2007-12-03 00:00:00.000 | 2011-08-26 00:00:00.000 |
    +--------------+-------------------------+-------------------------+
    | 5            | 2013-05-02 00:00:00.000 | NULL                    |
    +--------------+-------------------------+-------------------------+
    | 30           | 2006-10-02 00:00:00.000 | NULL                    |
    +--------------+-------------------------+-------------------------+
    | 66           | 2007-09-24 00:00:00.000 | NULL                    |
    +--------------+-------------------------+-------------------------+
    
    5. We can merge the inner queries to get the query at the top of this answer, which is shorter but harder to explain
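The four steps can be traced in Python to reproduce the intermediate table (daysSpanned, cumulativeDaysSpanned, grp) and the final grouping. This is a sketch that mirrors the SQL rather than the answer's own code; `add_grp` and `merge_by_grp` are hypothetical names, and `date.max` stands in for the '9999-01-01' sentinel.

```python
from datetime import date, timedelta
from itertools import groupby

def add_grp(rows):
    """Return (employmentid, startdate, enddate, daysSpanned, cumulativeDaysSpanned, grp) per row."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for emp, periods in groupby(ordered, key=lambda r: r[0]):  # PARTITION BY employmentid
        cumulative = 0  # running SUM over the previous rows only, COALESCEd to 0
        for _, start, end in periods:  # ORDER BY startdate
            days = None if end is None else (end - start).days + 1  # DATEDIFF(DAY, ...) + 1
            grp = start - timedelta(days=cumulative)                 # DATEADD(DAY, -cumulative, start)
            out.append((emp, start, end, days, cumulative, grp))
            if days is not None:  # SUM ignores NULLs
                cumulative += days
    return out

def merge_by_grp(rows):
    """GROUP BY employmentid, grp with MIN(startdate) and MAX(enddate), None meaning open."""
    open_end = date.max  # plays the role of the '9999-01-01' sentinel
    groups = {}
    for emp, start, end, _, _, grp in add_grp(rows):
        end = open_end if end is None else end
        lo, hi = groups.get((emp, grp), (start, end))
        groups[(emp, grp)] = (min(lo, start), max(hi, end))
    merged = [(emp, lo, None if hi == open_end else hi) for (emp, _), (lo, hi) in groups.items()]
    return sorted(merged, key=lambda r: (r[0], r[1]))

rows = [
    (5, date(2007, 12, 3), date(2011, 8, 26)),
    (5, date(2013, 5, 2), None),
    (30, date(2006, 10, 2), date(2011, 1, 16)),
    (30, date(2011, 1, 17), date(2012, 8, 12)),
    (30, date(2012, 8, 13), None),
    (66, date(2007, 9, 24), None),
]

for row in add_grp(rows):
    print(row)
for row in merge_by_grp(rows):
    print(row)
```

The three employmentid 30 rows all land on grp 2006-10-02, which is why they collapse into a single open-ended period.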

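    Limitations and requirements of all this

    • The start and end dates of an employment's periods do not overlap. Overlaps can produce collisions in grp.
    • Start dates are not NULL. This can, however, be worked around by replacing NULL start dates with a small date value
    • Future developers can decipher the window-function black magic you performed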
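The first limitation is easy to trip over. In this hypothetical single-employment example (dates invented for illustration), range B lies inside range A and range C starts after a one-day gap, yet C ends up with the same grp as A, so grouping would wrongly report 2020-01-01 to 2020-01-12 as one period while B forms a group of its own. The Python below recomputes grp with the same formula; `grp_values` is a hypothetical name and all end dates are assumed non-NULL.

```python
from datetime import date, timedelta

def grp_values(periods):
    """grp for one employment: start minus the days spanned by all earlier rows (by start date)."""
    cumulative, out = 0, []
    for start, end in sorted(periods):
        out.append(start - timedelta(days=cumulative))
        cumulative += (end - start).days + 1  # DATEDIFF(DAY, start, end) + 1
    return out

periods = [
    (date(2020, 1, 1), date(2020, 1, 10)),   # A
    (date(2020, 1, 3), date(2020, 1, 3)),    # B: overlaps A
    (date(2020, 1, 12), date(2020, 1, 12)),  # C: one-day gap after A, should stay separate
]
print(grp_values(periods))
```

A and C both get grp 2020-01-01 while B gets 2019-12-24, so the overlap both splits off B and glues C onto A.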

    【Discussion】:
