【Question title】: Find common date range from a set of overlapping date ranges
【Posted】: 2019-01-17 16:45:34
【Question】:

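How do I find the overlapping (common) date ranges from a given set of date ranges?

For a particular program (PID), I need the date ranges that are common to all of its events (EID).

Example: program PID=13579 has two date ranges for event EID=2.

Previously posted at: link

I have already checked this (it did not help): Link

Sample schema and data: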

CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');

The output should be:

PID     StartDate       EndDate
13579   01-Apr-2018     30-Apr-2018
13579   01-Oct-2018     30-Nov-2018
13570   01-Mar-2018     30-Jun-2018
13570   01-Nov-2018     31-Jan-2019

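【Comments】:

  • It is always better to post the expected results as text :) Well done on your first question, though! Why is Martin's answer not sufficient?
  • I cannot see what the rule is for deciding whether ranges overlap. Many of the date ranges overlap each other, but the highlighted ranges overlap with more, though not all, of the cases. So how is it decided?
  • So a date must occur at least once in every EID of the PID?
  • 13570 makes sense, I can see the overlap. 13579 is full of unreported overlaps.
  • I need to find the date ranges that overlap (are common) across all event IDs (EID) of a particular program ID (PID). For example, row 2 has no range covering Oct '18 to Nov '18, but the same event (EID=2) has another range (row 3), from Jul '18 to Jan '19, which does include Oct '18 and Nov '18. Rows 2 and 3 both belong to EID=2, so Oct '18 to Nov '18 counts as covered for event 2. I hope that is clear.

Tags: sql sql-server overlap gaps-and-islands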


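【Solution 1】:

This answer counts the number of overlapping intervals. It assumes that date ranges with the same EID do not overlap each other. Here is the query, with explanations inline: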

DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13579, 1,  '01 Jan 2018', '31 Mar 2019'),
(13579, 2,  '01 Feb 2018', '31 May 2018'),
(13579, 2,  '01 Jul 2018', '31 Jan 2019'),
(13579, 7,  '01 Mar 2018', '31 Mar 2019'),
(13579, 5,  '01 Feb 2018', '30 Apr 2018'),
(13579, 5,  '01 Oct 2018', '31 Mar 2019'),
(13579, 8,  '01 Jan 2018', '30 Apr 2018'),
(13579, 8,  '01 Jun 2018', '31 Dec 2018'),
(13579, 13, '01 Jan 2018', '31 Mar 2019'),
(13579, 6,  '01 Apr 2018', '31 May 2018'),
(13579, 6,  '01 Sep 2018', '30 Nov 2018'),
(13579, 4,  '01 Feb 2018', '31 Jan 2019'),
(13579, 19, '01 Mar 2018', '31 Jul 2018'),
(13579, 19, '01 Oct 2018', '28 Feb 2019'),
(13570, 16, '01 Feb 2018', '30 Jun 2018'),
(13570, 16, '01 Aug 2018', '31 Aug 2018'),
(13570, 16, '01 Oct 2018', '28 Feb 2019'),
(13570, 23, '01 Mar 2018', '30 Jun 2018'),
(13570, 23, '01 Nov 2018', '31 Jan 2019');

WITH cte1 AS (
    /*
     * augment the data with the number of distinct EID per PID
     * we will need this later
     */
    SELECT e.PID, a.EIDCount, StartDate, EndDate
    FROM @EventsTBL AS e
    JOIN (
        SELECT PID, COUNT(DISTINCT EID) AS EIDCount
        FROM @EventsTBL
        GROUP BY PID
    ) AS a ON e.PID = a.PID
), cte2 AS (
    /*
     * build a list of "points in time" at which an event started or ended
     * and the number of concurrent events changed
     * the zero value rows are required!
     */
    SELECT PID, EIDCount, StartDate AS pdate, 1 AS pval
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount, EndDate, 0
    FROM cte1
    UNION ALL
    SELECT PID, EIDCount , DATEADD(DAY, 1, EndDate), -1
    FROM cte1
), cte3 AS (
    /*
     * calculate running sum of pval over dates; minus ones first
     */
    SELECT PID, EIDCount, pdate, SUM(pval) OVER (PARTITION BY PID ORDER BY pdate, pval) AS evtcount
    FROM cte2
), cte4 AS (
    /*
     * consolidate data for same dates and we are done with the main part
     */
    SELECT PID, EIDCount, pdate, MAX(evtcount) AS evtcount
    FROM cte3
    GROUP BY PID, EIDCount, pdate
), cte5 AS (
    /*
     * assign "change flag" to rows where number of concurrent events
     * enters or exits the required count w.r.t. previous row
     */
    SELECT PID, EIDCount, pdate, evtcount, CASE
        WHEN evtcount < EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) < EIDCount THEN 0
        WHEN evtcount = EIDCount AND LAG(evtcount) OVER (PARTITION BY PID ORDER BY pdate) = EIDCount THEN 0
        ELSE 1
    END AS chg
    FROM cte4
), cte6 AS (
    /*
     * convert "change flag" to "group numbers" over consecutive rows using running sum
     */
    SELECT PID, EIDCount, pdate, evtcount, SUM(chg) OVER (PARTITION BY PID ORDER BY pdate) AS grp
    FROM cte5
)
/*
 * group rows by pid and group numbers
 */
SELECT PID, MIN(pdate) AS StartDate, MAX(pdate) AS EndDate
FROM cte6
WHERE evtcount = EIDCount
GROUP BY PID, grp
ORDER BY PID, StartDate

Demo on db<>fiddle
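
The query above relies on ranges with the same EID not overlapping each other. The following is a minimal sketch, not part of the original answer, of one way to test that assumption before trusting the counts. The sample rows are hypothetical; the last one is deliberately made to overlap the row before it. Run it as a separate batch, because it declares its own @EventsTBL.

```sql
-- Hypothetical sanity check (an assumption of this sketch, not the answer's code):
-- list pairs of ranges that share PID and EID and overlap each other.
DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13570, 16, '20180201', '20180630'),
(13570, 16, '20180801', '20180831'),
(13570, 23, '20180301', '20180630'),
(13570, 23, '20180601', '20180731'); -- made-up row that overlaps the previous one

SELECT a.PID, a.EID,
       a.StartDate AS StartA, a.EndDate AS EndA,
       b.StartDate AS StartB, b.EndDate AS EndB
FROM @EventsTBL AS a
JOIN @EventsTBL AS b
  ON  a.PID = b.PID
  AND a.EID = b.EID
  AND a.StartDate <= b.EndDate       -- closed-interval overlap test
  AND b.StartDate <= a.EndDate
  AND (a.StartDate < b.StartDate     -- report each pair once and skip self-pairs
       OR (a.StartDate = b.StartDate AND a.EndDate < b.EndDate));
```

With these rows the check returns a single pair, for PID 13570 and EID 23. An empty result means the assumption holds, with one caveat: two rows that are exact duplicates of each other are not reported by this sketch.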

【Comments】:

  • The result set looks good, and the performance is better too. Thanks, Salman.

【Solution 2】:

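OK, so I have generated a CTE containing all the dates under consideration.

For every date on which I think I have detected an overlap, FLAG = 1 is generated.

Then I use row_number(), the standard solution to the "islands" problem, and output the start and end of each island where flag = 1.

I hope it helps. I get the expected result for 13570, but as I understand "overlap", the whole of 13579 overlaps. Perhaps that part needs further explanation and adaptation. If you can work out how to generate FLAG according to your rules, the ranking part still applies.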

CREATE TABLE #EventsTBL
(
    PID INT,
    EID INT,
    StartDate DATETIME,
    EndDate DATETIME
);

INSERT INTO #EventsTBL
VALUES
(13579, '1', '01 Jan 2018', '31 Mar 2019'),
(13579, '2', '01 Feb 2018', '31 May 2018'),
(13579, '2', '01 Jul 2018', '31 Jan 2019'),
(13579, '7', '01 Mar 2018', '31 Mar 2019'),
(13579, '5', '01 Feb 2018', '30 Apr 2018'),
(13579, '5', '01 Oct 2018', '31 Mar 2019'),
(13579, '8', '01 Jan 2018', '30 Apr 2018'),
(13579, '8', '01 Jun 2018', '31 Dec 2018'),
(13579, '13', '01 Jan 2018', '31 Mar 2019'),
(13579, '6', '01 Apr 2018', '31 May 2018'),
(13579, '6', '01 Sep 2018', '30 Nov 2018'),
(13579, '4', '01 Feb 2018', '31 Jan 2019'),
(13579, '19', '01 Mar 2018', '31 Jul 2018'),
(13579, '19', '01 Oct 2018', '28 Feb 2019'),
--
(13570, '16', '01 Feb 2018', '30 Jun 2018'),
(13570, '16', '01 Aug 2018', '31 Aug 2018'),
(13570, '16', '01 Oct 2018', '28 Feb 2019'),
(13570, '23', '01 Mar 2018', '30 Jun 2018'),
(13570, '23', '01 Nov 2018', '31 Jan 2019');


SELECT count(enddate) FROM (SELECT CAST('19660423' as date) dt) A LEFT JOIN #EventsTBL B ON A.dt = b.StartDate;

WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     BuildFlags AS (SELECT  P.pid,
                            DT,
                            COUNT(e.PID ) CNT, 
                            CASE WHEN COUNT(e.pid) > 1 THEN 1 ELSE 0 END FLAG, 
                            row_number() OVER(partition by p.pid order by DT) RN
                        FROM ALL_DATES A CROSS JOIN (SELECT DISTINCT E2.pid FROM #EventsTBL E2) P
                        LEFT JOIN 
                            #EventsTBL E ON P.PID = E.pid AND
                            A.DT BETWEEN E.StartDate AND E.EndDate GROUP BY P.pid,DT),
    AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0)
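
The ranking step used above can be seen in isolation on a tiny, made-up table. This is only an illustrative sketch; the table @Days and its six rows are assumptions, not data from the question. The difference between a row number over all rows and a row number within each flag value stays constant across a run of consecutive rows with the same flag, so it can serve as a group key.

```sql
-- Hypothetical six-day calendar with a flag per day (illustration only).
DECLARE @Days TABLE (dt DATE, flag INT);
INSERT INTO @Days VALUES
('20180101', 1), ('20180102', 1), ('20180103', 0),
('20180104', 1), ('20180105', 1), ('20180106', 1);

WITH Ranked AS (
    SELECT dt, flag,
           ROW_NUMBER() OVER (ORDER BY dt)                     -- position among all days
         - ROW_NUMBER() OVER (PARTITION BY flag ORDER BY dt)   -- position among days with the same flag
           AS grp
    FROM @Days
)
SELECT MIN(dt) AS StartDate, MAX(dt) AS EndDate
FROM Ranked
WHERE flag = 1
GROUP BY grp
ORDER BY StartDate;
```

This returns two islands: 2018-01-01 to 2018-01-02 and 2018-01-04 to 2018-01-06. The day with flag = 0 shifts the difference from 0 to 1, which is what separates the two groups.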

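Edit: I think I now understand what you mean. You want to group the rows into unique pid and eid combinations, together with their dates, and you define an overlap as a time when every eid of a pid is active at once. So I came up with this modification: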

;WITH MIN_MAX AS (SELECT MIN(StartDate) S , MAX(EndDate) E FROM #EventsTBL ),
     ALL_DATES AS (SELECT S DT FROM MIN_MAX
                    UNION ALL
                    SELECT DATEADD(day,1,DT) FROM ALL_DATES WHERE DT < (SELECT E FROM MIN_MAX)
                  ),
     GROUPED AS (SELECT Q.pid,Q.eid,q.dt,case when max(tx.pid) is null then 0 else 1 end YES from (Select * FROM All_Dates cross join (select distinct pid,eid from #EventsTBL) AQ) Q
                                    LEFT JOIN  #EventsTBL TX ON TX.PID = Q.pid and tx.EID = Q.eid and 
                                                Q.DT BETWEEN TX.StartDate AND TX.EndDate GROUP BY q.pid,q.eid,q.dt
                ),                                       
     BuildFlags AS (SELECT g.pid,g.dt, row_number() OVER(partition by g.pid order by g.DT) RN,
          CASE WHEN WQ.tot = (SELECT count(distinct g2.eid)  FROM grouped g2 WHERE g2.PID = G.pid and g2.dt=g.dt and g2.yes=1) then 1 else 0 end FLAG
      FROM GROUPED G cross apply (select count(distinct E9.eid) tot FROM #EventsTBL E9 WHERE E9.PID = G.pid) WQ)
    ,AddRanks AS (SELECT *,rn - row_number()over(partition by pid,flag order by dt) groupRank  FROM BuildFlags)

     select pid,min(dt) as start, max(dt) as ending from AddRanks 
        where flag = 1
        group by pid,grouprank
        order by pid,min(dt)
     option(maxrecursion 0);
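
To spot-check a result, the same rule (every distinct EID of a PID has a range covering the date) can be tested for one date at a time. The following is a rough sketch under that reading of the rule; the variable @d is a hypothetical name, and only the 13570 rows from the question are loaded, into a table variable so that the sketch runs on its own.

```sql
-- Spot check for one date: which PIDs have every one of their EIDs active on @d?
DECLARE @EventsTBL TABLE (PID INT, EID INT, StartDate DATETIME, EndDate DATETIME);
INSERT INTO @EventsTBL VALUES
(13570, 16, '20180201', '20180630'),
(13570, 16, '20180801', '20180831'),
(13570, 16, '20181001', '20190228'),
(13570, 23, '20180301', '20180630'),
(13570, 23, '20181101', '20190131');

DECLARE @d DATETIME = '20181115';   -- hypothetical date to test

SELECT e.PID
FROM @EventsTBL AS e
GROUP BY e.PID
HAVING COUNT(DISTINCT e.EID)        -- all EIDs of the PID
     = COUNT(DISTINCT CASE WHEN @d BETWEEN e.StartDate AND e.EndDate
                           THEN e.EID END);  -- EIDs with a range covering @d
```

For 15 Nov 2018 this returns 13570, which agrees with the expected range 01-Nov-2018 to 31-Jan-2019. Setting @d to '20180815' returns no rows, because EID 23 has no range covering August 2018.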

【Comments】:

  • The result set looks good. Thanks, Cato.