【Question title】: Select rows with no date range overlap
【Posted】: 2013-10-21 11:23:25
【Question】:

Imagine the following Loans table:

BorrowerID       StartDate         DueDate
=============================================
1                2012-09-02        2012-10-01
2                2012-10-05        2012-10-21
3                2012-11-07        2012-11-09
4                2012-12-01        2013-01-01
4                2012-12-01        2013-01-14
1                2012-12-20        2013-01-06
3                2013-01-07        2013-01-22
3                2013-01-15        2013-01-18
1                2013-02-20        2013-02-24

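How would I select the distinct BorrowerIDs of people who have only ever had one loan out at a time? This includes borrowers who have only ever taken out one loan, as well as borrowers who have taken out more than one, provided that none of their loans would overlap if you drew them on a timeline. For example, in the table above, the query should find only borrowers 1 and 2.

I have tried joining the table to itself, but haven't really gotten anywhere. Any pointers are much appreciated!

【Comments】:

    Tags: sql sql-server date-range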


    【Solution 1】:

    If you are using SQL Server 2012 or later (required for lag), you can do it like this:

    with cte as (
    select 
        BorrowerID, 
        StartDate, 
        DueDate,
        lag(DueDate) over (partition by borrowerid order by StartDate, DueDate) as PrevDueDate
    from Loans
    )
    
    select 
        distinct BorrowerID 
    from cte
    where BorrowerID not in
        (select BorrowerID 
        from cte 
        where StartDate <= PrevDueDate)
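
To see why comparing each loan only with the previous one is enough, it can help to look at the intermediate rows the CTE produces. The following is a minimal sketch and not part of the original answer: it assumes SQL Server 2012 or later (for LAG) and loads the question's sample rows into a hypothetical table variable named @Loans.

```sql
-- Hypothetical table variable holding the sample rows from the question.
DECLARE @Loans TABLE (BorrowerID int, StartDate date, DueDate date);

INSERT INTO @Loans (BorrowerID, StartDate, DueDate) VALUES
    (1, '2012-09-02', '2012-10-01'),
    (2, '2012-10-05', '2012-10-21'),
    (3, '2012-11-07', '2012-11-09'),
    (4, '2012-12-01', '2013-01-01'),
    (4, '2012-12-01', '2013-01-14'),
    (1, '2012-12-20', '2013-01-06'),
    (3, '2013-01-07', '2013-01-22'),
    (3, '2013-01-15', '2013-01-18'),
    (1, '2013-02-20', '2013-02-24');

-- Same window expression as the CTE above, plus a flag for the rows
-- that the NOT IN subquery would pick up.
SELECT BorrowerID,
       StartDate,
       DueDate,
       PrevDueDate,
       CASE WHEN StartDate <= PrevDueDate THEN 1 ELSE 0 END AS OverlapsPrevious
FROM (
    SELECT BorrowerID,
           StartDate,
           DueDate,
           LAG(DueDate) OVER (PARTITION BY BorrowerID
                              ORDER BY StartDate, DueDate) AS PrevDueDate
    FROM @Loans
) AS x
ORDER BY BorrowerID, StartDate, DueDate;
```

Within each borrower the rows are sorted by StartDate, so if any two loans overlap, at least one loan must start on or before the due date of the loan sorted immediately before it. In the sample data that is the case for borrower 3 (the loan starting 2013-01-15 has a PrevDueDate of 2013-01-22) and for borrower 4 (the second loan starting 2012-12-01 has a PrevDueDate of 2013-01-01), which leaves borrowers 1 and 2. The first loan of each borrower has a NULL PrevDueDate, so its comparison is never true.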
    

    【Comments】:

      【Solution 2】:

      Try this:

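      with cte as 
      (
          -- number each borrower's loans so a loan can be told apart from itself
          select BorrowerID, StartDate, DueDate,
            row_number() over (partition by BorrowerID order by StartDate, DueDate) r
            from loans
       )
      
      -- all borrowers, minus those with at least one overlapping pair of loans
      select l1.BorrowerID
      from loans l1
      except
      select c1.BorrowerID
      from cte c1
      where exists (
          select 1
          from cte c2 
          where c2.BorrowerID = c1.BorrowerID
          and c2.r <> c1.r
          and (c2.StartDate between c1.StartDate and c1.DueDate
                   or c1.StartDate between c2.StartDate and c2.DueDate)
       )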
      

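      【Comments】:

        【Solution 3】:

        Solution for dbo.Loans with a primary key

        To solve this you need a two-step approach, detailed in the following SQL Fiddle. I added a LoanID column to your sample data, and the queries require such a unique ID to exist. If you don't have one, you need to adjust the join clause to make sure a loan is not matched with itself.

        MS SQL Server 2008 Schema Setup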

        CREATE TABLE dbo.Loans
            (LoanID INT, [BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
        GO
        
        INSERT INTO dbo.Loans
            (LoanID, [BorrowerID], [StartDate], [DueDate])
        VALUES
            (1, 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
            (2, 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
            (3, 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
            (4, 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
            (5, 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
            (6, 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
            (7, 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
            (8, 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
            (9, 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
        GO
        

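        First you need to find out which loans overlap with another loan. The query uses <= to compare the start dates with the due dates. This counts a second loan that starts on the same day the first one is due as overlapping. If you need those to count as non-overlapping, use < in both places instead.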
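
The difference between the two comparisons can be checked on a few hand-picked pairs of ranges. This is only an illustrative sketch: the table variable @Pairs, its column names, and the three example pairs are made up for this purpose and do not come from the answer.

```sql
-- Hypothetical pairs of date ranges: (S1, E1) is one loan, (S2, E2) another.
DECLARE @Pairs TABLE (Label varchar(20), S1 date, E1 date, S2 date, E2 date);

INSERT INTO @Pairs (Label, S1, E1, S2, E2) VALUES
    ('disjoint', '2012-09-02', '2012-10-01', '2012-12-20', '2013-01-06'),
    ('touching', '2013-01-01', '2013-01-10', '2013-01-10', '2013-01-20'),
    ('nested',   '2013-01-07', '2013-01-22', '2013-01-15', '2013-01-18');

-- Two ranges overlap when each one starts before (or when) the other ends.
SELECT Label,
       CASE WHEN S1 <= E2 AND S2 <= E1 THEN 1 ELSE 0 END AS OverlapsInclusive,
       CASE WHEN S1 <  E2 AND S2 <  E1 THEN 1 ELSE 0 END AS OverlapsExclusive
FROM @Pairs;
```

The 'disjoint' pair is 0 under both tests and the 'nested' pair is 1 under both. Only the 'touching' pair, where the second range starts on the day the first one ends, differs: it is 1 with <= and 0 with <.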

        Query 1

        SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L2.LoanID <> L1.LoanID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate) 
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM dbo.Loans L1;
        

        Results

        | LOANID | BORROWERID |                        STARTDATE |                         DUEDATE | HASOVERLAPPINGLOAN |
        |--------|------------|----------------------------------|---------------------------------|--------------------|
        |      1 |          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |                  0 |
        |      2 |          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |                  0 |
        |      3 |          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |                  0 |
        |      4 |          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |                  1 |
        |      5 |          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |                  1 |
        |      6 |          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |                  0 |
        |      7 |          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |                  1 |
        |      8 |          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |                  1 |
        |      9 |          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |                  0 |
        

        Now, with that information, you can determine the borrowers that have no overlapping loans with this query:

        Query 2

        WITH OverlappingLoans AS (
          SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L2.LoanID <> L1.LoanID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate) 
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM dbo.Loans L1
        ),
        OverlappingBorrower AS (
          SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
            FROM OverlappingLoans
           GROUP BY BorrowerID
        )
        SELECT * 
          FROM OverlappingBorrower
         WHERE hasOverlappingLoan = 0;
        

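        Results

        | BORROWERID | HASOVERLAPPINGLOAN |
        |------------|--------------------|
        |          1 |                  0 |
        |          2 |                  0 |
        

        Or you can get even more information by counting, for each borrower in the database, the loans and the number of loans that overlap another loan. (Note that if loan A and loan B overlap, this query counts both of them as overlapping loans.)

        Query 3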

        WITH OverlappingLoans AS (
          SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L2.LoanID <> L1.LoanID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate) 
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM dbo.Loans L1
        )
        SELECT BorrowerID,COUNT(1) LoanCount, SUM(hasOverlappingLoan) OverlappingCount
          FROM OverlappingLoans
         GROUP BY BorrowerID;
        

        Results

        | BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
        |------------|-----------|------------------|
        |          1 |         3 |                0 |
        |          2 |         1 |                0 |
        |          3 |         3 |                2 |
        |          4 |         2 |                2 |
        


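        Solution for dbo.Loans without a primary key

        UPDATE: Since the requirement actually calls for a solution that does not rely on a unique identifier for each loan, I made the following changes:

        1) I added a borrower who has two loans with the same start and due dates.

        SQL Fiddle

        MS SQL Server 2008 Schema Setup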

        CREATE TABLE dbo.Loans
            ([BorrowerID] int, [StartDate] datetime, [DueDate] datetime)
        GO
        
        INSERT INTO dbo.Loans
            ([BorrowerID], [StartDate], [DueDate])
        VALUES
            ( 1, '2012-09-02 00:00:00', '2012-10-01 00:00:00'),
            ( 2, '2012-10-05 00:00:00', '2012-10-21 00:00:00'),
            ( 3, '2012-11-07 00:00:00', '2012-11-09 00:00:00'),
            ( 4, '2012-12-01 00:00:00', '2013-01-01 00:00:00'),
            ( 4, '2012-12-01 00:00:00', '2013-01-14 00:00:00'),
            ( 1, '2012-12-20 00:00:00', '2013-01-06 00:00:00'),
            ( 3, '2013-01-07 00:00:00', '2013-01-22 00:00:00'),
            ( 3, '2013-01-15 00:00:00', '2013-01-18 00:00:00'),
            ( 1, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
            ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00'),
            ( 5, '2013-02-20 00:00:00', '2013-02-24 00:00:00')
        GO
        

        2) Those "equal date" loans require an additional step:

        Query 1

        SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
          FROM dbo.Loans
         GROUP BY BorrowerID, StartDate, DueDate;
        

        Results

        | BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT |
        |------------|----------------------------------|---------------------------------|-----------|
        |          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |
        |          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |
        |          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |
        |          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |
        |          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |
        |          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |
        |          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |
        |          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |
        |          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |
        |          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |
        

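        3) Now that each loan range is unique, we can use the old technique again. However, we also need to account for those "equal date" loans. (L1.StartDate <> L2.StartDate OR L1.DueDate <> L2.DueDate) prevents a loan from matching itself. OR LoanCount > 1 accounts for the "equal date" loans.

        Query 2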

        WITH NormalizedLoans AS (
          SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
            FROM dbo.Loans
           GROUP BY BorrowerID, StartDate, DueDate  
        )
        SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate
                               AND (L1.StartDate <> L2.StartDate
                                    OR L1.DueDate <> L2.DueDate)
                           ) 
                     OR LoanCount > 1
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM NormalizedLoans L1;
        

        Results

        | BORROWERID |                        STARTDATE |                         DUEDATE | LOANCOUNT | HASOVERLAPPINGLOAN |
        |------------|----------------------------------|---------------------------------|-----------|--------------------|
        |          1 | September, 02 2012 00:00:00+0000 |  October, 01 2012 00:00:00+0000 |         1 |                  0 |
        |          1 |  December, 20 2012 00:00:00+0000 |  January, 06 2013 00:00:00+0000 |         1 |                  0 |
        |          1 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         1 |                  0 |
        |          2 |   October, 05 2012 00:00:00+0000 |  October, 21 2012 00:00:00+0000 |         1 |                  0 |
        |          3 |  November, 07 2012 00:00:00+0000 | November, 09 2012 00:00:00+0000 |         1 |                  0 |
        |          3 |   January, 07 2013 00:00:00+0000 |  January, 22 2013 00:00:00+0000 |         1 |                  1 |
        |          3 |   January, 15 2013 00:00:00+0000 |  January, 18 2013 00:00:00+0000 |         1 |                  1 |
        |          4 |  December, 01 2012 00:00:00+0000 |  January, 01 2013 00:00:00+0000 |         1 |                  1 |
        |          4 |  December, 01 2012 00:00:00+0000 |  January, 14 2013 00:00:00+0000 |         1 |                  1 |
        |          5 |  February, 20 2013 00:00:00+0000 | February, 24 2013 00:00:00+0000 |         2 |                  1 |
        

        The logic of this query has not changed (apart from the swapped-out beginning).

        Query 3

        WITH NormalizedLoans AS (
          SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
            FROM dbo.Loans
           GROUP BY BorrowerID, StartDate, DueDate  
        ),
        OverlappingLoans AS (
        SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate
                               AND (L1.StartDate <> L2.StartDate
                                    OR L1.DueDate <> L2.DueDate)
                           ) 
                     OR LoanCount > 1
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM NormalizedLoans L1
        ),
        OverlappingBorrower AS (
          SELECT BorrowerID, MAX(HasOverlappingLoan) HasOverlappingLoan
            FROM OverlappingLoans
           GROUP BY BorrowerID
        )
        SELECT * 
          FROM OverlappingBorrower
         WHERE hasOverlappingLoan = 0;
        

        Results

        | BORROWERID | HASOVERLAPPINGLOAN |
        |------------|--------------------|
        |          1 |                  0 |
        |          2 |                  0 |
        

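        4) In this counting query we need to fold the "equal date" loan counts back in. To do that, we use SUM(LoanCount) instead of a plain COUNT. We also have to multiply hasOverlappingLoan by LoanCount to get the correct overlap count again.

        Query 4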

        WITH NormalizedLoans AS (
          SELECT BorrowerID, StartDate, DueDate, COUNT(1) LoanCount
            FROM dbo.Loans
           GROUP BY BorrowerID, StartDate, DueDate  
        ),
        OverlappingLoans AS (
        SELECT 
           *,
           CASE WHEN EXISTS(SELECT 1 FROM dbo.Loans L2 
                             WHERE L2.BorrowerID = L1.BorrowerID
                               AND L1.StartDate <= L2.DueDate
                               AND L2.StartDate <= l1.DueDate
                               AND (L1.StartDate <> L2.StartDate
                                    OR L1.DueDate <> L2.DueDate)
                           ) 
                     OR LoanCount > 1
                THEN 1
                ELSE 0
           END AS HasOverlappingLoan
          FROM NormalizedLoans L1
        )
        SELECT BorrowerID,SUM(LoanCount) LoanCount, SUM(hasOverlappingLoan*LoanCount) OverlappingCount
          FROM OverlappingLoans
         GROUP BY BorrowerID;
        

        Results

        | BORROWERID | LOANCOUNT | OVERLAPPINGCOUNT |
        |------------|-----------|------------------|
        |          1 |         3 |                0 |
        |          2 |         1 |                0 |
        |          3 |         3 |                2 |
        |          4 |         2 |                2 |
        |          5 |         2 |                2 |
        

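        I would strongly recommend finding a way to use my first solution, as a loan table without a primary key is an "odd" design. However, if you really can't get there, use the second solution.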
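
If the only obstacle to the first solution is the missing unique ID, one possible way to get there is to generate one. The sketch below is an assumption about what may be acceptable in your environment, not something taken from the answer: the temp table name #LoansWithId and the constraint name PK_Loans are hypothetical, and the second option requires permission to alter dbo.Loans.

```sql
-- Option A: leave dbo.Loans untouched and copy it into a temp table
-- that carries a generated surrogate key. Point the LoanID-based
-- queries at #LoansWithId instead of dbo.Loans.
SELECT IDENTITY(int, 1, 1) AS LoanID,
       BorrowerID,
       StartDate,
       DueDate
INTO #LoansWithId
FROM dbo.Loans;

-- Option B: add the key to the table itself. Existing rows are
-- numbered automatically when the IDENTITY column is added.
ALTER TABLE dbo.Loans
    ADD LoanID int IDENTITY(1, 1) NOT NULL;

ALTER TABLE dbo.Loans
    ADD CONSTRAINT PK_Loans PRIMARY KEY (LoanID);
```

Either way, two loans with identical borrower and dates get different LoanID values, so the L2.LoanID <> L1.LoanID condition treats them as two separate, overlapping loans.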

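        【Comments】:

        • Thanks for your effort. The only problem is that unfortunately there is no LoanID column; these 3 columns are all I've got.
        • Can the same borrower have multiple loans with the same start and end dates?
        • Possibly, if the same person borrows 2 items at the same time for the same length of time.

        【Solution 4】: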

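        I got it working, but it is a bit convoluted. It first collects the borrowers that do not qualify in an inner query and then returns the rest. The inner query has two parts:

        Get all overlapping borrowings that do not start on the same day.

        Get all borrowings that start on the same date.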

        select distinct BorrowerID from borrowings
        where BorrowerID NOT IN
        
        (
            select b1.BorrowerID from borrowings b1
            inner join borrowings b2
                on b1.BorrowerID = b2.BorrowerID
                and b1.StartDate < b2.StartDate
                and b1.DueDate > b2.StartDate
        
            union 
        
            select BorrowerID from borrowings
            group by BorrowerID, StartDate
            having count(*) > 1
        )
        

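        I had to use 2 separate inner queries because your table has no unique identifier for each record, and using b1.StartDate <= b2.StartDate, as I otherwise should have, would join each record to itself. It would be better to have a separate identifier for each record.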
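
To see what each of the two parts contributes, the inner query can be run on its own with a label per part. This is a sketch under stated assumptions: the table variable @borrowings stands in for the borrowings table and is filled with the sample rows from the question, and the Reason column is added purely for illustration.

```sql
-- Hypothetical stand-in for the borrowings table, with the question's rows.
DECLARE @borrowings TABLE (BorrowerID int, StartDate date, DueDate date);

INSERT INTO @borrowings (BorrowerID, StartDate, DueDate) VALUES
    (1, '2012-09-02', '2012-10-01'),
    (2, '2012-10-05', '2012-10-21'),
    (3, '2012-11-07', '2012-11-09'),
    (4, '2012-12-01', '2013-01-01'),
    (4, '2012-12-01', '2013-01-14'),
    (1, '2012-12-20', '2013-01-06'),
    (3, '2013-01-07', '2013-01-22'),
    (3, '2013-01-15', '2013-01-18'),
    (1, '2013-02-20', '2013-02-24');

-- Part 1: a later-starting borrowing begins before an earlier one is due.
SELECT b1.BorrowerID, 'later start overlaps' AS Reason
FROM @borrowings b1
INNER JOIN @borrowings b2
    ON  b1.BorrowerID = b2.BorrowerID
    AND b1.StartDate < b2.StartDate
    AND b1.DueDate > b2.StartDate

UNION

-- Part 2: two or more borrowings of one borrower share a start date.
SELECT BorrowerID, 'same start date' AS Reason
FROM @borrowings
GROUP BY BorrowerID, StartDate
HAVING COUNT(*) > 1;
```

On the sample data, part 1 reports borrower 3 and part 2 reports borrower 4, so the outer NOT IN leaves borrowers 1 and 2. Note that b1.DueDate > b2.StartDate is a strict comparison, so a borrowing that starts on the day a previous one is due is not treated as an overlap here, unlike the <= comparison used in Solution 3.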

        【Comments】:

        • Looks reasonable. Thanks for taking the time to answer! I'll leave this open a bit longer to see whether any other ideas come up...