【Question title】: Find all integer gaps in SQL
【Posted】: 2019-04-27 09:19:57
【Question description】:

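I have a database that stores information about different matches of a game, which I pull in from an external source. Due to a few issues, there are occasional gaps in the database (anywhere from 1 missing ID to a few hundred). I want the program to pull in the data for the missing games, but I need to get that list first.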

Here is the format of the table:

id (pk-identity)  |  GameID (int)  |  etc.  |  etc.  

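I had thought of writing a program that runs a loop and queries for each GameID starting at 1, but it seems like there should be a more efficient way to get the missing numbers.

Is there an easy and efficient way, using SQL Server, to find all the missing numbers in the range?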

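【Question comments】:

  • You know that identity fields are always going to have gaps, right? If a record is deleted or the initial insert is rolled back, you will have a gap.
  • Yes, I understand how identity columns work. In case it wasn't clear, I was referring to the GameID column.

Tags: sql sql-server sql-server-2008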


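【Solution 1】:

The idea is to look at where the gaps start. Let me assume that you are using SQL Server 2012, and so have the lag() and lead() functions. The following gets the next id: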

select t.*, lead(id) over (order by id) as nextid
from t;

If there is a gap, then nextid <> id+1. You can now characterize the gaps using where:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*, lead(id) over (order by id) as nextid
      from t
     ) t
where nextid <> id+1;
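
As a small worked example of the query above, here is a sketch applied to the GameID column from the question. The table variable @games and its sample values are hypothetical and exist only for illustration; LEAD() needs SQL Server 2012 or later.

```sql
-- Hypothetical sample data; @games and its values are illustrative only.
DECLARE @games TABLE (GameID int PRIMARY KEY);
INSERT INTO @games (GameID)
VALUES (1), (2), (3), (8), (9), (10), (11), (15), (16);

-- For each row, compare GameID with the next GameID in sorted order.
-- The last row has NextGameID = NULL, so the WHERE clause drops it.
SELECT GameID + 1     AS FirstMissingId,
       NextGameID - 1 AS LastMissingId
FROM (SELECT GameID,
             LEAD(GameID) OVER (ORDER BY GameID) AS NextGameID
      FROM @games
     ) AS g
WHERE NextGameID <> GameID + 1
ORDER BY FirstMissingId;
-- Rows returned: (4, 7) and (12, 14)
```

Each output row is one gap, given as an inclusive range of missing values.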

EDIT:

Without lead(), I would do the same thing with a correlated subquery:

select id+1 as FirstMissingId, nextid - 1 as LastMissingId
from (select t.*,
             (select top 1 id
              from t t2
              where t2.id > t.id
              order by t2.id
             ) as nextid
      from t
     ) t
where nextid <> id+1;

Assuming id is the primary key on the table (or even that it just has an index), both methods should have reasonable performance.
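
In the question's table the primary key is the identity column, while the gaps are in GameID, so the column being scanned may not be indexed yet. A minimal sketch of adding such an index follows; the names dbo.Games and IX_Games_GameID are assumptions, since the question does not name the table.

```sql
-- dbo.Games and IX_Games_GameID are hypothetical names.
-- A plain (non-unique) index is used because the question does not say
-- whether GameID values are guaranteed to be unique.
CREATE NONCLUSTERED INDEX IX_Games_GameID
    ON dbo.Games (GameID);
```

With an index on the ordered column, both the lead() version and the correlated subquery can read the values in sorted order instead of sorting the whole table.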

【Comments】:

    【Solution 2】:

    Numbers table!

    CREATE TABLE dbo.numbers (
       number int NOT NULL
    )
    
    ALTER TABLE dbo.numbers
    ADD
       CONSTRAINT pk_numbers PRIMARY KEY CLUSTERED (number)
         WITH FILLFACTOR = 100
    GO
    
    INSERT INTO dbo.numbers (number)
    SELECT (a.number * 256) + b.number As number
    FROM     (
            SELECT number
            FROM   master..spt_values
            WHERE  type = 'P'
            AND    number <= 255
           ) As a
     CROSS
      JOIN (
            SELECT number
            FROM   master..spt_values
            WHERE  type = 'P'
            AND    number <= 255
           ) As b
    GO
    

    Then you can perform an OUTER JOIN or a NOT EXISTS between the two tables and find the gaps...

    SELECT *
    FROM   dbo.numbers
    WHERE  NOT EXISTS (
             SELECT *
             FROM   your_table
             WHERE  id = numbers.number
           )
    
    -- OR
    
    SELECT *
    FROM   dbo.numbers
     LEFT
      JOIN your_table
        ON your_table.id = numbers.number
    WHERE  your_table.id IS NULL
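
As written, both queries report every number in dbo.numbers that is absent from your_table, including all values above the highest existing id. A possible refinement, sketched here on the assumption that only gaps between the lowest and highest existing id are of interest, is to bound the search range:

```sql
-- Assumes dbo.numbers was populated as above (0 through 65535) and that
-- your_table.id is the column being checked. Ids above 65535 are not
-- covered unless dbo.numbers is extended.
SELECT n.number AS MissingId
FROM   dbo.numbers AS n
WHERE  n.number BETWEEN (SELECT MIN(id) FROM your_table)
                    AND (SELECT MAX(id) FROM your_table)
AND    NOT EXISTS (
         SELECT *
         FROM   your_table AS t
         WHERE  t.id = n.number
       )
ORDER BY n.number;
```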
    

    【Comments】:

      【Solution 3】:

      I like the "gaps and islands" approach. It goes a little something like this:

      WITH Islands AS (
          SELECT GameId, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS [IslandID]
          FROM dbo.yourTable
      )
      SELECT MIN(GameID), MAX(GameID)
      FROM Islands
      GROUP BY IslandID
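
To see why GameID minus ROW_NUMBER() works as a grouping key, it can help to look at the intermediate values. The sketch below uses a hypothetical table variable @games with made-up values and assumes GameID is unique:

```sql
-- Hypothetical sample data; @games and its values are illustrative only.
DECLARE @games TABLE (GameID int PRIMARY KEY);
INSERT INTO @games (GameID) VALUES (1), (2), (3), (8), (9), (15);

-- Within a run of consecutive GameIDs, GameID and the row number both
-- grow by 1 per row, so their difference stays constant for that run.
SELECT GameID,
       ROW_NUMBER() OVER (ORDER BY GameID)          AS RowNum,
       GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS IslandID
FROM @games
ORDER BY GameID;
-- GameID  RowNum  IslandID
-- 1       1       0
-- 2       2       0
-- 3       3       0
-- 8       4       4
-- 9       5       4
-- 15      6       9
```

Every gap increases the difference, so each island gets its own IslandID value.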
      

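      That query gives you the list of contiguous ranges. From there, you can self-join that result set (on successive IslandIDs) to get the gaps. There is a bit of work involved in making the IslandIDs themselves contiguous, though. So, extending the above query: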

      WITH 
      cte1 AS (
          SELECT GameId, GameId - ROW_NUMBER() OVER (ORDER BY GameId) AS [rn]
          FROM dbo.yourTable
      )
      , cte2 AS (
          SELECT [rn], MIN(GameId) AS [Start], MAX(GameId) AS [End]
          FROM cte1
          GROUP BY [rn]
      )
      ,Islands AS (
          SELECT ROW_NUMBER() OVER (ORDER BY [rn]) AS IslandId, [Start], [End]
        from cte2
      )
      
      SELECT a.[End] + 1 AS [GapStart], b.[Start] - 1 AS [GapEnd]
      FROM Islands AS a
      LEFT JOIN Islands AS b
          ON a.IslandID + 1 = b.IslandID
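
One thing to be aware of: because of the LEFT JOIN, the last island has no successor, so the query also returns a final row whose GapEnd is NULL. If that open-ended row is not wanted, an INNER JOIN drops it. Here is a sketch on hypothetical sample data (the table variable @games and its values are made up):

```sql
-- Hypothetical sample data; @games and its values are illustrative only.
DECLARE @games TABLE (GameID int PRIMARY KEY);
INSERT INTO @games (GameID) VALUES (1), (2), (3), (8), (9), (15);

WITH cte1 AS (
    SELECT GameID, GameID - ROW_NUMBER() OVER (ORDER BY GameID) AS rn
    FROM @games
)
, cte2 AS (
    SELECT rn, MIN(GameID) AS [Start], MAX(GameID) AS [End]
    FROM cte1
    GROUP BY rn
)
, Islands AS (
    SELECT ROW_NUMBER() OVER (ORDER BY rn) AS IslandID, [Start], [End]
    FROM cte2
)
-- INNER JOIN keeps only islands that have a following island,
-- so no row with a NULL GapEnd is produced.
SELECT a.[End] + 1 AS GapStart, b.[Start] - 1 AS GapEnd
FROM Islands AS a
INNER JOIN Islands AS b
    ON a.IslandID + 1 = b.IslandID
ORDER BY GapStart;
-- Rows returned: (4, 7) and (10, 14)
```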
      

      【Comments】:

        【Solution 4】:
        SELECT * FROM #tab1

        id          col1
        ----------- --------------------
        1           a
        2           a
        3           a
        8           a
        9           a
        10          a
        11          a
        15          a
        16          a
        17          a
        18          a

        -- ORDER BY makes TOP 1 return the smallest id greater than t.id.
        ;WITH cte (id, nextId) AS
        (SELECT t.id,
                (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId
         FROM #tab1 t)
        -- GapStart and GapEnd here are the existing ids on either side of each gap.
        SELECT id AS GapStart, nextId AS GapEnd FROM cte
        WHERE id + 1 <> nextId

        GapStart    GapEnd
        ----------- -----------
        3           8
        11          15

        【Comments】:

          【Solution 5】:

          Try this (as written it covers IDs 0 through 99999; if you need more, add more digit positions to the Numbers CTE below):

          ;WITH Digits AS (
              select Digit 
              from ( values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as t(Digit))
          ,Numbers AS (
              select u.Digit 
                    + t.Digit*10 
                    + h.Digit*100 
                    + th.Digit*1000
                    + tth.Digit*10000 
                    --Add 100000, 1000000 multipliers if required here.
                    as myId
              from Digits u
              cross join Digits t
              cross join Digits h
              cross join Digits th
              cross join Digits tth
              --Add the cross join for higher numbers 
              )
          SELECT myId 
          FROM Numbers
          WHERE myId NOT IN (SELECT GameId FROM YourTable)
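
One caveat with NOT IN: if the subquery returns even a single NULL, the predicate is never true and the query returns no rows. The question does not say whether GameId is nullable, so the sketch below only illustrates the difference on hypothetical data (the table variable @games and the Candidates CTE are made up); NOT EXISTS is not affected by NULLs.

```sql
-- Hypothetical sample data; note the NULL GameId.
DECLARE @games TABLE (GameId int NULL);
INSERT INTO @games (GameId) VALUES (1), (2), (4), (NULL);

WITH Candidates AS (
    SELECT n FROM (VALUES (1), (2), (3), (4)) AS v(n)
)
SELECT c.n AS MissingId
FROM Candidates AS c
WHERE NOT EXISTS (SELECT 1 FROM @games AS g WHERE g.GameId = c.n);
-- Returns 3. Writing the filter as
--   WHERE c.n NOT IN (SELECT GameId FROM @games)
-- would return no rows here because of the NULL.
```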
          

          【Comments】:

            【Solution 6】:

            Problem: we need to find the gap ranges in the id field

            SELECT * FROM #tab1
            
            id          col1
            ----------- --------------------
            1           a  
            2           a  
            3           a  
            8           a    
            9           a  
            10          a  
            11          a  
            15          a  
            16          a  
            17          a  
            18          a
            

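            Solution

            -- ORDER BY makes TOP 1 return the smallest id greater than t.id.
            ;WITH cte (id, nextId) AS
            (SELECT t.id,
                    (SELECT TOP 1 t1.id FROM #tab1 t1 WHERE t1.id > t.id ORDER BY t1.id) AS nextId
             FROM #tab1 t)

            SELECT id + 1 AS GapStart, nextId - 1 AS GapEnd FROM cte
            WHERE id + 1 <> nextId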
            

            Output

            GapStart    GapEnd
            ----------- -----------
            4           7
            12          14
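
Since the question asks for the list of missing IDs rather than ranges, the ranges can be expanded into individual values. One possible way, sketched here with a recursive CTE over a hypothetical @gaps table variable holding the output above:

```sql
-- @gaps is a hypothetical holder for the (GapStart, GapEnd) rows above.
DECLARE @gaps TABLE (GapStart int, GapEnd int);
INSERT INTO @gaps (GapStart, GapEnd) VALUES (4, 7), (12, 14);

WITH Expanded AS (
    -- Anchor: the first missing id of each gap.
    SELECT GapStart AS MissingId, GapEnd
    FROM @gaps
    UNION ALL
    -- Recursive step: keep adding 1 until the end of the gap is reached.
    SELECT MissingId + 1, GapEnd
    FROM Expanded
    WHERE MissingId < GapEnd
)
SELECT MissingId
FROM Expanded
ORDER BY MissingId
OPTION (MAXRECURSION 0);  -- the default limit of 100 is too low for long gaps
-- Returns 4, 5, 6, 7, 12, 13, 14
```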
            

            【Comments】:
