【Question Title】: SQL: close gaps in data over time
【Posted】: 2012-02-17 04:07:37
【Question】:

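I have a table of data for a game I'm prototyping. I generate the data while I'm at work, but when I leave and my machine goes to sleep, data generation stops. This has left large gaps in my collection.

I'd like to shift the value of the DateTimeCreated column for each item in the table so that there is never more than a 10-minute gap between any item and the next item generated.

The table is structured like this: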

CREATE TABLE [dbo].[Items](
    [Id] [uniqueidentifier] NOT NULL,
    [DateTimeCreated] [datetimeoffset](7) NOT NULL,
    [AuthorId] [uniqueidentifier] NOT NULL,
    [Source] [varchar](max) NOT NULL,       
    [FullText] [varchar](max) NOT NULL,
 CONSTRAINT [PK_Items] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

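I was thinking of doing this in L2S (LINQ to SQL), but I have over 1 million records, so I don't know whether iterating over every item is the best solution. I know there has to be some way to do this faster in SQL.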
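
To make the requirement concrete, the gaps in question can be listed with a query along the following lines. This is only a sketch: it assumes SQL Server 2012 or later (for `LEAD`), and the column aliases `next_created`, `gap_start`, `gap_end`, and `gap_seconds` are illustrative names, not part of the original schema.

```sql
-- Sketch: list every gap longer than 10 minutes (600 seconds).
-- Assumes SQL Server 2012+ because it relies on LEAD().
WITH Ordered AS
(
    SELECT DateTimeCreated,
           -- timestamp of the next item in chronological order (NULL for the last item)
           LEAD(DateTimeCreated) OVER (ORDER BY DateTimeCreated) AS next_created
    FROM dbo.Items
)
SELECT DateTimeCreated AS gap_start,
       next_created    AS gap_end,
       DATEDIFF(SECOND, DateTimeCreated, next_created) AS gap_seconds
FROM Ordered
WHERE DATEDIFF(SECOND, DateTimeCreated, next_created) > 600
ORDER BY gap_start;
```

Running the same query again after any of the updates proposed below should return no rows if the gaps were closed as intended.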

【Question Comments】:

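  • Classic gaps-and-islands problem. Don't tell the guy to use a cursor; it isn't needed.
  • Cursor = Satan. If nobody beats me to it, I'll post a non-cursor solution tonight.
  • @Dems - I only have 10-15 gaps, each about 12 hours long. I'd like to keep the "randomness" of the datetimes at which the data was actually generated. I'm pulling data from organic sources (random tweets, Facebook posts, etc.), so I'd like to preserve as much of that organic feel as possible. But in the end, I'm open to anything :)
  • @JCooper, thanks! I look forward to your help.
  • @joe - Are all of your existing IDs contiguous? Or do they have gaps from deletes or failed inserts?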

Tags: sql sql-server tsql sql-server-2012 gaps-and-islands


【Solution 1】:

Another approach, using ranking functions (not 100% tested):

DECLARE @tenMinutes AS INT = 600;


WITH StartingPoints AS
(
    SELECT DateTimeCreated, ROW_NUMBER() OVER(ORDER BY DateTimeCreated) AS rownum
    FROM dbo.Items AS A
    WHERE NOT EXISTS(
        SELECT * FROM dbo.Items AS B
        WHERE B.DateTimeCreated < A.DateTimeCreated 
          AND DATEDIFF(SECOND,B.DateTimeCreated, A.DateTimeCreated) BETWEEN 0 AND @tenMinutes
    )
),
EndingPoints AS
(
    SELECT DateTimeCreated, ROW_NUMBER() OVER(ORDER BY DateTimeCreated) AS rownum
    FROM dbo.Items AS A
    WHERE NOT EXISTS(
        SELECT * FROM dbo.Items AS B
        WHERE A.DateTimeCreated < B.DateTimeCreated 
          AND DATEDIFF(SECOND,A.DateTimeCreated, B.DateTimeCreated) BETWEEN 0 AND @tenMinutes
    )
),
Islands AS
(
    SELECT S.DateTimeCreated AS start_range,
           E.DateTimeCreated AS end_range,
           ROW_NUMBER() OVER(ORDER BY S.DateTimeCreated) AS row_num
    FROM StartingPoints AS S
    JOIN EndingPoints AS E on E.rownum = S.rownum
),
Ofs AS
(
    SELECT I2.start_range, 
           I2.end_range,  
           I1.end_range AS prev,
           DATEDIFF(SECOND, I1.end_range, I2.start_range) AS offset 
    FROM Islands AS I1
    JOIN Islands AS I2 ON I2.row_num = I1.row_num + 1  -- pair each island with the one before it
),
CmlOfs AS
(
    SELECT O1.start_range,
           O1.end_range,
           O1.prev,
           O1.offset,
           (SELECT SUM(O2.offset) FROM Ofs AS O2
            WHERE O2.start_range <= O1.start_range) AS cum_offset
    FROM Ofs AS O1
),
UpdateQ AS
(
    SELECT Items.*, DATEADD(SECOND, -1 * CmlOfs.cum_offset, Items.DateTimeCreated) AS new_value
    FROM Items
    JOIN CmlOfs ON Items.DateTimeCreated BETWEEN CmlOfs.start_range AND CmlOfs.end_range
)
UPDATE UpdateQ
SET DateTimeCreated = new_value;
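
Since the question is tagged sql-server-2012, the same idea can probably be expressed more compactly with window functions. The following is a minimal sketch, not the method used above: it assumes SQL Server 2012 or later (for `LAG` and a windowed running `SUM`), and the names `@maxGapSeconds`, `Gaps`, `Shifts`, `gap_seconds`, and `shift_seconds` are made up for illustration. Unlike the query above, it trims each oversized gap down to exactly 10 minutes instead of closing it completely.

```sql
-- Sketch only: assumes SQL Server 2012+ (LAG and windowed running SUM).
DECLARE @maxGapSeconds AS INT = 600;  -- hypothetical name; 10 minutes

WITH Gaps AS
(
    SELECT Id,
           DateTimeCreated,
           -- seconds since the previous item; 0 for the very first item
           DATEDIFF(SECOND,
                    LAG(DateTimeCreated, 1, DateTimeCreated)
                        OVER (ORDER BY DateTimeCreated, Id),
                    DateTimeCreated) AS gap_seconds
    FROM dbo.Items
),
Shifts AS
(
    SELECT Id,
           -- running total of the excess over the limit, up to and including this item
           SUM(CASE WHEN gap_seconds > @maxGapSeconds
                    THEN gap_seconds - @maxGapSeconds
                    ELSE 0
               END) OVER (ORDER BY DateTimeCreated, Id
                          ROWS UNBOUNDED PRECEDING) AS shift_seconds
    FROM Gaps
)
UPDATE I
SET DateTimeCreated = DATEADD(SECOND, -S.shift_seconds, I.DateTimeCreated)
FROM dbo.Items AS I
JOIN Shifts AS S ON S.Id = I.Id
WHERE S.shift_seconds > 0;  -- items before the first oversized gap stay untouched
```

Adding `Id` to the `ORDER BY` is only a tie-breaker so that rows with identical timestamps are processed in a deterministic order. As with any bulk update, it is worth trying this inside a transaction on a copy of the table first.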

【Comments】:

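  • Wow, this really is tricky! Your solution is well organized and easy to follow! I'm trying it now and will let you know how it goes!
  • @joe_coolish I have to give credit to the author Itzik Ben-Gan. If you want to learn some tricky SQL, go check him out.

【Solution 2】: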

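If you don't want this to be a resource hog, make sure there is an index on DateTimeCreated.

It also assumes (as you said in the comments) that the number of gaps is small compared with the total number of records.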

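WITH
  gap (Start, Finish)
AS
(
  -- One row per gap: the last item before the gap and the first item after it.
  -- Assumes DateTimeCreated values are distinct at the gap boundaries.
  SELECT
    items.DateTimeCreated,
    (SELECT MIN(lookup.DateTimeCreated) FROM items AS lookup WHERE lookup.DateTimeCreated > items.DateTimeCreated)
  FROM
    items
  WHERE
    DATEADD(second, 600, items.DateTimeCreated) < (SELECT MIN(lookup.DateTimeCreated) FROM items AS lookup WHERE lookup.DateTimeCreated > items.DateTimeCreated)

  UNION ALL

  -- Sentinel row so that the last real gap gets a Finish in the next CTE
  SELECT
    MAX(DateTimeCreated),
    MAX(DateTimeCreated)
  FROM
    items
)
,
  [offset] (Start, Finish, [Offset])
AS
(
  -- Start = item before this gap, Finish = item before the next gap,
  -- [Offset] = how far this gap exceeds 10 minutes
  SELECT
    [current].Start,
    (SELECT MIN(next_gap.Start) FROM gap AS next_gap WHERE next_gap.Start > [current].Start),
    DATEDIFF(second, [current].Start, [current].Finish) - 600
  FROM
    gap      AS [current]
)
,
  cumulative_offset (Start, Finish, [Offset])
AS
(
  -- Running total of the excess of this gap and every earlier gap
  SELECT
    [current].Start,
    [current].Finish,
    SUM([cumulative].[Offset])
  FROM
    [offset]    AS [current]
  INNER JOIN
    [offset]    AS [cumulative]
      ON [cumulative].Start <= [current].Start
  GROUP BY
    [current].Start,
    [current].Finish
)

-- Shift every item that follows a gap back by the accumulated excess
UPDATE
  items
SET
  DateTimeCreated = DATEADD(second, -cumulative_offset.[Offset], items.DateTimeCreated)
FROM
  items
INNER JOIN
  cumulative_offset
    ON  items.DateTimeCreated >  cumulative_offset.Start
    AND items.DateTimeCreated <= cumulative_offset.Finish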
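
The index recommended at the top of this answer could be created roughly as follows. This is a sketch: the index name `IX_Items_DateTimeCreated` is an arbitrary choice, and whether a plain nonclustered index is the best fit for this table is an assumption worth checking against the actual query plan.

```sql
-- Sketch: nonclustered index to support the MIN(...) lookups and range joins above.
-- IX_Items_DateTimeCreated is a hypothetical name.
CREATE NONCLUSTERED INDEX IX_Items_DateTimeCreated
    ON dbo.Items (DateTimeCreated);
```

Because the update rewrites DateTimeCreated on a large share of the rows, the index itself is modified heavily during the update; it mainly pays off in the correlated subqueries that locate the gaps.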

【Comments】:

  • Thank you very much for the SQL. I'm analyzing it now. I wouldn't even have known where to begin, and this has already taught me a lot!