Query for maximum number of concurrent time spans: answers

【Question title】: Query for maximum number of concurrent time spans
【Posted】: 2011-04-29 05:31:01
【Question】:

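I have a SQL Server table with two datetime fields (CnxStartdatetime, CnxEnddatetime). Each row represents one transmission of information. I am trying to find the maximum number of concurrent transmissions based on these two timestamps. I have a working query, but it is both slow and extremely cumbersome. I know there must be a better way to go about this, but I can't come up with one.

With the current version, if I run it with 5 "levels" and get results, I have to go back and add a large amount of SQL to test whether there is an instance of 6 concurrent transmissions, and so on. Once the query gets 7-8 "levels" deep it becomes very slow.

Snippet of the current version: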

select 
    t1.id, t2.id, t3.id, t4.id, t5.id, t6.id, t7.id, t8.id, t9.id, t10.id

FROM
dbo.MyTable t1, dbo.MyTable t2, dbo.MyTable t3, dbo.MyTable t4, dbo.MyTable t5,
dbo.MyTable t6, dbo.MyTable t7, dbo.MyTable t8, dbo.MyTable t9, dbo.MyTable t10
WHERE
(((t2.cnxstartdatetime >= t1.cnxstartdatetime) and (t2.cnxstartdatetime <= t1.cnxenddatetime))
or ((t2.cnxenddatetime >= t1.cnxstartdatetime) and (t2.cnxenddatetime <= t1.cnxenddatetime)))
AND
t2.id != t1.id
AND
(((t3.cnxstartdatetime >= t2.cnxstartdatetime) and (t3.cnxstartdatetime >= t1.cnxstartdatetime)and (t3.cnxstartdatetime <= t1.cnxenddatetime) and (t3.cnxstartdatetime <= t2.cnxenddatetime))
or ((t3.cnxenddatetime >= t2.cnxstartdatetime) and (t3.cnxenddatetime >= t1.cnxstartdatetime)and (t3.cnxenddatetime <= t1.cnxenddatetime) and (t3.cnxenddatetime <= t2.cnxenddatetime)))
AND
t3.id != t2.id AND t3.id != t1.id
AND
(((t4.cnxstartdatetime >= t3.cnxstartdatetime) and (t4.cnxstartdatetime >= t1.cnxstartdatetime)and (t4.cnxstartdatetime >= t2.cnxstartdatetime) and (t4.cnxstartdatetime <= t1.cnxenddatetime) and (t4.cnxstartdatetime <= t3.cnxenddatetime)and (t4.cnxstartdatetime <= t2.cnxenddatetime))
or ((t4.cnxenddatetime >= t3.cnxstartdatetime) and (t4.cnxenddatetime >= t1.cnxstartdatetime)and (t4.cnxenddatetime >= t2.cnxstartdatetime) and (t4.cnxenddatetime <= t1.cnxenddatetime)and (t4.cnxenddatetime <= t3.cnxenddatetime)and (t4.cnxenddatetime <= t2.cnxenddatetime)))
AND
t4.id != t3.id AND t4.id != t2.id AND t4.id != t1.id
... *snip*

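EDIT: Many of the responses suggest I use a cross join. That does not achieve the result I'm looking for. Below is an example of the cross join results for one record's "overlaps": the list of IDs it gives me for ID 11787. As you can see, 11781 does not overlap 11774. This is simply a list of every record whose time span intersects 11787.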

11774    2011-04-29 01:02:56.780    2011-04-29 01:02:58.793
11777    2011-04-29 01:02:56.780    2011-04-29 01:02:58.843
11778    2011-04-29 01:02:56.780    2011-04-29 01:02:58.950
11775    2011-04-29 01:02:56.793    2011-04-29 01:02:58.843
11776    2011-04-29 01:02:56.793    2011-04-29 01:02:58.890
11780    2011-04-29 01:02:58.310    2011-04-29 01:03:02.687
11779    2011-04-29 01:02:58.327    2011-04-29 01:03:02.543
11787    2011-04-29 01:02:58.530    2011-04-29 01:03:08.827 **
11781    2011-04-29 01:02:59.030    2011-04-29 01:03:05.187
11782    2011-04-29 01:02:59.247    2011-04-29 01:03:05.467
11784    2011-04-29 01:02:59.293    2011-04-29 01:03:05.810
11791    2011-04-29 01:03:00.107    2011-04-29 01:03:13.623
11786    2011-04-29 01:03:00.843    2011-04-29 01:03:08.983
11783    2011-04-29 01:03:02.560    2011-04-29 01:03:05.793
11785    2011-04-29 01:03:02.717    2011-04-29 01:03:07.357
11790    2011-04-29 01:03:05.200    2011-04-29 01:03:14.153
11804    2011-04-29 01:03:05.687    2011-04-29 01:03:25.577
11811    2011-04-29 01:03:07.093    2011-04-29 01:03:35.153
11799    2011-04-29 01:03:07.123    2011-04-29 01:03:24.437
11789    2011-04-29 01:03:08.793    2011-04-29 01:03:13.577

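I have also tried writing a recursive CTE, but I can't figure out how to ensure that the current ID doesn't match any previous ID in the current stack of concurrent rows. The code below just recurses on itself until it hits the limit.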

WITH TransmissionConcurrency (StartTime, EndTime, ConcurrencyLevel) AS
(
    SELECT
        CnxStartDatetime AS StartTime,
        CnxEndDatetime AS EndTime,
        1 AS ConcurrencyLevel
    FROM dbo.MyTable

    UNION ALL

    SELECT
        CASE WHEN d.CnxStartDatetime > tc.StartTime THEN d.CnxStartDatetime ELSE tc.StartTime END AS StartTime,
        CASE WHEN d.CnxEndDatetime < tc.EndTime THEN d.CnxEndDatetime ELSE tc.EndTime END AS EndDate,
        tc.ConcurrencyLevel + 1 as ConcurrencyLevel
    FROM dbo.MyTable d
        INNER JOIN TransmissionConcurrency tc ON
            ((d.CnxStartDatetime between tc.StartTime and tc.EndTime)
            or
            (d.CnxEndDatetime between tc.StartTime and tc.EndTime)
            or
            (d.CnxStartDatetime <= tc.StartTime and d.CnxEndDatetime >= tc.EndTime))
)

SELECT * 
FROM TransmissionConcurrency
ORDER BY ConcurrencyLevel, StartTime, EndTime

I came up with the diagram below to try to better explain what I'm looking for.

A         [--------]
B    [-----]
C              [------]
D   [---]
E             [---]
F         [-]

In this case, the cross join method would tell me that the maximum concurrency with A is 6 (A with B, C, D, E and F). What I'm looking for is a maximum concurrency of 3 (A with B, F or A with C, E).
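
The distinction drawn in the diagram can be checked with a small sketch. The Python below is not part of the original question; the integer endpoints are read off the character columns of the diagram, which is an assumption (at those coordinates D ends just before A begins). It contrasts the list of spans that intersect A with the largest number of spans open at a single instant.

```python
# Endpoints read off the diagram's character columns (an assumption).
spans = {
    "A": (10, 19),
    "B": (5, 11),
    "C": (15, 22),
    "D": (4, 8),
    "E": (14, 18),
    "F": (10, 12),
}


def overlaps(p, q):
    """True when two closed spans share at least one instant."""
    return p[0] <= q[1] and q[0] <= p[1]


def peak_concurrency(intervals):
    """Largest number of spans open at the same instant."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 sorts before 1: open before close on ties
        events.append((end, 1))
    events.sort()
    open_now = peak = 0
    for _, kind in events:
        open_now += 1 if kind == 0 else -1
        peak = max(peak, open_now)
    return peak


# Pairwise view: every span that intersects A, whether or not they
# intersect each other.
touching_a = sorted(
    name for name, s in spans.items()
    if name != "A" and overlaps(spans["A"], s)
)

# Sweep view: how many spans are open at once.
peak = peak_concurrency(spans.values())

print(touching_a, peak)
```

The pairwise list is longer than the peak because B and C both intersect A without intersecting each other, which is the over-counting the question describes.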

【Comments】:

Possible duplicate of Finding simultaneous events in a database between times

Tags: sql sql-server datetime timespan

【Solution 1】:

Jeff, I once wrote a similar query, but in Oracle. I'm not sure whether it will work in SQL Server, but it's worth a try; maybe it will give you some ideas:

select
  t.time as b,
  lead(t.time)  over (order by t.time, t.weight desc) as e,
  sum(t.weight) over (order by t.time, t.weight desc) as cnt
from
  ( select trunc(:aStartWith)   as time,  0 as weight from dual
    union all
    select req_recieved as time, +1 as weight
      from log_tbl
      where trunc(req_recieved, 'mi') between :aStartWith - interval '10' minute and :aEndWith + interval '10' minute
    union all
    select response_sent as time, -1 as weight
      from log_tbl
      where trunc(req_recieved, 'mi') between :aStartWith - interval '10' minute and :aEndWith + interval '10' minute
    union all
    select trunc(:aEndWith) as time,  0 as weight from dual
  ) t

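The general idea is that I walk through all requests between the :aStartWith and :aEndWith dates, assigning a weight of +1 to every request that starts in the given period and a weight of -1 to every request that ends in the same period.

Here I assume that no request lasts longer than 10 minutes (where trunc(req_recieved, 'mi') between :aStartWith - interval '10' minute and :aEndWith + interval '10' minute); the select ... from dual rows are the boundary conditions.

Then, using analytic functions, I find the end of each segment (lead(t.time) over (order by t.time, t.weight desc) as e) and take the running sum of the weights up to the current row, which gives the number of requests in progress from time b to time e (sum(t.weight) over (order by t.time, t.weight desc) as cnt).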
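
The weighting scheme can be sketched outside the database as well. The Python below is only an illustration of the idea, not a translation of the Oracle query: the four (received, sent) pairs are hypothetical values in seconds, and the names events, running and segments are made up for the sketch.

```python
from itertools import accumulate

# Hypothetical (received, sent) pairs in seconds; not taken from the answer.
requests = [(0, 5), (1, 3), (2, 6), (4, 7)]

# One +1 event per start and one -1 event per end.
events = [(start, +1) for start, _ in requests]
events += [(end, -1) for _, end in requests]

# Same ordering as the query: by time, then weight descending,
# so a start is applied before an end that shares its timestamp.
events.sort(key=lambda ev: (ev[0], -ev[1]))

# Running sum of the weights, like sum(weight) over (order by ...).
running = list(accumulate(weight for _, weight in events))

# Pair each event time with the next one, like lead(time) over (...).
segments = [
    (events[i][0], events[i + 1][0], running[i])
    for i in range(len(events) - 1)
]

max_cnt = max(cnt for _, _, cnt in segments)
for b, e, cnt in segments:
    print(b, e, cnt)
print("max:", max_cnt)
```

Each printed row is one segment (b, e, cnt) between consecutive event times, so no start/end pairs have to be supplied in advance; the event timestamps themselves define the segments.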

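To find the maximum number of requests, you can wrap this query in an outer query that computes whatever you need, for example the maximum of cnt.

Could you try it and see whether this approach works for you? I hope it does :)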

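【Discussion】:

So this works by passing in pairs of timestamps that represent a time span and checking the number of requests that occurred within that span? Given that requests start and end at times that go down to the millisecond, how would I come up with all the appropriate start/end timestamp pairs to pass in? I may be confused about how your solution works.

【Solution 2】: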

declare @T table (ID int, Starts datetime, Ends datetime)
insert into @T (ID, Starts, Ends) values
(1, '2000-12-30', '2000-12-31'),
(2, '2001-01-01', '2001-01-10'),
(3, '2001-01-02', '2001-01-05'),
(4, '2001-01-03', '2001-01-04'),
(5, '2001-01-05', '2001-01-10')

select T1.ID, count(*) as Levels
from @T as T1
  cross join @T as T2
where
  T1.Starts < T2.Ends and
  T1.Starts > T2.Starts
group by T1.ID

select top 1 T1.ID, count(*) as Levels
from @T as T1
  cross join @T as T2
where
  T1.Starts < T2.Ends and
  T1.Starts > T2.Starts
group by T1.ID
order by count(*) desc

Result:

ID          Levels
----------- -----------
3           1
4           2
5           1

(3 row(s) affected)

ID          Levels
----------- -----------
4           2

If you want the rows involved, you can use this:

select T2.*
from (select top 1 T1.ID
      from @T as T1
        cross join @T as T2
      where
        T1.Starts < T2.Ends and
        T1.Starts > T2.Starts
      group by T1.ID
      order by count(*) desc) as C
  inner join @T as T1
    on C.ID = T1.ID
  inner join @T as T2
    on T1.Starts < T2.Ends and
       T1.Starts > T2.Starts or
       T2.ID = C.ID

Result:

ID          Starts                  Ends
----------- ----------------------- -----------------------
2           2001-01-01 00:00:00.000 2001-01-10 00:00:00.000
3           2001-01-02 00:00:00.000 2001-01-05 00:00:00.000
4           2001-01-03 00:00:00.000 2001-01-04 00:00:00.000

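【Discussion】:

I added some information to the question to show that this is not what I'm looking for.
@Jeff Added another version. It is not the cross join that is to blame but the where clause. I guess the duplicate suggested by Martin is what you want. The difference between that and my change is that between is inclusive. This version counts the maximum concurrency level rather than the total number of overlapping spans.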
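
The effect of the strict comparisons can be reproduced outside the database. The sketch below is Python rather than T-SQL and is only an illustration: it reuses the five sample rows from this answer, counts for each span how many other spans are already open when it starts, and then repeats the count with inclusive bounds, as BETWEEN would apply them.

```python
from datetime import date

# The five sample rows from the answer: (ID, Starts, Ends).
rows = [
    (1, date(2000, 12, 30), date(2000, 12, 31)),
    (2, date(2001, 1, 1), date(2001, 1, 10)),
    (3, date(2001, 1, 2), date(2001, 1, 5)),
    (4, date(2001, 1, 3), date(2001, 1, 4)),
    (5, date(2001, 1, 5), date(2001, 1, 10)),
]

# Strict bounds, as in the WHERE clause: T2.Starts < T1.Starts < T2.Ends.
# A row never matches itself, and IDs with no match drop out, as they
# do after GROUP BY.
levels = {}
for id1, start1, _ in rows:
    n = sum(1 for _, s2, e2 in rows if s2 < start1 < e2)
    if n:
        levels[id1] = n

# Inclusive bounds, as BETWEEN applies them: every row matches itself,
# and a span ending exactly at start1 is counted too.
inclusive = {
    id1: sum(1 for _, s2, e2 in rows if s2 <= start1 <= e2)
    for id1, start1, _ in rows
}

print(levels)
print(inclusive)
```

With strict bounds the output matches the Levels column shown in the answer. With inclusive bounds ID 5 rises to 3, because ID 3 ends on the very day ID 5 starts.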

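【Solution 3】:

This is more of a reporting problem than a "standard" database query. The best option is to record the number of running transactions somewhere at the start of each transaction. All other solutions will be slow. But if you really need this...

The simplest solution is to split the time period into small slices (for example, days) and analyze the count in each slice. Here is an example: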

DECLARE @table TABLE
    (
      starts DATETIME ,
      ends DATETIME ,
      trn INT
    )

INSERT  INTO @table
        ( starts ,
          ends ,
          trn
        )
        SELECT  '2003-01-01' ,
                '2003-01-03' ,
                1
        UNION
        SELECT  '2003-01-02' ,
                '2003-01-04' ,
                2
        UNION
        SELECT  '2003-01-02' ,
                '2005-06-06' ,
                3 ;
WITH    numbers
          AS ( SELECT   Row_NUmber() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        Maxx
          AS ( SELECT   MIN(starts) MaxStart ,
                        MAX(ends) MaxEnd
               FROM     @table
             ),
        DDays
          AS ( SELECT   MIN(starts) DDay
               FROM     @table
               UNION ALL
               SELECT   DDay + 1
               FROM     DDays
               WHERE    dday + 1 <= ( SELECT    MaxEnd
                                      FROM      Maxx
                                    )
             )
    SELECT  DDay ,
            COUNT(*) Transactions
    FROM    @Table T
            JOIN DDays D ON D.DDay >= T.starts
                            AND D.DDay <= T.ends
    GROUP BY DDay
    HAVING COUNT(*)>1
    ORDER BY COUNT(*) DESC
OPTION  ( MAXRECURSION 0 )

You can modify the last statement to get the information you need (the transactions in the period of maximum load, and so on).
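
For comparison, the same slicing idea can be written as a short Python sketch. It is not the answer's code: it reuses the three sample rows and the one-day slice, and the names per_day and busiest are invented for the illustration.

```python
from datetime import date, timedelta

# The three sample rows from the answer: (starts, ends, trn).
transactions = [
    (date(2003, 1, 1), date(2003, 1, 3), 1),
    (date(2003, 1, 2), date(2003, 1, 4), 2),
    (date(2003, 1, 2), date(2005, 6, 6), 3),
]

first = min(starts for starts, _, _ in transactions)
last = max(ends for _, ends, _ in transactions)

# Walk the range one day at a time and count the rows covering each day,
# keeping only days with more than one transaction (HAVING COUNT(*) > 1).
per_day = {}
day = first
while day <= last:
    n = sum(1 for starts, ends, _ in transactions if starts <= day <= ends)
    if n > 1:
        per_day[day] = n
    day += timedelta(days=1)

busiest = max(per_day.values())
print(busiest, sorted(d for d, n in per_day.items() if n == busiest))
```

The slice width is the key design choice: a span shorter than one slice can fall between two sample points and be missed, so the slice has to be at least as fine as the data being measured.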

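【Discussion】:

I feel this approach has the same problem as andr's: the timestamps I'm talking about differ by milliseconds, and your approach would require me to make up time periods that cover every possible case. Or am I misunderstanding how it works?

【Solution 4】: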

/* prepare sample data (if needed) */
CREATE TABLE MyTable (ID int, CnxStartdatetime datetime, CnxEnddatetime datetime);
INSERT INTO MyTable (ID, CnxStartdatetime, CnxEnddatetime)
SELECT 11774, '2011-04-29 01:02:56.780', '2011-04-29 01:02:58.793' UNION ALL
SELECT 11777, '2011-04-29 01:02:56.780', '2011-04-29 01:02:58.843' UNION ALL
SELECT 11778, '2011-04-29 01:02:56.780', '2011-04-29 01:02:58.950' UNION ALL
SELECT 11775, '2011-04-29 01:02:56.793', '2011-04-29 01:02:58.843' UNION ALL
SELECT 11776, '2011-04-29 01:02:56.793', '2011-04-29 01:02:58.890' UNION ALL
SELECT 11780, '2011-04-29 01:02:58.310', '2011-04-29 01:03:02.687' UNION ALL
SELECT 11779, '2011-04-29 01:02:58.327', '2011-04-29 01:03:02.543' UNION ALL
SELECT 11787, '2011-04-29 01:02:58.530', '2011-04-29 01:03:08.827' UNION ALL
SELECT 11781, '2011-04-29 01:02:59.030', '2011-04-29 01:03:05.187' UNION ALL
SELECT 11782, '2011-04-29 01:02:59.247', '2011-04-29 01:03:05.467' UNION ALL
SELECT 11784, '2011-04-29 01:02:59.293', '2011-04-29 01:03:05.810' UNION ALL
SELECT 11791, '2011-04-29 01:03:00.107', '2011-04-29 01:03:13.623' UNION ALL
SELECT 11786, '2011-04-29 01:03:00.843', '2011-04-29 01:03:08.983' UNION ALL
SELECT 11783, '2011-04-29 01:03:02.560', '2011-04-29 01:03:05.793' UNION ALL
SELECT 11785, '2011-04-29 01:03:02.717', '2011-04-29 01:03:07.357' UNION ALL
SELECT 11790, '2011-04-29 01:03:05.200', '2011-04-29 01:03:14.153' UNION ALL
SELECT 11804, '2011-04-29 01:03:05.687', '2011-04-29 01:03:25.577' UNION ALL
SELECT 11811, '2011-04-29 01:03:07.093', '2011-04-29 01:03:35.153' UNION ALL
SELECT 11799, '2011-04-29 01:03:07.123', '2011-04-29 01:03:24.437' UNION ALL
SELECT 11789, '2011-04-29 01:03:08.793', '2011-04-29 01:03:13.577';

/* start the job: */
WITH columnified AS (
  /* transform every row of (ID, CnxStartdatetime, CnxEnddatetime)
     into two rows as follows:
     (ID, CnxStartdatetime, CountChange = 1)
     (ID, CnxEnddatetime, CountChange = -1)
  */
  SELECT
    t.ID,
    Time = CASE x.CountChange WHEN 1 THEN CnxStartdatetime ELSE CnxEnddatetime END,
    x.CountChange
  FROM dbo.MyTable t
    CROSS JOIN (SELECT 1 AS CountChange UNION ALL SELECT -1) x
),
groupedandranked AS (
  /* group and rank the timestamps */
  SELECT
    Time,
    CountChange = SUM(CountChange),
    TimeRN = ROW_NUMBER() OVER (ORDER BY Time)
  FROM columnified
  GROUP BY time
),
counted AS (
  /* get the running counts by summing CountChange */
  SELECT
    Time,
    TimeRN,
    RunningCount = CountChange
  FROM groupedandranked
  WHERE TimeRN = 1
  UNION ALL
  SELECT
    t.Time,
    t.TimeRN,
    RunningCount = t.CountChange + c.RunningCount
  FROM groupedandranked t
    INNER JOIN counted c ON t.TimeRN = c.TimeRN + 1
),
countsranked AS (
  /* rank the running counts */
  SELECT
    *,
    CountRN = DENSE_RANK() OVER (ORDER BY RunningCount DESC)
  FROM counted
)
/* get the top ranked rows and their corresponding
   subsequent rows (for the ending timestamps) */
SELECT
  MaxCount = s.RunningCount,
  MaxCountStart = s.Time,
  MaxCountEnd = e.Time
FROM countsranked s
  LEFT JOIN countsranked e ON e.TimeRN = s.TimeRN + 1
WHERE s.CountRN = 1;

/* remove the sample data (unless it's your table) */
DROP TABLE MyTable
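
The pipeline in the query above (unpivot, group by timestamp, running count, keep the top-ranked windows) can be traced with a Python sketch. This is an illustration under assumptions, not a replacement for the T-SQL: the rows are the sample data with the shared date 2011-04-29 dropped, and the fixed-width time strings are compared as text, which orders them correctly.

```python
from collections import Counter

# Sample rows as (start, end) time-of-day strings; all fall on 2011-04-29.
rows = [
    ("01:02:56.780", "01:02:58.793"), ("01:02:56.780", "01:02:58.843"),
    ("01:02:56.780", "01:02:58.950"), ("01:02:56.793", "01:02:58.843"),
    ("01:02:56.793", "01:02:58.890"), ("01:02:58.310", "01:03:02.687"),
    ("01:02:58.327", "01:03:02.543"), ("01:02:58.530", "01:03:08.827"),
    ("01:02:59.030", "01:03:05.187"), ("01:02:59.247", "01:03:05.467"),
    ("01:02:59.293", "01:03:05.810"), ("01:03:00.107", "01:03:13.623"),
    ("01:03:00.843", "01:03:08.983"), ("01:03:02.560", "01:03:05.793"),
    ("01:03:02.717", "01:03:07.357"), ("01:03:05.200", "01:03:14.153"),
    ("01:03:05.687", "01:03:25.577"), ("01:03:07.093", "01:03:35.153"),
    ("01:03:07.123", "01:03:24.437"), ("01:03:08.793", "01:03:13.577"),
]

# columnified + groupedandranked: net change in open spans per timestamp.
change = Counter()
for start, end in rows:
    change[start] += 1
    change[end] -= 1

# counted: running count in timestamp order.
times = sorted(change)
counts = []
running = 0
for t in times:
    running += change[t]
    counts.append(running)

# countsranked + final select: windows where the running count peaks,
# each ending at the next timestamp.
peak = max(counts)
windows = [
    (times[i], times[i + 1])
    for i, c in enumerate(counts)
    if c == peak and i + 1 < len(times)
]

print(peak)
for w in windows:
    print(w)
```

Because the changes are summed per timestamp before the running count is taken, a span that ends at the exact instant another begins does not produce a momentary spike.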

【Solution 5】:

I know cursors are frowned upon, but so are cross joins. This returns 8 for the sample data provided.

-- assuming times table with columns s and e
declare @s datetime, @e datetime;
declare @t table(d datetime);
declare c cursor for select s,e from times order by s;
open c
while(1=1) begin
  fetch next from c into @s,@e
  if @@FETCH_STATUS<>0 break;
  update top(1) @t set d=@e where d<=@s;
  if @@ROWCOUNT=0 insert @t(d) values(@e);
end
close c
deallocate c

select COUNT(*) as MaxConcurrentTimeSpans from @t
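
The cursor loop keeps one row per "lane" and reuses a lane once its span has ended, so the final row count is the peak concurrency. A Python sketch of the same greedy idea follows. It is an illustration rather than the answer's code: lanes_needed is a made-up name, a min-heap stands in for the table variable, and the rows are the sample data with the shared date dropped.

```python
import heapq


def lanes_needed(spans):
    """Greedy lane assignment; the number of lanes equals peak concurrency."""
    ends = []  # min-heap holding the current end time of each lane
    for start, end in sorted(spans):
        if ends and ends[0] <= start:
            # A lane has finished by this start: reuse it (UPDATE TOP(1)).
            heapq.heapreplace(ends, end)
        else:
            # Every lane is still busy: open a new one (INSERT).
            heapq.heappush(ends, end)
    return len(ends)


# Sample rows as (start, end) time-of-day strings; all fall on 2011-04-29.
rows = [
    ("01:02:56.780", "01:02:58.793"), ("01:02:56.780", "01:02:58.843"),
    ("01:02:56.780", "01:02:58.950"), ("01:02:56.793", "01:02:58.843"),
    ("01:02:56.793", "01:02:58.890"), ("01:02:58.310", "01:03:02.687"),
    ("01:02:58.327", "01:03:02.543"), ("01:02:58.530", "01:03:08.827"),
    ("01:02:59.030", "01:03:05.187"), ("01:02:59.247", "01:03:05.467"),
    ("01:02:59.293", "01:03:05.810"), ("01:03:00.107", "01:03:13.623"),
    ("01:03:00.843", "01:03:08.983"), ("01:03:02.560", "01:03:05.793"),
    ("01:03:02.717", "01:03:07.357"), ("01:03:05.200", "01:03:14.153"),
    ("01:03:05.687", "01:03:25.577"), ("01:03:07.093", "01:03:35.153"),
    ("01:03:07.123", "01:03:24.437"), ("01:03:08.793", "01:03:13.577"),
]

max_concurrent = lanes_needed(rows)
print(max_concurrent)
```

The heap always reuses the lane that finished earliest, whereas UPDATE TOP(1) picks any finished lane; since spans arrive in start order, either choice yields the same lane count.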
