【Title】: SQL query - Find daily MIN value from hourly sums
【Posted】: 2015-05-30 19:30:00
【Question】:

Let's get straight to the point. I have a table that looks like this (using SQL Server 2014):

Demo: http://sqlfiddle.com/#!6/75f4a/1/0

CREATE TABLE TAB (
    DT datetime,
    VALUE float
);

INSERT INTO TAB VALUES
('2015-05-01 06:00:00', 12),
('2015-05-01 06:20:00', 10),
('2015-05-01 06:40:00', 11),
('2015-05-01 07:00:00', 14),
('2015-05-01 07:20:00', 15),
('2015-05-01 07:40:00', 13),
('2015-05-01 08:00:00', 10),
('2015-05-01 08:20:00', 9),
('2015-05-01 08:40:00', 5),

('2015-05-02 06:00:00', 19),
('2015-05-02 06:20:00', 7),
('2015-05-02 06:40:00', 11),
('2015-05-02 07:00:00', 9),
('2015-05-02 07:20:00', 7),
('2015-05-02 07:40:00', 6),
('2015-05-02 08:00:00', 10),
('2015-05-02 08:20:00', 19),
('2015-05-02 08:40:00', 15),

('2015-05-03 06:00:00', 8),
('2015-05-03 06:20:00', 8),
('2015-05-03 06:40:00', 8),
('2015-05-03 07:00:00', 21),
('2015-05-03 07:20:00', 12),
('2015-05-03 07:40:00', 7),
('2015-05-03 08:00:00', 10),
('2015-05-03 08:20:00', 4),
('2015-05-03 08:40:00', 10)

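I need to:

  • sum the values per hour
  • pick the smallest hourly sum for each day
  • pick the hour at which that sum occurred

In other words, I want a table that looks like this: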

DATE       | SUM VAL | ON HOUR
------------------------------
2015-05-01 | 24      | 8:00
2015-05-02 | 22      | 7:00
2015-05-03 | 24      | 6:00

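The first two points are easy (see the sqlfiddle). The third one is my problem. I can't simply select Datepart(HOUR, DT), because it would have to be aggregated. I tried using JOINs and WHERE clauses, but without success (some values can appear in the table more than once, which raises errors).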
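
For readers without access to the fiddle, the first two points might look roughly like the sketch below. The fiddle's actual query is not reproduced here, so the derived-table name and the column aliases are assumptions made for illustration.

```sql
-- Sketch only: "hourly", D, H, S and MinHourlySum are illustrative names.
SELECT  D ,
        MIN(S) AS MinHourlySum            -- smallest hourly sum per day
FROM    ( SELECT    CAST(DT AS DATE) AS D ,
                    DATEPART(HOUR, DT) AS H ,
                    SUM(VALUE) AS S       -- sum per day and hour
          FROM      TAB
          GROUP BY  CAST(DT AS DATE) ,
                    DATEPART(HOUR, DT)
        ) AS hourly
GROUP BY D;
-- Adding H to the outer SELECT list fails, because H is neither
-- aggregated nor part of the outer GROUP BY.
```

With the sample data the hourly sums are 33, 42, 24 on the first day, 37, 22, 44 on the second and 24, 40, 24 on the third, so the daily minimums are 24, 22 and 24.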

I'm fairly new to SQL and I'm stuck. I need your help! :)

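【Comments on the question】:

  • In your sample data, two different hours on the third day have the same sum (24). Do you want both records returned or just one, and if just one, which?
  • @jpw I did that on purpose. I only want one row, the one with the "smaller" hour. I see a lot of different solutions; I'll try them on real data at work tomorrow.
  • @rafakob OK, that was my guess.

Tags: sql sql-server tsql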


【Solution 1】:
DECLARE @TAB TABLE
    (
      DT DATETIME ,
      VALUE FLOAT
    );

INSERT  INTO @TAB
VALUES  ( '2015-05-01 06:00:00', 12 ),
        ( '2015-05-01 06:20:00', 10 ),
        ( '2015-05-01 06:40:00', 11 ),
        ( '2015-05-01 07:00:00', 14 ),
        ( '2015-05-01 07:20:00', 15 ),
        ( '2015-05-01 07:40:00', 13 ),
        ( '2015-05-01 08:00:00', 10 ),
        ( '2015-05-01 08:20:00', 9 ),
        ( '2015-05-01 08:40:00', 5 ),
        ( '2015-05-02 06:00:00', 19 ),
        ( '2015-05-02 06:20:00', 7 ),
        ( '2015-05-02 06:40:00', 11 ),
        ( '2015-05-02 07:00:00', 9 ),
        ( '2015-05-02 07:20:00', 7 ),
        ( '2015-05-02 07:40:00', 6 ),
        ( '2015-05-02 08:00:00', 10 ),
        ( '2015-05-02 08:20:00', 19 ),
        ( '2015-05-02 08:40:00', 15 ),
        ( '2015-05-03 06:00:00', 8 ),
        ( '2015-05-03 06:20:00', 8 ),
        ( '2015-05-03 06:40:00', 8 ),
        ( '2015-05-03 07:00:00', 21 ),
        ( '2015-05-03 07:20:00', 12 ),
        ( '2015-05-03 07:40:00', 7 ),
        ( '2015-05-03 08:00:00', 10 ),
        ( '2015-05-03 08:20:00', 4 ),
        ( '2015-05-03 08:40:00', 10 );
WITH    cteh
          AS ( SELECT   DT ,
                        CAST(dt AS DATE) AS D ,
                        SUM(VALUE) OVER ( PARTITION BY CAST(dt AS DATE),
                                          DATEPART(hh, DT) ) AS S
               FROM     @TAB
             ),
        ctef
          AS ( SELECT   * ,
                        -- DT breaks ties so the earliest row of the earliest hour is picked deterministically
                        ROW_NUMBER() OVER ( PARTITION BY D ORDER BY S, DT ) AS rn
               FROM     cteh
             )
    SELECT  D ,
            S ,
            CAST(DT AS TIME) AS H
    FROM    ctef
    WHERE   rn = 1

Output:

D           S   H
2015-05-01  24  08:00:00.0000000
2015-05-02  22  07:00:00.0000000
2015-05-03  24  06:00:00.0000000

【Discussion】:

    【Solution 2】:

    One way is to use the set of minimum hourly values as a derived table and join against it. I would do it like this:

    ;WITH CTE AS (
        SELECT Cast(Format(DT, 'yyyy-MM-dd HH:00') AS datetime) AS DT, SUM(VALUE) AS VAL
        FROM TAB
        GROUP BY Format(DT, 'yyyy-MM-dd HH:00')
    ) 
    
    SELECT b.dt "Date", val "sum val", cast(min(a.dt) as time) "on hour"
    FROM cte a JOIN (
        SELECT Format(DT,'yyyy-MM-dd') AS DT, MIN(VAL) AS DAILY_MIN 
        FROM cte HOURLY
        GROUP BY Format(DT,'yyyy-MM-dd')
    ) b ON CAST(a.DT AS DATE) = b.DT and a.VAL = b.DAILY_MIN
    GROUP BY b.DT, a.VAL
    

    This gives:

    Date        sum val on hour
    2015-05-01  24      08:00:00.0000000
    2015-05-02  22      07:00:00.0000000
    2015-05-03  24      06:00:00.0000000
    

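    I used min() on the time part because your sample data has the same lowest value in two separate hours on the third day. If you want both, remove the min function and the GROUP BY from the outer select. Then you get: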

    Date        sum val on hour
    2015-05-01  24      08:00:00.0000000
    2015-05-02  22      07:00:00.0000000
    2015-05-03  24      06:00:00.0000000
    2015-05-03  24      08:00:00.0000000
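
    Spelled out, that variant could look like the sketch below. It is the query above with min() and the outer GROUP BY removed; the ORDER BY is an addition made here only to keep the row order stable.

```sql
;WITH CTE AS (
    SELECT Cast(Format(DT, 'yyyy-MM-dd HH:00') AS datetime) AS DT, SUM(VALUE) AS VAL
    FROM TAB
    GROUP BY Format(DT, 'yyyy-MM-dd HH:00')
)
SELECT b.DT AS "Date", a.VAL AS "sum val", CAST(a.DT AS time) AS "on hour"
FROM CTE a
JOIN (
    SELECT Format(DT, 'yyyy-MM-dd') AS DT, MIN(VAL) AS DAILY_MIN
    FROM CTE
    GROUP BY Format(DT, 'yyyy-MM-dd')
) b ON CAST(a.DT AS date) = b.DT AND a.VAL = b.DAILY_MIN
-- No MIN()/GROUP BY here: every hour that ties for the daily minimum is returned.
ORDER BY b.DT, a.DT;  -- added for a stable row order (not in the original answer)
```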
    

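    I'm sure it can be improved, but you should get the idea.

    【Discussion】:

    • Thanks, I tweaked the code a little and after testing it works great! :) I must say I like those CTE solutions; I didn't know you could do things like that in SQL.
    【Solution 3】:

    Here is an approach that uses a temp table (as opposed to the CTEs in the other solutions) to store the calculated values and then filters the results to give the desired output: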

    -- INSERT CALCULATED GROUPED VALUES INTO TEMP TABLE
    SELECT  CONVERT(DATE, DT) AS DateVal ,
            SUM(VALUE) AS SumVal ,
            DATEPART(HOUR, CONVERT(TIME, DT)) AS HourVal
    INTO    #TEMP_CALC
    FROM    TAB
    GROUP BY CONVERT(DATE, DT) , DATEPART(HOUR, CONVERT(TIME, DT))
    
    -- TAKE THE RELEVANT ROWS
    SELECT  t.DateVal ,
            MIN(t.SumVal) AS SumVal ,
            ( SELECT TOP 1
                        HourVal
              FROM      #TEMP_CALC t2
              WHERE     t2.DateVal = t.DateVal
                        AND t2.SumVal = MIN(t.SumVal)
              ORDER BY  HourVal -- earliest hour wins when sums tie
            ) AS MinHour
    FROM    #TEMP_CALC t
    GROUP BY t.DateVal
    ORDER BY DateVal
    

    【Discussion】:

    • If you use a common table expression you can drop the temp table and do it all in a single statement.
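
    A minimal sketch of what the comment suggests, assuming the same TAB table. Note that it also swaps the correlated TOP 1 subquery for ROW_NUMBER(), which is a substitution made here, not something the answer or the comment spelled out; the CTE names calc and ranked are illustrative.

```sql
WITH calc AS (
    -- same grouped values that the answer stores in #TEMP_CALC
    SELECT  CONVERT(DATE, DT) AS DateVal ,
            DATEPART(HOUR, DT) AS HourVal ,
            SUM(VALUE) AS SumVal
    FROM    TAB
    GROUP BY CONVERT(DATE, DT) , DATEPART(HOUR, DT)
),
ranked AS (
    -- rank the hours of each day by sum; the earlier hour wins ties
    SELECT  DateVal , SumVal , HourVal ,
            ROW_NUMBER() OVER ( PARTITION BY DateVal
                                ORDER BY SumVal , HourVal ) AS rn
    FROM    calc
)
SELECT  DateVal , SumVal , HourVal AS MinHour
FROM    ranked
WHERE   rn = 1
ORDER BY DateVal;
```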
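    【Solution 4】:

    You can use DATEDIFF to get the time span, in hours and in days, from any starting point (1990-1-1 in this example). Use those spans to group and order, and finally use DATEADD with the same starting point to rebuild the date: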

    WITH dates AS (
      SELECT CAST(DT AS DATETIME) AS Date, -- make sure the value is a datetime
      value FROM dbo.TAB AS T
    ),
    ddh AS (SELECT 
        date,
        DATEDIFF(DAY, '1990-1-1', date) AS daySpan,    -- days span
        DATEDIFF(HOUR, '1990-1-1', date) AS hourSpan,  -- hours span
        value
        FROM dates
    ),
    ddhv AS ( SELECT
        daySpan,
        hourSpan,
        SUM(value) AS sumValues    -- sum...
        FROM ddh
        group BY daySpan, hourSpan -- ...grouped by day & hour
    ),
    ddhvr AS ( SELECT
        daySpan,
        hourSpan,
        sumValues,
        -- number rows by hourly sum of the value
        -- (hourSpan breaks ties so the earliest hour is picked deterministically)
        ROW_NUMBER() OVER (PARTITION BY daySpan ORDER BY sumValues, hourSpan) AS row
    FROM ddhv
    )
    SELECT
        DATEADD(HOUR, hourSpan, '1990-1-1') AS DayHour, -- rebuild the date/hour
        sumValues
    FROM ddhvr
    WHERE row = 1 -- take only the first occurrence for each day
    

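    The advantage of this query is that you can easily change the period and the starting point. For example, you can have your days start at 6:30 am instead of 00:00, so that the periods compared are 6:30 to 7:30, 7:30 to 8:30 and so on. You can also change the grouping unit, for example half an hour, 5 minutes or 2 hours instead of 1 hour. If you need that, see this SO answer. There you will see how to group by different periods and get back to the start of each period. It is just some simple math.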
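
    As a rough illustration of that math (not taken from the linked answer), the sketch below groups into 30-minute buckets. The variables @origin and @bucketMinutes and the CTE names are hypothetical, and the integer division assumes every DT is on or after the origin.

```sql
DECLARE @origin DATETIME = '1990-01-01T00:00:00'; -- assumed starting point
DECLARE @bucketMinutes INT = 30;                  -- assumed bucket length

WITH buckets AS (
    SELECT
        -- whole days since the origin (a "day" starts at the origin's time of day)
        DATEDIFF(MINUTE, @origin, DT) / 1440 AS dayNo,
        -- whole buckets since the origin (integer division)
        DATEDIFF(MINUTE, @origin, DT) / @bucketMinutes AS bucketNo,
        VALUE
    FROM TAB
),
sums AS (
    SELECT dayNo, bucketNo, SUM(VALUE) AS sumValues
    FROM buckets
    GROUP BY dayNo, bucketNo
),
ranked AS (
    SELECT dayNo, bucketNo, sumValues,
           ROW_NUMBER() OVER (PARTITION BY dayNo
                              ORDER BY sumValues, bucketNo) AS rn
    FROM sums
)
SELECT
    DATEADD(MINUTE, bucketNo * @bucketMinutes, @origin) AS BucketStart, -- rebuild the bucket start
    sumValues
FROM ranked
WHERE rn = 1; -- smallest bucket sum per day, earliest bucket on ties
```

    Setting @origin to '1990-01-01T06:30:00' and @bucketMinutes to 60 would give the 6:30 to 7:30, 7:30 to 8:30 periods mentioned above.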

    【Discussion】:

      【Solution 5】:

      I tested mine against your fiddle:

      with agg as (
          select cast(dt as date) as dt, datepart(hh, dt) as hr, sum(VALUE) as sum_val
          from TAB
          group by cast(dt as date), datepart(hh, dt)
      )
      select
          dt, min(sum_val) as "SUM VAL",
          (
              select cast(hr as varchar(2)) + ':00' from agg as agg2
              where agg2.dt = agg.dt and not exists (
                  /* no other hour that day has a smaller sum, or the same sum at an earlier hour (earliest wins ties) */
                  select 1 from agg as agg3
                  where agg3.dt = agg2.dt
                    and (agg3.sum_val < agg2.sum_val
                         or (agg3.sum_val = agg2.sum_val and agg3.hr < agg2.hr))
              )
          ) as "ON HOUR"
      from agg
      group by dt;
      

      【Discussion】:
