【Question title】: SQL join condition either A or B but not both A and B
【Posted】: 2019-02-08 12:34:08
【Question】:

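I have sales data by year and quarter, and I want to fill in the missing quarters with the last available value.

Suppose we have this source table: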

+------+---------+-------+--------+
| year | quarter | sales | row_no |
+------+---------+-------+--------+
| 2018 |       1 |  4000 |      5 |
| 2018 |       2 |  6000 |      4 |
| 2018 |       3 |  5000 |      3 |
| 2018 |       4 |  3000 |      2 |
| 2019 |       1 |  8000 |      1 |
+------+---------+-------+--------+

Desired result:

+------+---------+-------+------------------------+
| year | quarter | sales |                        |
+------+---------+-------+------------------------+
| 2018 |       1 |  4000 |                        |
| 2018 |       2 |  6000 |                        |
| 2018 |       3 |  5000 |                        |
| 2018 |       4 |  3000 |                        |
| 2019 |       1 |  8000 |                        |
| 2019 |       2 |  8000 | <repeat the last value |
| 2019 |       3 |  8000 | <repeat the last value |
| 2019 |       4 |  8000 | <repeat the last value |
+------+---------+-------+------------------------+

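So the task is to build the Cartesian product of years and quarters, and then join to it either the corresponding sales value or, where there is none, the last available one.

This code gets me almost there: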

select r.year, k.quarter, t.sales
from (select distinct year        from [MyTable]) r cross join
     (select distinct quarter     from [MyTable]) k left join
     [MyTable] t
     on (r.year = t.year and k.quarter=t.quarter) or row_no=1

How can I correct the last line (the join condition) so that the 2018 rows are not doubled?
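
To make the doubling concrete, here is a minimal sketch that reproduces it outside SQL Server. It assumes SQLite (through Python's `sqlite3` module) as a stand-in engine and a lower-case `MyTable` loaded with the sample rows above. It also tries one possible way to express "A, or B only when A has no match" with `NOT EXISTS`; that is an illustration of the idea in the title, not necessarily the most efficient form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (year INT, quarter INT, sales INT, row_no INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(2018, 1, 4000, 5), (2018, 2, 6000, 4), (2018, 3, 5000, 3),
     (2018, 4, 3000, 2), (2019, 1, 8000, 1)],
)

# Same shape as the query in the question; only the join condition varies.
QUERY = """
    SELECT r.year, k.quarter, t.sales
    FROM (SELECT DISTINCT year FROM MyTable) r
    CROSS JOIN (SELECT DISTINCT quarter FROM MyTable) k
    LEFT JOIN MyTable t ON {condition}
    ORDER BY r.year, k.quarter, t.sales
"""

# Condition from the question: every 2018 cell matches its own row AND row_no = 1.
original = conn.execute(QUERY.format(
    condition="(r.year = t.year AND k.quarter = t.quarter) OR t.row_no = 1"
)).fetchall()

# Sketch of a fix: fall back to row_no = 1 only when the cell has no exact match.
fixed = conn.execute(QUERY.format(condition="""
    (r.year = t.year AND k.quarter = t.quarter)
    OR (t.row_no = 1 AND NOT EXISTS (
            SELECT 1 FROM MyTable x
            WHERE x.year = r.year AND x.quarter = k.quarter))
""")).fetchall()

print(len(original))  # 12 rows: the four 2018 cells appear twice
print(len(fixed))     # 8 rows: one per (year, quarter) cell
```

Note that this fallback fills every gap with the `row_no = 1` row, so it only gives the desired result while the gaps sit at the end of the series.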

【Comments on the question】:

    Tags: sql sql-server join


    【Solution 1】:

    One approach uses OUTER APPLY:

    select y.year, q.quarter, t.sales
    from (select distinct year from [MyTable]) y cross join
         (select distinct quarter from [MyTable]) q outer apply
         (select top (1) t.*
          from [MyTable] t
          where t.year < y.year or
                (t.year = y.year and t.quarter <= q.quarter)
          order by t.year desc, t.quarter desc
         ) t;
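
The lookup above picks, for each (year, quarter) cell, the most recent source row at or before that cell. The sketch below checks that logic on the sample data. It assumes SQLite via Python's `sqlite3` as a stand-in engine; SQLite has no `OUTER APPLY` or `TOP`, so a correlated scalar subquery with `ORDER BY ... LIMIT 1` is swapped in. It plays the same role here because only the `sales` column is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (year INT, quarter INT, sales INT, row_no INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(2018, 1, 4000, 5), (2018, 2, 6000, 4), (2018, 3, 5000, 3),
     (2018, 4, 3000, 2), (2019, 1, 8000, 1)],
)

rows = conn.execute("""
    SELECT y.year, q.quarter,
           -- stand-in for OUTER APPLY (SELECT TOP (1) ...): latest row at or before the cell
           (SELECT t.sales
            FROM MyTable t
            WHERE t.year < y.year
               OR (t.year = y.year AND t.quarter <= q.quarter)
            ORDER BY t.year DESC, t.quarter DESC
            LIMIT 1) AS sales
    FROM (SELECT DISTINCT year FROM MyTable) y
    CROSS JOIN (SELECT DISTINCT quarter FROM MyTable) q
    ORDER BY y.year, q.quarter
""").fetchall()

for row in rows:
    print(row)  # 2019 Q2 to Q4 repeat the 2019 Q1 value of 8000
```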
    

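    For your volume of data, this should be fine.

    A more efficient approach, assuming you only need to fill in values at the end of the series, is: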

    select y.year, q.quarter,
           coalesce(t.sales, tdefault.sales)
    from (select distinct year from [MyTable]) y cross join
         (select distinct quarter from [MyTable]) q left join
         [MyTable] t
         on t.year = y.year and
            t.quarter = q.quarter cross join
         (select top (1) t.*
          from [MyTable] t
          order by t.year desc, t.quarter desc
         ) tdefault
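
The "only at the end" assumption matters, and a small experiment shows why. The sketch below again assumes SQLite via Python's `sqlite3` as a stand-in, with `LIMIT 1` in place of `TOP (1)`, and uses hypothetical data that is not from the question: 2018 Q3 is missing in the middle of the series. The single default row then fills that gap with the overall latest value rather than with the value that preceded it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (year INT, quarter INT, sales INT)")
# Hypothetical data: 2018 Q3 is missing in the middle, 2019 Q4 at the end.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(2018, 1, 4000), (2018, 2, 6000), (2018, 4, 3000),
     (2019, 1, 8000), (2019, 2, 7000), (2019, 3, 9000)],
)

rows = conn.execute("""
    SELECT y.year, q.quarter, COALESCE(t.sales, d.sales) AS sales
    FROM (SELECT DISTINCT year FROM MyTable) y
    CROSS JOIN (SELECT DISTINCT quarter FROM MyTable) q
    LEFT JOIN MyTable t ON t.year = y.year AND t.quarter = q.quarter
    -- one-row default: the latest row in the whole table
    CROSS JOIN (SELECT sales FROM MyTable
                ORDER BY year DESC, quarter DESC LIMIT 1) d
    ORDER BY y.year, q.quarter
""").fetchall()

for row in rows:
    print(row)
# (2018, 3, 9000): the mid-series gap gets the overall latest value, not 6000
# (2019, 4, 9000): the trailing gap is filled as intended
```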
    

    【Comments】:

      【Solution 2】:

      A very different approach, using CTEs and window functions. It needs neither two scans of the table nor a triangular join.

      WITH VTE AS(
          SELECT *
          FROM (VALUES (2018,1,4000,5),
                       (2018,2,6000,4),
                       (2018,3,5000,3),
                       (2018,4,3000,2),
                       (2019,1,8000,1)) V([Year],[Quarter],sales, row_no)),
      CTE AS(
          SELECT Y.Year,
                 Q.Quarter,
                 V.sales,
                 V.row_no,
                 COUNT(CASE WHEN V.sales IS NOT NULL THEN 1 END) OVER (ORDER BY Y.[Year], Q.[Quarter]
                                                                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
          FROM (VALUES(2018),(2019)) Y([Year])
               CROSS JOIN (VALUES(1),(2),(3),(4)) Q([Quarter])
               LEFT JOIN VTE V ON Y.[Year] = V.[Year] AND Q.[Quarter] = V.[Quarter])
      SELECT C.[Year],
             C.[Quarter],
             MAX(C.sales) OVER (PARTITION BY C.Grp) AS Sales
      FROM CTE C;
      

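      This only works on SQL Server 2012+ (because ROWS BETWEEN was introduced in SQL Server 2012). Hopefully, though, you are not using 2008 or earlier, which is (almost) completely out of support.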
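
The key step is the running count: it increases only on rows that have a sales value, so every gap row inherits the group number of the last populated row before it, and MAX within a group then returns that row's value. The sketch below exposes the intermediate group column. It assumes SQLite 3.25 or later (for window functions) via Python's `sqlite3`, uses `COUNT(t.sales)` as a shorter equivalent of the `COUNT(CASE ...)` form, and the names `grid`, `grp` and `filled` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (year INT, quarter INT, sales INT, row_no INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [(2018, 1, 4000, 5), (2018, 2, 6000, 4), (2018, 3, 5000, 3),
     (2018, 4, 3000, 2), (2019, 1, 8000, 1)],
)

rows = conn.execute("""
    WITH grid AS (
        SELECT y.year, q.quarter, t.sales,
               -- running count of non-NULL sales: constant across a run of gaps
               COUNT(t.sales) OVER (
                   ORDER BY y.year, q.quarter
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS grp
        FROM (SELECT DISTINCT year FROM MyTable) y
        CROSS JOIN (SELECT DISTINCT quarter FROM MyTable) q
        LEFT JOIN MyTable t ON t.year = y.year AND t.quarter = q.quarter
    )
    SELECT year, quarter, sales, grp,
           MAX(sales) OVER (PARTITION BY grp) AS filled
    FROM grid
    ORDER BY year, quarter
""").fetchall()

for row in rows:
    print(row)
# grp runs 1, 2, 3, 4, 5, 5, 5, 5, so the three gap rows share group 5 with 2019 Q1
```

Each group holds exactly one non-NULL value, which is why MAX is safe here: it never compares two real sales figures.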

      【Comments】:

        【Solution 3】:

        I would simply do a JOIN:

        SELECT TT.YEAR, TT.Quarter, COALESCE(T.SALES, MAX(T.SALES) OVER (PARTITION BY TT.YEAR)) AS sales 
        FROM (SELECT DISTINCT T.YEAR, TT.Quarter
              FROM [MyTable] T CROSS JOIN
                   ( SELECT DISTINCT TT.Quarter FROM [MyTable] TT ) TT
             ) TT LEFT JOIN 
             [MyTable] T 
             ON TT.YEAR = T.YEAR AND TT.Quarter = T.Quarter;
        

        Edit: I misread the question with respect to the other quarters, so you need an OUTER APPLY in addition to the LEFT JOIN:

        SELECT TT.YEAR, TT.Quarter, COALESCE(T.SALES, T1.SALES) AS Sales 
        FROM (SELECT DISTINCT T.YEAR, TT.Quarter
              FROM [MyTable] T CROSS JOIN
                   ( SELECT DISTINCT TT.Quarter FROM [MyTable] TT ) TT
             ) TT LEFT JOIN 
             [MyTable] T 
             ON TT.YEAR = T.YEAR AND TT.Quarter = T.Quarter OUTER APPLY 
             ( SELECT TOP (1) T.*
               FROM [MyTable] T
               WHERE T.YEAR = TT.YEAR
               ORDER BY T.Quarter DESC
             ) T1;
        

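        【Comments】:

        • Are you sure your query is not affected by the fact that the last period's sales just happen to be the highest?
        • Say time passes and we have additional quarters in the source table. What if 2019 Q2 sales = 7000 (less than 2019 Q1)? Then MAX partitioned by year would yield 8000 (the 2019 Q1 value), which is not what I want. I want the last available value, which in that case is 2019 Q2.
        • @PrzemyslawRemin Correct, I misread the question and overlooked the scenario you describe. Thanks.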
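
The scenario raised in the comments can be checked directly. The sketch below assumes SQLite 3.25 or later via Python's `sqlite3` as a stand-in for SQL Server and adds the hypothetical 2019 Q2 = 7000 row. It compares the MAX-per-year fill with a last-value-per-year fill, where a correlated subquery with `LIMIT 1` replaces `OUTER APPLY (SELECT TOP (1) ...)`. The column names `by_max` and `by_last` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (year INT, quarter INT, sales INT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(2018, 1, 4000), (2018, 2, 6000), (2018, 3, 5000), (2018, 4, 3000),
     (2019, 1, 8000), (2019, 2, 7000)],  # hypothetical extra quarter from the comment
)

rows = conn.execute("""
    SELECT g.year, g.quarter,
           -- first query of this answer: falls back to the yearly maximum
           COALESCE(t.sales, MAX(t.sales) OVER (PARTITION BY g.year)) AS by_max,
           -- edited query of this answer: falls back to the latest quarter of the year
           COALESCE(t.sales,
                    (SELECT l.sales FROM MyTable l
                     WHERE l.year = g.year
                     ORDER BY l.quarter DESC LIMIT 1)) AS by_last
    FROM (SELECT y.year, q.quarter
          FROM (SELECT DISTINCT year FROM MyTable) y
          CROSS JOIN (SELECT DISTINCT quarter FROM MyTable) q) g
    LEFT JOIN MyTable t ON t.year = g.year AND t.quarter = g.quarter
    ORDER BY g.year, g.quarter
""").fetchall()

for row in rows:
    print(row)
# 2019 Q3 and Q4: by_max = 8000 (the Q1 peak), by_last = 7000 (the actual last value)
```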