SQL speed: Return each date in date range and count() for each

【Question title】: SQL speed: Return each date in date range and count() for each
【Posted】: 2017-04-03 23:08:51
【Question】:

My goal is to return every date in a date range, together with a count of all records for each date.

MyTable
-------------------------------
| OrderId |   DateFinalized   |
-------------------------------
|   51    | 2016-1-3 12:50:34 |
|   55    | 2016-1-4 10:01:56 |
|   73    | 2016-1-4 11:52:02 |
|   93    | 2016-1-6 01:35:16 |
|   104   | 2016-1-6 02:40:47 |
-------------------------------

The challenge is to also include dates that have no orders. Using MyTable above, if the date range is 2016-1-1 through 2016-1-6, the desired output is:

---------------------
|  MyDate  | Orders |
---------------------
| 2016-1-1 |   0    |
| 2016-1-2 |   0    |
| 2016-1-3 |   1    |
| 2016-1-4 |   2    |
| 2016-1-5 |   0    |
| 2016-1-6 |   2    |
---------------------

To do this, I use the following query to select the dates only, and it executes in under 1 second:

declare @startdate datetime = '1/1/2016';
declare @enddate datetime = '1/1/2017';

with [dates] as (
    select convert(date, @startdate) as [date] 
    union all
    select dateadd(day, 1, [date])
    from [dates]
    where [date] < @enddate 
)
select 
[date]
from [dates] 
where [date] between @startdate and @enddate
order by [date] desc
option (maxrecursion 0)

When I select the order count grouped by date, as shown below, it also takes only 1 second:

declare @startdate datetime = '2/1/2016';
declare @enddate datetime = '1/1/2017';
select 
convert(date,DATEADD(dd, DATEDIFF(dd, 0, datefinalized), 0))  as Dates,
count(OrderID) as OrderCount
from orders 
where datefinalized between @startdate and @enddate
GROUP BY DATEADD(dd, DATEDIFF(dd, 0, datefinalized), 0)
order by DATEADD(dd, DATEDIFF(dd, 0, datefinalized), 0) desc

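The problem comes when I combine these two queries into a single SQL statement. The LEFT JOIN takes 20 seconds (!!!) to execute. I tried a subquery just for giggles, but at 13 seconds it was not much better.

How can I join the generated data set efficiently?

Thanks in advance for your time.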

【Comments】:

Use an index: dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
Try using a numbers table instead of a recursive CTE. I used rextester to check it but couldn't reproduce the problem.

Tags: sql sql-server

【Solution 1】:

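Using a recursive CTE is one of the worst ways to generate a date range. Generating a date range on demand with a stacked CTE is much faster than with a recursive CTE.

If you are going to use it over many rows or a long time span, or you will run this kind of operation many times, you are better off just creating a Dates or Calendar table.

For only 152kb in memory, you can hold 30 years of dates in a table. You would create it like this: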

/* dates table */ 
declare @fromdate date = '20000101';
declare @years    int  = 30;
/* 30 years, 19 used data pages ~152kb in memory, ~264kb on disk */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
select top (datediff(day, @fromdate,dateadd(year,@years,@fromdate)))
    [Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
into dbo.Dates
from n as deka cross join n as hecto cross join n as kilo 
               cross join n as tenK cross join n as hundredK
order by [Date];

create unique clustered index ix_dbo_Dates_date 
  on dbo.Dates([Date]);
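
A quick way to check the table after building it, as a minimal sketch: the unique index already rules out duplicate dates, so if the row count equals the number of days between the first and last date, there are no gaps either. The column aliases (RowCnt, FirstDate, LastDate, SpanDays) are made up for illustration.

```sql
/* Sanity check for dbo.Dates: one row per day, no gaps. */
select
    RowCnt    = count(*)
  , FirstDate = min([Date])
  , LastDate  = max([Date])
  , SpanDays  = datediff(day, min([Date]), max([Date])) + 1 /* equals RowCnt when there are no gaps */
from dbo.Dates;
```

With @fromdate = '20000101' and @years = 30 as above, FirstDate should be 2000-01-01 and LastDate 2029-12-31.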

And query it like this:

select
    d.[Date]
  , OrderCount = count(o.OrderID)
from dates d
  left join orders o
    on convert(date,o.DateFinalized) = d.[Date] /* DateFinalized is the column used in the question */
group by d.[Date]
order by d.[Date] desc
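
The query above returns a row for every date stored in dbo.Dates. To limit it to the range asked about in the question, a filter on the dates table can be added. This is a sketch under assumptions: it reuses the @startdate/@enddate variable names from the question, uses the sample range 2016-1-1 through 2016-1-6, and joins on the DateFinalized column that appears in the question's queries.

```sql
/* Assumes dbo.Dates exists as built above and that orders has the
   DateFinalized and OrderID columns shown in the question. */
declare @startdate date = '20160101';
declare @enddate   date = '20160106';

select
    d.[Date]
  , OrderCount = count(o.OrderID)  /* counts only matched orders, so days without orders give 0 */
from dbo.Dates d
  left join orders o
    on convert(date, o.DateFinalized) = d.[Date]
where d.[Date] between @startdate and @enddate  /* filter on d, not o, so the left join keeps empty days */
group by d.[Date]
order by d.[Date] desc;
```

The range filter is placed on d.[Date] rather than on the orders columns: a where condition on o would discard the NULL-extended rows and turn the left join back into an inner join. Against the sample rows in the question, this should produce the six dates of the desired output, newest first.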

If you really don't want a calendar table, you can use just the stacked CTE part:

declare @fromdate date = '20160101';
declare @years    int  = 1;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
  select top (datediff(day, @fromdate,dateadd(year,@years,@fromdate)))
      [Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
  from n as deka cross join n as hecto cross join n as kilo 
                /* cross join n as tenK cross join n as hundredK */
   order by [Date]
)
select
    d.[Date]
  , OrderCount = count(o.OrderID)
from dates d
  left join orders o
    on convert(date,o.DateFinalized) = d.[Date] /* DateFinalized is the column used in the question */
group by d.[Date]
order by d.[Date] desc
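
One variation that may be worth comparing is to aggregate the orders once per day first and then left join the small result to the generated dates, which mirrors the two fast queries from the question. This is only a sketch, not a measured improvement: the counts CTE name is made up for illustration, the @startdate/@enddate names come from the question, and the DateFinalized column is assumed from the question's queries. Check the execution plans on your own data before adopting it.

```sql
declare @startdate date = '20160101';
declare @enddate   date = '20170101';

;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
  /* 10 x 10 x 10 = 1000 candidate rows, enough for ranges of up to 1000 days */
  select top (datediff(day, @startdate, @enddate) + 1)
      [Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@startdate))
  from n as deka cross join n as hecto cross join n as kilo
  order by [Date]
)
, counts as (
  /* aggregate orders once per day before joining */
  select
      [Date]     = convert(date, DateFinalized)
    , OrderCount = count(OrderID)
  from orders
  where DateFinalized >= @startdate
    and DateFinalized <  dateadd(day, 1, @enddate)  /* keeps all of the last day */
  group by convert(date, DateFinalized)
)
select
    d.[Date]
  , OrderCount = coalesce(c.OrderCount, 0)  /* days without orders have no row in counts */
from dates d
  left join counts c
    on c.[Date] = d.[Date]
order by d.[Date] desc;
```

Here both @startdate and @enddate are treated as inclusive, which is why the top expression adds 1 to the day difference.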

【Comments】: