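【Posted】: 2019-09-03 08:48:27
【Question】:
Situation:
We have a table "base1" with ~6 million rows, which records actual customer purchases: the purchase date plus the parameters of each purchase.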
CREATE TABLE base1 (
Users_id int NOT NULL PRIMARY KEY , -- int is an assumed type; the name matches the query below
PurchaseDate date,
Parameter1 int,
Parameter2 int,
...
ParameterK int );
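There is also another table, "base2", with ~90 million rows. It shows essentially the same thing, but instead of purchase dates it is broken down by week (for example: every week over 4 years for each customer; if a customer bought nothing in week N, the customer still appears).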
CREATE TABLE base2 (
Users_id int NOT NULL PRIMARY KEY , -- int is an assumed type
Week_start date ,
Week_end date,
Parameter1 int,
Parameter2 int,
...
ParameterN int );
The task is to run the following query:
-- a = base2 ; b , wb%% = base1
--create index idx_uid_purch_date on base1(Users_ID,Purchasedate);
SELECT a.Users_id
-- Checking whether the client will make a purchase in next week and the purchase will be bought on condition
,iif(b.Users_id is not null,1,0) as User_will_buy_next_week
,iif(b.Users_id is not null and b.Parameter1 = 1,1,0) as User_will_buy_on_Condition1
-- about 12 similar iif-conditions
,iif(b.Users_id is not null and (b.Parameter1 = 1 and b.Parameter12 = 1),1,0)
as User_will_buy_on_Condition13
-- checking on the fact of purchase in the past month, 2 months ago, 2.5 months, etc.
,iif(wb1m.Users_id is null,0,1) as was_buy_1_month_ago
,iif(wb2m.Users_id is null,0,1) as was_buy_2_month_ago
,iif(wb25m.Users_id is null,0,1) as was_buy_25_month_ago
,iif(wb3m.Users_id is null,0,1) as was_buy_3_month_ago
,iif(wb6m.Users_id is null,0,1) as was_buy_6_month_ago
,iif(wb1y.Users_id is null,0,1) as was_buy_1_year_ago
,a.[Week_start]
,a.[Week_end]
into base3
FROM base2 a
-- Join for User_will_buy
left join base1 b
on a.Users_id =b.Users_id and
cast(b.[PurchaseDate] as date)>=DATEADD(dd,7,cast(a.[Week_end] as date))
and cast(b.[PurchaseDate] as date)<=DATEADD(dd,14,cast(a.[Week_end] as date))
-- Joins for was_buy
left join base1 wb1m
on a.Users_id =wb1m.Users_id
and cast(wb1m.[PurchaseDate] as date)>=DATEADD(dd,-30-4,cast(a.[Week_end] as date))
and cast(wb1m.[PurchaseDate] as date)<=DATEADD(dd,-30+4,cast(a.[Week_end] as date))
/* 4 more similar joins where different values are added in
DATEADD (dd, %%, cast (a. [Week_end] as date))
to check on the fact of purchase for a certain period */
left outer join base1 wb1y
on a.Users_id =wb1y.Users_id and
cast(wb1y.[PurchaseDate] as date)>=DATEADD(dd,-365-4,cast(a.[Week_end] as date))
and cast(wb1y.[PurchaseDate] as date)<=DATEADD(dd,-365+5,cast(a.[Week_end] as date))
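Because of the large number of joins and the fairly large tables, this script runs for about 24 hours, which is far too long.
As the execution plan shows, most of the time goes to the "Merge Join", to reading rows from base1 and base2, and to inserting the data into the other table, base3.
Question: Is it possible to optimize this query so that it runs faster?
Perhaps by using one join instead, or something like that.
Please help, I'm not smart enough :(
Thank you all for your answers!
UPD: Maybe using a different type of join (merge, loop, or hash) would help me, but I can't really test this theory. Maybe someone can tell me whether that is right or wrong ;)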
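The "one join instead" idea from the question can be sketched as a single LEFT JOIN over the widest date window, followed by conditional aggregation. This is only an illustration under assumptions: the table variables @base1 and @base2 and their sample rows are hypothetical stand-ins, the column types are guessed, and only four of the flags are shown. Whether it is actually faster on 90 million rows would have to be measured.

```sql
-- Hypothetical miniature stand-ins for base1/base2 (sample data is invented).
DECLARE @base1 TABLE (Users_id int NOT NULL, PurchaseDate date NOT NULL, Parameter1 int);
DECLARE @base2 TABLE (Users_id int NOT NULL, Week_start date NOT NULL, Week_end date NOT NULL);

INSERT INTO @base1 VALUES (1, '2019-07-20', 1), (1, '2019-08-27', 0);
INSERT INTO @base2 VALUES (1, '2019-08-12', '2019-08-18'), (2, '2019-08-12', '2019-08-18');

SELECT a.Users_id,
       a.Week_start,
       a.Week_end,
       -- "next week" window from the question: Week_end + 7 .. Week_end + 14
       MAX(CASE WHEN b.PurchaseDate BETWEEN DATEADD(dd, 7, a.Week_end)
                                        AND DATEADD(dd, 14, a.Week_end)
                THEN 1 ELSE 0 END) AS User_will_buy_next_week,
       MAX(CASE WHEN b.PurchaseDate BETWEEN DATEADD(dd, 7, a.Week_end)
                                        AND DATEADD(dd, 14, a.Week_end)
                 AND b.Parameter1 = 1
                THEN 1 ELSE 0 END) AS User_will_buy_on_Condition1,
       -- "1 month ago" window: Week_end - 34 .. Week_end - 26
       MAX(CASE WHEN b.PurchaseDate BETWEEN DATEADD(dd, -34, a.Week_end)
                                        AND DATEADD(dd, -26, a.Week_end)
                THEN 1 ELSE 0 END) AS was_buy_1_month_ago,
       -- "1 year ago" window: Week_end - 369 .. Week_end - 360
       MAX(CASE WHEN b.PurchaseDate BETWEEN DATEADD(dd, -369, a.Week_end)
                                        AND DATEADD(dd, -360, a.Week_end)
                THEN 1 ELSE 0 END) AS was_buy_1_year_ago
FROM @base2 AS a
LEFT JOIN @base1 AS b
       ON b.Users_id = a.Users_id
      -- one join covering the earliest lower bound to the latest upper bound
      AND b.PurchaseDate >= DATEADD(dd, -369, a.Week_end)
      AND b.PurchaseDate <= DATEADD(dd, 14, a.Week_end)
GROUP BY a.Users_id, a.Week_start, a.Week_end;
```

With the sample rows, user 1 gets 1, 0, 1, 0 for the four flags and user 2 gets all zeros. The GROUP BY also guarantees exactly one output row per customer-week, which the original multi-join form does not when a customer has several purchases inside one window.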
【Discussion】:
-
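The "problem" is that syntax like DATEADD(dd, 7, CAST(a.[Week_end] AS date)) in the ON is not SARGable, meaning that indexes can't be used and the data engine has to perform full scans of the tables.
-
To expand on what @Larnu said, your first step is to rewrite the joins so that they don't use functions. The reason is that the function has to be run against every row in the table before SQL can compare and filter. Instead of comparing only the rows that meet the JOIN criteria, it does so every time. And it can't use the index, which would speed the process up. Think of it this way: you have a book containing millions of rows of dates. Do you a) translate the one date and then compare it against the book, or b) translate the entire book first?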
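One way to read this advice, shown here as a hedged sketch rather than a verified fix: keep the indexed column (PurchaseDate) bare on one side of each comparison and leave the date arithmetic on the driving row's Week_end, where it is evaluated once per base2 row. The temp tables, their column types, and the sample rows are assumptions. The sketch also assumes both columns are already stored as date, so no CAST is needed. Whether the optimizer actually chooses an index seek depends on statistics and on the rest of the plan.

```sql
-- Hypothetical miniature of base1/base2; column types are assumptions.
CREATE TABLE #base1 (Users_id int NOT NULL, PurchaseDate date NOT NULL);
CREATE TABLE #base2 (Users_id int NOT NULL, Week_end date NOT NULL);

-- Same shape as the commented-out index in the question:
-- equality on the user first, then a range on the date.
CREATE INDEX idx_uid_purch_date ON #base1 (Users_id, PurchaseDate);

INSERT INTO #base1 VALUES (1, '2019-08-27'), (1, '2019-06-01');
INSERT INTO #base2 VALUES (1, '2019-08-18'), (2, '2019-08-18');

SELECT a.Users_id, a.Week_end, b.PurchaseDate
FROM #base2 AS a
LEFT JOIN #base1 AS b
       ON b.Users_id = a.Users_id
      -- b.PurchaseDate is not wrapped in any function, so the range
      -- predicate can be matched against the index on #base1
      AND b.PurchaseDate >= DATEADD(dd, 7, a.Week_end)
      AND b.PurchaseDate <= DATEADD(dd, 14, a.Week_end);

DROP TABLE #base1;
DROP TABLE #base2;
```

The DATEADD calls that remain act only on values from the outer row, so they do not stop the engine from seeking on (Users_id, PurchaseDate). Comparing the actual execution plans before and after is the way to confirm this on the real tables.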
-
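Thank you so much for the answers, now I even know what SARGable means. But I still don't understand how to rewrite them without using any functions at all (I have already removed all the CASTs, but the DATEADD is still there).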
-
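Are you sure the query does what you want it to do? What exactly are you trying to achieve? You select 90 million rows from base2 without any filter and then outer join date ranges from base1, so you end up with somewhere between 90 million and 4.4 billion result rows, or so I have calculated.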
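The row-multiplication concern can be illustrated with a small sketch in which each flag is computed with EXISTS instead of a join, so every base2 row yields exactly one output row no matter how many purchases fall inside the window. The table variables, column types, and sample rows are hypothetical, and only one flag is shown.

```sql
-- Hypothetical sample: user 1 has TWO purchases inside the same "next week" window.
DECLARE @base1 TABLE (Users_id int NOT NULL, PurchaseDate date NOT NULL);
DECLARE @base2 TABLE (Users_id int NOT NULL, Week_end date NOT NULL);

INSERT INTO @base1 VALUES (1, '2019-08-26'), (1, '2019-08-28');
INSERT INTO @base2 VALUES (1, '2019-08-18');

-- A LEFT JOIN on the window would return two rows for this week;
-- the EXISTS form returns one row with the flag set.
SELECT a.Users_id,
       a.Week_end,
       CASE WHEN EXISTS (SELECT 1
                         FROM @base1 AS b
                         WHERE b.Users_id = a.Users_id
                           AND b.PurchaseDate >= DATEADD(dd, 7, a.Week_end)
                           AND b.PurchaseDate <= DATEADD(dd, 14, a.Week_end))
            THEN 1 ELSE 0 END AS User_will_buy_next_week
FROM @base2 AS a;
```

The trade-off is that flags depending on purchase parameters (such as Parameter1 = 1) need that condition inside their own EXISTS subquery, so the question's 13 conditions would become 13 subqueries against the same index.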
-
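@ThorstenKettner The output of this query is per-customer data for every week, plus additional information on whether the customer bought in the past month or the past year, plus whether the customer will buy something next week, based on the purchase history (base1).
Tags: sql sql-server join select query-optimization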