【Question title】: SQL query with lots of JOIN conditions is very slow
【Posted】: 2013-05-30 20:12:42
【Question】:

I have inherited a SQL Server 2008 R2 project which, among other things, updates one table from another:

  • Table1 (about 150,000 rows) has 3 phone-number fields (Tel1, Tel2, Tel3)
  • Table2 (about 20,000 rows) has 3 phone-number fields (Phone1, Phone2, Phone3)

Table1 should be updated whenever any of these numbers match.

The current code looks like this:

UPDATE t1
SET surname = t2.surname, Address1 = t2.Address1, DOB = t2.DOB, Tel1 = t2.Phone1, Tel2 = t2.Phone2, Tel3 = t2.Phone3
FROM Table1 t1 
inner join Table2 t2
on
(t1.Tel1 = t2.Phone1 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone2 and t1.Tel1 is not null) or
(t1.Tel1 = t2.Phone3 and t1.Tel1 is not null) or
(t1.Tel2 = t2.Phone1 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone2 and t1.Tel2 is not null) or
(t1.Tel2 = t2.Phone3 and t1.Tel2 is not null) or
(t1.Tel3 = t2.Phone1 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone2 and t1.Tel3 is not null) or
(t1.Tel3 = t2.Phone3 and t1.Tel3 is not null);

However, this query takes more than 30 minutes to run.

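The execution plan shows that the main bottleneck is a Nested Loops join around a Clustered Index Scan on Table1. Both tables have a clustered index on their ID column.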

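As my DBA skills are very limited, can anyone suggest the best way to improve the performance of this query? Is adding an index on each of the Tel1, Tel2 and Tel3 columns the best approach, or can the query be changed to perform better?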
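To pin down exactly which rows this nine-way OR join pairs up, here is a minimal sketch that reproduces it on a few made-up rows, using Python's built-in sqlite3 module as a stand-in for SQL Server. It checks matching semantics only, not performance. Table and column names follow the question; the data values are hypothetical, and each Tel column is compared with IN (...), which expands to the same three equality tests per column.

```python
import sqlite3

# In-memory stand-in for the two SQL Server tables; all values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Table1 VALUES (1, '111', NULL, NULL), (2, NULL, '222', '333'), (3, '444', NULL, NULL);
INSERT INTO Table2 VALUES (10, '999', '111', NULL), (20, '333', NULL, NULL), (30, '222', NULL, NULL);
""")

# Same predicate as the question's ON clause: any Tel column equals any Phone column.
OR_JOIN = """
SELECT t1.ID, t2.ID
FROM Table1 t1
JOIN Table2 t2
  ON t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3)
ORDER BY t1.ID, t2.ID
"""
pairs = conn.execute(OR_JOIN).fetchall()
print(pairs)  # [(1, 10), (2, 20), (2, 30)]
```

Table1 row 2 matches two different Table2 rows here. SQL Server's documentation describes the result of an UPDATE ... FROM as undefined when more than one value is available for an updated column, so which Table2 row ends up in row 2 is not guaranteed.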

【Question discussion】:

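  • Apply non-clustered indexes on Tel1, Tel2, Tel3 in both tables
  • If a field is null, = will not return true, so you don't need all those and t1.Tel1 is not null conditions. Also, you are updating the same fields you are querying on, which could cause some data loss (if Tel1 = Phone2 and Phone1 is null). Try normalizing the phone numbers first (i.e. have a link table to hold the phone numbers)
  • Can you add some test data (e.g. in SQLFiddle)?
  • All sorted; @Vishwajeet, after splitting the query as per the answer below, the indexes made a difference. @Keith, thanks - of course I know NULLs won't be matched, they're only there for good measure :P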
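The point about NULLs can be checked directly. The sketch below uses Python's sqlite3 as a stand-in engine (SQLite follows the same standard rule that a comparison with NULL is unknown, never true) and shows that the extra IS NOT NULL guard does not change which rows join. The one-phone tables and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing anything with NULL yields NULL (unknown), never true.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_eq_value = conn.execute("SELECT NULL = '111'").fetchone()[0]
print(null_eq_null, null_eq_value)  # None None

# Hypothetical one-phone tables: row 2 and row 20 both have a NULL number.
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Tel1 TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Phone1 TEXT);
INSERT INTO Table1 VALUES (1, '111'), (2, NULL);
INSERT INTO Table2 VALUES (10, '111'), (20, NULL);
""")

without_guard = conn.execute(
    "SELECT t1.ID, t2.ID FROM Table1 t1 JOIN Table2 t2 ON t1.Tel1 = t2.Phone1"
).fetchall()
with_guard = conn.execute(
    "SELECT t1.ID, t2.ID FROM Table1 t1 JOIN Table2 t2 "
    "ON t1.Tel1 = t2.Phone1 AND t1.Tel1 IS NOT NULL"
).fetchall()
print(without_guard, with_guard)  # [(1, 10)] [(1, 10)]
```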

Tags: sql sql-server-2008


【Solution 1】:

At first glance, I'd suggest removing all the OR conditions from the query.

See whether this is faster (it turns your update into 3 separate updates):

UPDATE t1
SET surname = t2.surname, Address1 = t2.Address1, DOB = t2.DOB, Tel1 = t2.Phone1, Tel2 = t2.Phone2, Tel3 = t2.Phone3
FROM Table1 t1
inner join Table2 t2
on
(t1.Tel1 is not null AND t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3));

UPDATE t1
SET surname = t2.surname, Address1 = t2.Address1, DOB = t2.DOB, Tel1 = t2.Phone1, Tel2 = t2.Phone2, Tel3 = t2.Phone3
FROM Table1 t1
inner join Table2 t2
on
(t1.Tel2 is not null AND t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3));

UPDATE t1
SET surname = t2.surname, Address1 = t2.Address1, DOB = t2.DOB, Tel1 = t2.Phone1, Tel2 = t2.Phone2, Tel3 = t2.Phone3
FROM Table1 t1
inner join Table2 t2
on
(t1.Tel3 is not null AND t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3));
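A rough way to compare this rewrite with the original is to run each of the three join predicates as a plain SELECT on hypothetical data, with Python's sqlite3 standing in for SQL Server. The helper statement_matches is hypothetical. This sketch evaluates every predicate against the original Tel values, so it ignores the fact, raised in the comments below, that the first UPDATE rewrites Tel2 and Tel3 before the later statements run.

```python
import sqlite3
from collections import Counter

# Hypothetical data, same as a toy reproduction of the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Table1 VALUES (1, '111', NULL, NULL), (2, NULL, '222', '333'), (3, '444', NULL, NULL);
INSERT INTO Table2 VALUES (10, '999', '111', NULL), (20, '333', NULL, NULL), (30, '222', NULL, NULL);
""")

def statement_matches(tel_col):
    # Hypothetical helper: join predicate of one of the three UPDATE statements.
    # tel_col comes from a fixed tuple below, so formatting it into SQL is safe here.
    sql = (
        "SELECT t1.ID, t2.ID FROM Table1 t1 JOIN Table2 t2 "
        f"ON t1.{tel_col} IS NOT NULL "
        f"AND t1.{tel_col} IN (t2.Phone1, t2.Phone2, t2.Phone3)"
    )
    return set(conn.execute(sql).fetchall())

per_statement = {col: statement_matches(col) for col in ("Tel1", "Tel2", "Tel3")}

# Together, the three statements touch the same (Table1, Table2) pairs as the OR join...
combined = set().union(*per_statement.values())
or_join = set(conn.execute("""
SELECT t1.ID, t2.ID FROM Table1 t1 JOIN Table2 t2
  ON t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3)
""").fetchall())
print(combined == or_join)  # True

# ...but a Table1 row that matches in more than one statement is updated more than once.
times_updated = Counter(t1_id for pairs in per_statement.values() for t1_id, _ in pairs)
print(times_updated[2])  # 2
```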

【Discussion】:

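  • I thought of this approach too. It won't produce exactly the same result, but it may be close enough. The difference is that any rows matched by the first query will be updated again by the second and third queries. That said, I don't see how the OP's query works in the first place, since each t1 row may match several t2 rows
  • I don't see how the results could conflict. The first query only looks at the t1.Tel1 column, the second only at t1.Tel2, and so on...
  • When t1.Tel1 matches in the first query, t1.Tel2 and t1.Tel3 are also updated. t1.Tel2 and t1.Tel3 will then be explicitly matched in queries 2 and 3.
  • I guess t1.Tel1, t1.Tel2 and t1.Tel3 could give the same result, producing 3 updates (instead of one), but doesn't the OP's query work in much the same way?
  • @Paul OK, you're right. I hadn't noticed that t1.Tel2 and t1.Tel3 are being updated in the SET clause.

【Solution 2】: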

First, normalize your table data:

-- Assumes the link tables Table1Tel(primaryKey, tel) and Table2Phone(primaryKey, phone) already exist
insert into Table1Tel (primaryKey, tel)
select primaryKey, Tel1 from Table1 where Tel1 is not null
union select primaryKey, Tel2 from Table1 where Tel2 is not null
union select primaryKey, Tel3 from Table1 where Tel3 is not null;

insert into Table2Phone (primaryKey, phone)
select primaryKey, Phone1 from Table2 where Phone1 is not null
union select primaryKey, Phone2 from Table2 where Phone2 is not null
union select primaryKey, Phone3 from Table2 where Phone3 is not null;

These normalized tables are a better way to store phone numbers than extra columns.
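As a minimal sketch of the normalization step, assuming the link table simply holds (primaryKey, tel) pairs, the same UNION-based INSERT ... SELECT can be run against SQLite through Python's sqlite3. The data and the index name IX_Table1Tel_tel are hypothetical; the idea is that a single indexed phone column gives a join one plain equality to look up instead of nine ORed comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (primaryKey INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table1Tel (primaryKey INTEGER NOT NULL, tel TEXT NOT NULL);
INSERT INTO Table1 VALUES (1, '111', NULL, NULL), (2, NULL, '222', '333');

-- One row per non-NULL number, mirroring the answer's INSERT ... SELECT.
INSERT INTO Table1Tel (primaryKey, tel)
SELECT primaryKey, Tel1 FROM Table1 WHERE Tel1 IS NOT NULL
UNION SELECT primaryKey, Tel2 FROM Table1 WHERE Tel2 IS NOT NULL
UNION SELECT primaryKey, Tel3 FROM Table1 WHERE Tel3 IS NOT NULL;

-- Hypothetical index on the phone column so joins on it can use an equality lookup.
CREATE INDEX IX_Table1Tel_tel ON Table1Tel (tel);
""")

rows = conn.execute("SELECT primaryKey, tel FROM Table1Tel ORDER BY primaryKey, tel").fetchall()
print(rows)  # [(1, '111'), (2, '222'), (2, '333')]
```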

You can then join the tables like this:

update t1
set surname = t2.surname, 
    Address1 = t2.Address1, 
    DOB = t2.DOB
from Table1 t1 
     inner join Table1Tel tel
         on t1.primaryKey = tel.primaryKey
     inner join Table2Phone phone
         on tel.tel = phone.phone
     inner join Table2 t2
         on phone.primaryKey = t2.primaryKey
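To sanity-check that joining through the link tables finds the same (Table1, Table2) pairs as the original nine-way OR join, here is a sketch on hypothetical rows using Python's sqlite3. DISTINCT is added because two records that share more than one number would otherwise produce the same pair twice; the UPDATE itself is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (primaryKey INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table2 (primaryKey INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
INSERT INTO Table1 VALUES (1, '111', NULL, NULL), (2, NULL, '222', '333'), (3, '444', NULL, NULL);
INSERT INTO Table2 VALUES (10, '999', '111', NULL), (20, '333', NULL, NULL), (30, '222', NULL, NULL);

-- Link tables built as in the answer: one row per non-NULL number.
CREATE TABLE Table1Tel AS
  SELECT primaryKey, Tel1 AS tel FROM Table1 WHERE Tel1 IS NOT NULL
  UNION SELECT primaryKey, Tel2 FROM Table1 WHERE Tel2 IS NOT NULL
  UNION SELECT primaryKey, Tel3 FROM Table1 WHERE Tel3 IS NOT NULL;
CREATE TABLE Table2Phone AS
  SELECT primaryKey, Phone1 AS phone FROM Table2 WHERE Phone1 IS NOT NULL
  UNION SELECT primaryKey, Phone2 FROM Table2 WHERE Phone2 IS NOT NULL
  UNION SELECT primaryKey, Phone3 FROM Table2 WHERE Phone3 IS NOT NULL;
""")

# A single equality join through the link tables.
normalized = conn.execute("""
SELECT DISTINCT tel.primaryKey, phone.primaryKey
FROM Table1Tel tel
JOIN Table2Phone phone ON tel.tel = phone.phone
ORDER BY 1, 2
""").fetchall()

# The original nine comparisons, written with IN for brevity.
or_join = conn.execute("""
SELECT t1.primaryKey, t2.primaryKey
FROM Table1 t1 JOIN Table2 t2
  ON t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3)
ORDER BY 1, 2
""").fetchall()

print(normalized == or_join)  # True
```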

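Note that this doesn't fix the underlying problem of duplicates in your data: for example, if Joe and Jane Bloggs in your data share the same phone number (even in different fields), you will update both records to be the same.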
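Before running any of these updates, it may be worth listing the ambiguous cases this note describes. The sketch below (hypothetical rows, Python's sqlite3 as the stand-in engine) flags Table1 rows that match more than one Table2 row, where the update source is arbitrary, and Table2 rows that match more than one Table1 row, which is the "Joe and Jane Bloggs" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
-- Records 1 and 3 share '111' in different fields (the "Joe and Jane" case).
INSERT INTO Table1 VALUES (1, '111', NULL, NULL), (2, NULL, '222', '333'), (3, NULL, NULL, '111');
INSERT INTO Table2 VALUES (10, '999', '111', NULL), (20, '333', NULL, NULL), (30, '222', NULL, NULL);
""")

# Every (Table1, Table2) pair the original OR join would produce.
MATCHES = """
SELECT t1.ID AS t1_id, t2.ID AS t2_id
FROM Table1 t1 JOIN Table2 t2
  ON t1.Tel1 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel2 IN (t2.Phone1, t2.Phone2, t2.Phone3)
  OR t1.Tel3 IN (t2.Phone1, t2.Phone2, t2.Phone3)
"""

# Table1 rows with several candidate source rows: which one "wins" the UPDATE is arbitrary.
ambiguous_t1 = [row[0] for row in conn.execute(
    f"SELECT t1_id FROM ({MATCHES}) GROUP BY t1_id "
    "HAVING COUNT(DISTINCT t2_id) > 1 ORDER BY t1_id")]

# Table2 rows that would overwrite several Table1 rows with the same values.
shared_t2 = [row[0] for row in conn.execute(
    f"SELECT t2_id FROM ({MATCHES}) GROUP BY t2_id "
    "HAVING COUNT(DISTINCT t1_id) > 1 ORDER BY t2_id")]

print(ambiguous_t1, shared_t2)  # [2] [10]
```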

【Discussion】:

【Solution 3】:

You could also try the following, which should hopefully avoid duplicate updates.

UPDATE T1
SET surname = T2.surname,
    Address1 = T2.Address1, DOB = T2.DOB,
    Tel1 = T2.Phone1, Tel2 = T2.Phone2, Tel3 = T2.Phone3
FROM
    Table1 T1
INNER JOIN
(
    -- One equi-join per Tel/Phone combination; UNION removes duplicate (T1_ID, T2_ID) pairs
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel1 = b.Phone1
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel1 = b.Phone2
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel1 = b.Phone3
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel2 = b.Phone1
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel2 = b.Phone2
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel2 = b.Phone3
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel3 = b.Phone1
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel3 = b.Phone2
    UNION
    SELECT a.ID AS T1_ID, b.ID AS T2_ID FROM Table1 a INNER JOIN Table2 b ON a.Tel3 = b.Phone3
) X
    ON T1.ID = X.T1_ID
INNER JOIN Table2 T2 ON X.T2_ID = T2.ID;
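Why UNION rather than UNION ALL matters in this answer can be shown on two hypothetical records that share two numbers: the nine equi-join branches then return the same (T1_ID, T2_ID) pair twice, and UNION collapses it to one. This sketch builds the nine branches in Python and runs them with sqlite3 as a stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Tel1 TEXT, Tel2 TEXT, Tel3 TEXT);
CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Phone1 TEXT, Phone2 TEXT, Phone3 TEXT);
-- Hypothetical data: record 1 and record 10 share two numbers.
INSERT INTO Table1 VALUES (1, '111', '222', NULL), (2, '555', NULL, NULL);
INSERT INTO Table2 VALUES (10, '111', '222', NULL);
""")

# One equi-join per (Tel, Phone) combination, nine in total.
branches = [
    f"SELECT a.ID, b.ID FROM Table1 a JOIN Table2 b ON a.{tel} = b.{phone}"
    for tel in ("Tel1", "Tel2", "Tel3")
    for phone in ("Phone1", "Phone2", "Phone3")
]

union_all = conn.execute(" UNION ALL ".join(branches)).fetchall()
union = conn.execute(" UNION ".join(branches)).fetchall()
print(union_all)  # [(1, 10), (1, 10)]
print(union)      # [(1, 10)]
```

UNION only removes exact duplicate pairs; a Table1 row that matches two different Table2 rows still appears twice in X.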
    

【Discussion】:

【Solution 4】:

Please try the following query and let me know how long it takes to finish.

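UPDATE t1
SET surname = t2.surname, Address1 = t2.Address1, DOB = t2.DOB, Tel1 = t2.Phone1, Tel2 = t2.Phone2, Tel3 = t2.Phone3
FROM Table1 t1
inner join Table2 t2
on (
    -- ISNULL stops one NULL phone from turning the whole '|...|...|...|' string into NULL
    -- (assumes the Tel columns never hold empty strings)
    '|' + ISNULL(cast(t2.Phone1 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone2 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone3 as varchar(15)), '') + '|' LIKE '%|' + cast(t1.Tel1 as varchar(15)) + '|%'
    or '|' + ISNULL(cast(t2.Phone1 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone2 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone3 as varchar(15)), '') + '|' LIKE '%|' + cast(t1.Tel2 as varchar(15)) + '|%'
    or '|' + ISNULL(cast(t2.Phone1 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone2 as varchar(15)), '') + '|' + ISNULL(cast(t2.Phone3 as varchar(15)), '') + '|' LIKE '%|' + cast(t1.Tel3 as varchar(15)) + '|%'
    )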
      

Replacing each group of 3 ORs with a single LIKE should be faster. Give it a try.
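The delimited-string trick, and a NULL pitfall in it, can be tried in isolation. In this sketch SQLite's || plays the role of T-SQL's + and COALESCE stands in for ISNULL. With SQL Server's default CONCAT_NULL_YIELDS_NULL setting, concatenating a NULL yields NULL there too, so a single NULL phone makes the LIKE unknown and the row does not join. The helper delimited_match is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def delimited_match(tel, p1, p2, p3, null_safe):
    """Hypothetical helper: evaluate '|p1|p2|p3|' LIKE '%|tel|%' and return 1, 0 or None (NULL)."""
    if null_safe:
        # COALESCE turns a NULL phone into an empty slot instead of nulling the whole string.
        haystack = "'|' || COALESCE(?, '') || '|' || COALESCE(?, '') || '|' || COALESCE(?, '') || '|'"
    else:
        haystack = "'|' || ? || '|' || ? || '|' || ? || '|'"
    sql = f"SELECT {haystack} LIKE '%|' || ? || '|%'"
    return conn.execute(sql, (p1, p2, p3, tel)).fetchone()[0]

print(delimited_match("111", "999", "111", None, null_safe=False))  # None
print(delimited_match("111", "999", "111", None, null_safe=True))   # 1
# The | delimiters stop partial matches such as '11' inside '111'.
print(delimited_match("11", "999", "111", "222", null_safe=True))   # 0
```

A LIKE pattern that starts with % cannot be answered with an index seek, so any speed-up from this rewrite is unlikely to come from indexes on the phone columns.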

【Discussion】:
