【Question title】: How to improve performance of a NOT IN query
【Posted】: 2015-05-28 15:45:44
【Question description】:

I have the following SQL query.

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
JOIN employee emp ON emp.employeeid = tsi.credentialnumber
WHERE
tsi.masterentity = 'MASTER' AND
tsi.timestamp NOT IN
(
    SELECT ea.timestamp 
    FROM employee_attendance ea 
    WHERE 
    ea.employeeid = em.employeeid
    AND ea.timestamp =  tsi.timestamp
    AND ea.ismanual = 0
)
GROUP BY em.employeeid, tsi.timestamp

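This query compares an import table (holding employee time and attendance timestamps) with the attendance records already stored.

Sometimes timesheet_temp_import has more than 95,000 rows, and my query must show the timestamps for each employee. If a timestamp already exists for the employee, I have to exclude it.

The query works, but it takes more than 4 minutes, so I would like to know whether there is an alternative to the NOT IN clause that would reduce the run time.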

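【Question discussion】:

  • For questions about query performance, please also specify the table structures and indexes, and give an indication of the amount of data in those tables.
  • Could you limit the main SELECT with a WHERE clause so that it only fetches the most recent entries in timesheet_temp_import? You could use some kind of bookmark...

Tags: sql sql-server performance tsql notin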


【Solution 1】:

Using NOT EXISTS may help you.

SELECT
    emp.employeeid,
    tsi.timestamp
FROM timesheet_temp_import tsi
JOIN employee emp ON emp.employeeid = tsi.credentialnumber
WHERE tsi.masterentity = 'MASTER'
  AND NOT EXISTS
  (
      SELECT NULL
      FROM employee_attendance ea
      WHERE ea.employeeid = emp.employeeid
        AND ea.timestamp = tsi.timestamp
        AND ea.ismanual = 0
  )
GROUP BY
    emp.employeeid,
    tsi.timestamp
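
To check that the NOT EXISTS rewrite returns the same rows as the original NOT IN, the two can be run side by side on a small sample. The sketch below is only a sanity check under stated assumptions: it uses SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, the column types and sample rows are invented, and the alias is written as emp throughout. It says nothing about timing, which depends on SQL Server's optimizer and your indexes.

```python
import sqlite3

# SQLite stands in for SQL Server; types and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeid INTEGER);
    CREATE TABLE timesheet_temp_import (
        credentialnumber INTEGER, masterentity TEXT, timestamp TEXT);
    CREATE TABLE employee_attendance (
        employeeid INTEGER, timestamp TEXT, ismanual INTEGER);

    INSERT INTO employee VALUES (1), (2);
    INSERT INTO timesheet_temp_import VALUES
        (1, 'MASTER', '2015-05-28 08:00'),
        (1, 'MASTER', '2015-05-28 17:00'),
        (2, 'MASTER', '2015-05-28 08:05'),
        (2, 'OTHER',  '2015-05-28 09:00');
    INSERT INTO employee_attendance VALUES
        (1, '2015-05-28 08:00', 0),  -- already recorded, must be excluded
        (2, '2015-05-28 08:05', 1);  -- manual entry, ignored by the filter
""")

# Shared skeleton; only the anti-join predicate differs between the variants.
TEMPLATE = """
    SELECT emp.employeeid, tsi.timestamp
    FROM timesheet_temp_import tsi
    JOIN employee emp ON emp.employeeid = tsi.credentialnumber
    WHERE tsi.masterentity = 'MASTER'
      AND {predicate} (
          SELECT ea.timestamp
          FROM employee_attendance ea
          WHERE ea.employeeid = emp.employeeid
            AND ea.timestamp = tsi.timestamp
            AND ea.ismanual = 0)
    GROUP BY emp.employeeid, tsi.timestamp
    ORDER BY emp.employeeid, tsi.timestamp
"""

rows_not_in = conn.execute(
    TEMPLATE.format(predicate="tsi.timestamp NOT IN")).fetchall()
rows_not_exists = conn.execute(
    TEMPLATE.format(predicate="NOT EXISTS")).fetchall()

print(rows_not_in)
print(rows_not_exists)
```

Both variants return the 17:00 entry for employee 1 and the 08:05 entry for employee 2 on this sample.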

【Discussion】:

【Solution 2】:

You have this query:

SELECT em.employeeid, tsi.timestamp
FROM timesheet_temp_import tsi JOIN
     employee emp
     ON emp.employeeid = tsi.credentialnumber
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = em.employeeid AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           )
GROUP BY em.employeeid, tsi.timestamp;

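Before rewriting the query (as opposed to just reformatting it ;), I would check the indexes and the logic. Is the GROUP BY necessary? That is, does the outer query produce duplicates? My guess is no, but I don't know your data.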

Second, you want indexes. I think the following indexes would help: timesheet_temp_import(masterentity, credentialnumber, timestamp), employee(employeeid), and employee_attendance(employeeid, timestamp, ismanual).
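
As a rough sketch, the three indexes could be created as follows. The index names are hypothetical and the column types are assumptions; SQLite (via Python's sqlite3) is used only so the statements can be run as-is, though the plain CREATE INDEX syntax shown is also accepted by SQL Server. The query plan printed at the end is SQLite's, so treat it as an illustration and check the actual plan in SQL Server.

```python
import sqlite3

# SQLite stands in for SQL Server; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeid INTEGER);
    CREATE TABLE timesheet_temp_import (
        credentialnumber INTEGER, masterentity TEXT, timestamp TEXT);
    CREATE TABLE employee_attendance (
        employeeid INTEGER, timestamp TEXT, ismanual INTEGER);

    -- Index names are hypothetical; column lists follow the answer.
    CREATE INDEX ix_tsi_master_cred_ts
        ON timesheet_temp_import (masterentity, credentialnumber, timestamp);
    CREATE INDEX ix_employee_id
        ON employee (employeeid);
    CREATE INDEX ix_ea_emp_ts_manual
        ON employee_attendance (employeeid, timestamp, ismanual);
""")

index_names = sorted(
    row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index'"))
print(index_names)

# Print SQLite's plan for the correlated lookup, to see which indexes it picks.
QUERY = """
    SELECT tsi.credentialnumber, tsi.timestamp
    FROM timesheet_temp_import tsi
    WHERE tsi.masterentity = 'MASTER'
      AND NOT EXISTS (
          SELECT 1 FROM employee_attendance ea
          WHERE ea.employeeid = tsi.credentialnumber
            AND ea.timestamp = tsi.timestamp
            AND ea.ismanual = 0)
"""
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])
```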

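Third, I would question whether you have timesheets for non-employees. If not, I think you can get rid of the join in the outer query. So, this might be the query you want: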

SELECT tsi.credentialnumber as employeeid, tsi.timestamp
FROM timesheet_temp_import tsi
WHERE tsi.masterentity = 'MASTER' AND
      tsi.timestamp NOT IN (SELECT ea.timestamp 
                            FROM employee_attendance ea 
                            WHERE ea.employeeid = tsi.credentialnumber AND
                                  ea.timestamp =  tsi.timestamp AND
                                  ea.ismanual = 0
                           );

You may also get a marginal improvement by replacing NOT IN with NOT EXISTS.

【Discussion】:

    【Solution 3】:

    Another approach is to use EXCEPT:

    select whatever
    from wherever
    where somefield in 
    (select all potential values of that field
    except
    select the values you want to exclude)
    

    This returns the same rows as NOT IN as long as the compared column contains no NULLs, and it can be faster.
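
For the tables in the question, one way to apply this idea is to put EXCEPT directly between the two sets of (employeeid, timestamp) pairs, rather than inside an IN subquery as in the template above. This is a sketch under assumptions: SQLite (via Python's sqlite3) stands in for SQL Server, the sample rows are invented, and the join to employee is left out. Note that EXCEPT also removes duplicate rows, so no GROUP BY is needed.

```python
import sqlite3

# SQLite stands in for SQL Server; types and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE timesheet_temp_import (
        credentialnumber INTEGER, masterentity TEXT, timestamp TEXT);
    CREATE TABLE employee_attendance (
        employeeid INTEGER, timestamp TEXT, ismanual INTEGER);

    INSERT INTO timesheet_temp_import VALUES
        (1, 'MASTER', '2015-05-28 08:00'),
        (1, 'MASTER', '2015-05-28 17:00'),
        (1, 'MASTER', '2015-05-28 17:00'),  -- duplicate import row
        (2, 'MASTER', '2015-05-28 08:05');
    INSERT INTO employee_attendance VALUES
        (1, '2015-05-28 08:00', 0),  -- already recorded, must be excluded
        (2, '2015-05-28 08:05', 1);  -- manual entry, ignored by the filter
""")

# Imported pairs minus the pairs already recorded as non-manual attendance.
EXCEPT_QUERY = """
    SELECT tsi.credentialnumber AS employeeid, tsi.timestamp
    FROM timesheet_temp_import tsi
    WHERE tsi.masterentity = 'MASTER'
    EXCEPT
    SELECT ea.employeeid, ea.timestamp
    FROM employee_attendance ea
    WHERE ea.ismanual = 0
    ORDER BY 1, 2
"""

rows = conn.execute(EXCEPT_QUERY).fetchall()
print(rows)
```

The duplicate 17:00 import row appears only once in the result, because EXCEPT returns distinct rows.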

    【Discussion】:

      【Solution 4】:

      Try this; I think you meant emp:

      SELECT distinct tsi.credentialnumber, tsi.timestamp
        FROM timesheet_temp_import tsi
        JOIN employee emp 
          ON emp.employeeid = tsi.credentialnumber
         and tsi.masterentity = 'MASTER' 
        left join employee_attendance ea 
          on ea.employeeid = emp.employeeid
         AND ea.timestamp = tsi.timestamp
         AND ea.ismanual = 0
       where ea.employeeid is null
      

      Depending on the indexes, this may be faster:

      SELECT distinct tsi.credentialnumber, tsi.timestamp
        FROM timesheet_temp_import tsi
        JOIN employee emp 
          ON emp.employeeid = tsi.credentialnumber
         and tsi.masterentity = 'MASTER' 
        left join employee_attendance ea 
          on ea.employeeid = tsi.credentialnumber
         AND ea.timestamp = tsi.timestamp
         AND ea.ismanual = 0
       where ea.employeeid is null
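
One detail of this pattern is worth illustrating: the ea.ismanual = 0 filter has to stay in the ON clause. If it is moved to the WHERE clause, it can never be true for the unmatched rows (where every ea column is NULL), so the query returns nothing. The sketch below shows both placements on invented sample data, with SQLite (via Python's sqlite3) standing in for SQL Server.

```python
import sqlite3

# SQLite stands in for SQL Server; types and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employeeid INTEGER);
    CREATE TABLE timesheet_temp_import (
        credentialnumber INTEGER, masterentity TEXT, timestamp TEXT);
    CREATE TABLE employee_attendance (
        employeeid INTEGER, timestamp TEXT, ismanual INTEGER);

    INSERT INTO employee VALUES (1), (2);
    INSERT INTO timesheet_temp_import VALUES
        (1, 'MASTER', '2015-05-28 08:00'),
        (1, 'MASTER', '2015-05-28 17:00'),
        (2, 'MASTER', '2015-05-28 08:05');
    INSERT INTO employee_attendance VALUES
        (1, '2015-05-28 08:00', 0),  -- already recorded, must be excluded
        (2, '2015-05-28 08:05', 1);  -- manual entry, ignored by the filter
""")

BASE = """
    SELECT DISTINCT tsi.credentialnumber, tsi.timestamp
    FROM timesheet_temp_import tsi
    JOIN employee emp
      ON emp.employeeid = tsi.credentialnumber
     AND tsi.masterentity = 'MASTER'
    LEFT JOIN employee_attendance ea
      ON ea.employeeid = tsi.credentialnumber
     AND ea.timestamp = tsi.timestamp
"""

# Filter in the ON clause: unmatched import rows survive with NULL ea columns.
filter_in_on = conn.execute(
    BASE + " AND ea.ismanual = 0 WHERE ea.employeeid IS NULL ORDER BY 1, 2"
).fetchall()

# Filter in the WHERE clause: NULL = 0 is never true, so every row is dropped.
filter_in_where = conn.execute(
    BASE + " WHERE ea.ismanual = 0 AND ea.employeeid IS NULL ORDER BY 1, 2"
).fetchall()

print(filter_in_on)
print(filter_in_where)
```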
      

      【Discussion】:

        【Solution 5】:

        Use a LEFT JOIN with a WHERE clause instead of NOT IN to do the filtering:

        SELECT 
            em.employeeid,
            tsi.timestamp
            FROM timesheet_temp_import tsi
            join employee emp ON emp.employeeid = tsi.credentialnumber
            left join 
            (
                SELECT ea.timestamp 
                FROM employee_attendance ea 
                WHERE 
                ea.employeeid = em.employeeid
                AND ea.timestamp =  tsi.timestamp
                AND ea.ismanual = 0
            ) t on t.timestamp = tsi.timestamp
            WHERE
            tsi.masterentity = 'MASTER' AND
            t.timestamp is null
            GROUP BY 
            em.employeeid,
            tsi.timestamp
        

        【Discussion】:

        • @Blam You can't correlate the condition with the outer table when using a LEFT JOIN