SQL Query - long running / taking up CPU resource

【Question title】: SQL Query - long running / taking up CPU resource
【Posted】: 2018-10-17 08:57:24
【Question】:

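Hi, I have the following SQL query that takes on average 40 minutes to run. One of the tables it references has over 7 million records in it.

I have run it through the Database Engine Tuning Advisor and applied all of the recommendations. I have also assessed it in SQL Server's Activity Monitor, and no further indexes etc. were recommended.

Any suggestions would be great, thanks in advance.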

WITH CTE AS 
(
    SELECT r.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    r.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM Result R
    INNER JOIN Job J ON R.JobId = J.Id
    INNER JOIN User C ON J.UserId = C.Id
    WHERE 
    ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
    AND ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT)
    AND ISNULL(R.Email,'') <> '' -- has an email address
    AND ISNULL(R.EmailSent, CAST(0 AS BIT)) = CAST(0 AS BIT) -- email has not been sent
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND ISNULL(R.EmailStatus,'') = '' -- email has not been sent
    AND ISNULL(R.IsEmailSubscribe, 'True') <> 'False' -- not unsubscribed
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE  ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- list of ids
    )
    AND J.Id NOT IN (
        -- list of ids
    )
    AND J.ClientId NOT IN 
    (
        -- list of ids
    )
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, ConsultantId, ConsultantEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1

Please see my updated query below:

WITH CTE AS 
(
    SELECT R.Id AS ResultId,
    r.JobId,
    r.CandidateId,
    R.Email,
    CAST(0 AS BIT) AS EmailSent,
    NULL AS EmailSentDate,
    'PICKUP' AS EmailStatus,
    GETDATE() AS CreateDate,
    C.Id AS UserId,
    C.Email AS UserEmail,
    NULL AS Subject
    FROM RESULTS R
    INNER JOIN JOB J ON R.JobId = J.Id
    INNER JOIN Consultant C ON J.UserId = C.Id
    WHERE 
    J.DCApproved = 1
    AND (J.Closed = 0 OR J.Closed IS NULL)
    AND (R.Email <> '' OR R.Email IS NOT NULL)
    AND (R.EmailSent = 0 OR R.EmailSent IS NULL)
    AND R.EmailSentDate IS NULL -- email has not been sent
    AND (R.EmailStatus = '' OR R.EmailStatus IS NULL)
    AND (R.IsEmailSubscribe = 'True' OR R.IsEmailSubscribe IS NULL)
    -- not already been emailed for this job
    AND NOT EXISTS (
        SELECT SMTP.Email
        FROM SMTP_Production SMTP
        WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
    )
    -- not unsubscribed
    AND NOT EXISTS (

        SELECT u.Id FROM Unsubscribe u
        WHERE (u.EmailAddress = R.Email OR (u.EmailAddress IS NULL AND R.Email IS NULL))

    )
    AND NOT EXISTS (
        SELECT SMTP.Id FROM SMTP_Production SMTP
        WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
    )   
    AND C.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.Id NOT IN (
        -- LIST OF IDS
    )
    AND J.ClientId NOT IN 
    (
        -- LIST OF IDS
    )
)

INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, UserId, UserEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT 
    CTE.ResultId,
    CTE.JobId,
    CTE.CandidateId,
    CTE.Email,
    CTE.EmailSent,
    CTE.EmailSentDate,
    CTE.EmailStatus,
    CTE.CreateDate,
    CTE.UserId,
    CTE.UserEmail,
    NULL
FROM CTE
  INNER JOIN 
    (
        SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
        FROM CTE

    ) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1


GO

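【Comments】:

ISNULL in your WHERE will destroy any SARGability of your query. Don't use it in your WHERE; use the format ({Expression} = {Value} OR {Expression} IS NULL).
@Larnu I believe this comment alone is enough to be an answer.
@GeorgeMenoutis You might be right. I wanted to look over the rest of the query first, but I spotted that one very early on.
The clause R.Email <> '' OR R.Email IS NOT NULL is a little redundant. If R.Email has the value '', then its value is not NULL, so the clause evaluates to true (meaning rows with a value of '' will be returned). You likely just want R.Email <> '' here, because NULL <> '' evaluates to unknown, which is not true.
Even now, just running the SELECT statement, e.g. SELECT CTE.ResultId, etc FROM CTE, causes my server to run at 100% RAM.

Tags: sql sql-server database database-performance sqlperformance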

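【Solution 1】:

Using ISNULL in your WHERE and JOIN clauses is likely the main cause here. Wrapping a column in a function makes the predicate non-SARGable: SQL Server can no longer seek on an index for that column, so it ends up scanning the whole table or index. Note that applying functions to variables in the WHERE is normally fine, for example WHERE SomeColumn = DATEADD(DAY, @n, @SomeDate). Something like WHERE SomeColumn = ISNULL(@Variable,0) has the smell of a "catch-all query", so it can hurt performance depending on your setup. That isn't the discussion at hand, though.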
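As a minimal sketch of that distinction, the three statements below contrast a function on a column, its rewritten form, and a function applied only to variables. They assume the Result table from the question; CreateDate is a hypothetical column on that table used purely for illustration, and whether a seek actually happens depends on the indexes that exist.

```sql
-- Non-SARGable: ISNULL wraps the column, so an index on EmailStatus
-- cannot be used for a seek.
SELECT R.Id
FROM Result R
WHERE ISNULL(R.EmailStatus, '') = '';

-- SARGable rewrite: the column is compared directly.
SELECT R.Id
FROM Result R
WHERE (R.EmailStatus = '' OR R.EmailStatus IS NULL);

-- Functions applied only to variables do not wrap any column.
-- CreateDate is a hypothetical column, not one shown in the question.
DECLARE @n int = 7, @SomeDate datetime = '20181017';

SELECT R.Id
FROM Result R
WHERE R.CreateDate = DATEADD(DAY, @n, @SomeDate);
```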

Clauses like ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT) are a big problem for the query optimiser, and your query is riddled with them. You need to replace them with clauses like this:

WHERE (J.Closed = 0 OR J.Closed IS NULL)

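Although it makes no difference, there is also no need to CAST the 0 there. SQL Server can see that you are comparing against a bit, so it will interpret the 0 as a bit too.

You also have a NOT EXISTS whose WHERE clause is ISNULL(u.EmailAddress, '') = ISNULL(R.Email, ''). That needs to become: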

WHERE (u.EmailAddress = R.Email
  OR   (u.EmailAddress IS NULL AND R.Email IS NULL))

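You need to change every ISNULL usage in your WHERE clauses (in both the CTE and the subqueries), and you should then see a decent performance increase.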
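Not every ISNULL needs an OR ... IS NULL branch when rewritten: the branch is only required when the ISNULL default would have satisfied the comparison. The sketch below checks three of the question's predicates against their rewrites on a few hypothetical sample rows. The table variable, its column types, and the sample values are assumptions made for illustration (in particular, IsEmailSubscribe is assumed to be a string column). Each Old/New pair should return the same value on every row.

```sql
-- Hypothetical sample rows covering NULL, empty and populated values.
DECLARE @Sample TABLE (
    Email            varchar(100) NULL,
    Approved         bit          NULL,
    IsEmailSubscribe varchar(5)   NULL
);

INSERT INTO @Sample (Email, Approved, IsEmailSubscribe)
VALUES (NULL,            NULL, NULL),
       ('',              0,    'False'),
       ('a@example.com', 1,    'True');

SELECT
    Email, Approved, IsEmailSubscribe,
    -- Default '' fails the <> '' test, so no IS NULL branch is needed.
    CASE WHEN ISNULL(Email, '') <> '' THEN 1 ELSE 0 END AS EmailOld,
    CASE WHEN Email <> ''             THEN 1 ELSE 0 END AS EmailNew,
    -- Default 0 fails the = 1 test, so no IS NULL branch is needed.
    CASE WHEN ISNULL(Approved, CAST(0 AS BIT)) = CAST(1 AS BIT) THEN 1 ELSE 0 END AS ApprovedOld,
    CASE WHEN Approved = 1                                      THEN 1 ELSE 0 END AS ApprovedNew,
    -- Default 'True' passes the <> 'False' test, so the IS NULL branch is required.
    CASE WHEN ISNULL(IsEmailSubscribe, 'True') <> 'False' THEN 1 ELSE 0 END AS SubscribeOld,
    CASE WHEN (IsEmailSubscribe <> 'False' OR IsEmailSubscribe IS NULL) THEN 1 ELSE 0 END AS SubscribeNew
FROM @Sample;
```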

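【Comments】:

WOW, fantastic, such a comprehensive answer, really appreciate it. Going to implement this now and will post the results.
Check your query plan - 1:48 is 1:30 more than I would expect in the WORST CASE.
@MatthewStott Glad to help. It sounds like there may still be room for improvement (1:48 is still a long-running query in a lot of places), but that is already a significant improvement. Add the latest version of your query to the question and let's see if we can improve it further.
Larnu - thanks. Yes, when I run the query my server peaks at 100%, and the server has good resources. I will definitely do that.
Hi, I have updated my post with the new query. Unfortunately I can't show the execution plan, because when I run the query my server peaks at 100% CPU.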
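One way to look at a plan without running the statement is to request the estimated plan only. The sketch below uses SET SHOWPLAN_XML, which has to be the only statement in its batch; GO is the batch separator used by SSMS and sqlcmd. The SELECT in the middle is a placeholder against the question's Result table, to be swapped for the query under investigation.

```sql
-- Ask SQL Server for the estimated plan only: statements in the
-- following batches are compiled but not executed.
SET SHOWPLAN_XML ON;
GO

-- Placeholder statement: replace with the query being investigated.
SELECT R.Id
FROM Result R
WHERE R.EmailSentDate IS NULL;
GO

-- Return to normal execution.
SET SHOWPLAN_XML OFF;
GO
```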

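【Solution 2】:

Generally speaking, 7 million records is a joke for a modern database. If you are talking about problems, you should be talking about billions of rows, not 7 million.

That indicates a problem with the query. High CPU is usually a sign of mismatched fields (comparing a string in one table with a number in another) or ... functions being called too often. Long running time is normally a sign of missing indexes or ... non-sargability, which your query works really hard to achieve.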
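To illustrate the mismatched-fields case, here is a small sketch with two hypothetical table variables (nothing in the question shows such a mismatch; the names and types are invented). Because int has higher data type precedence than varchar, the string column is implicitly converted on every row in the first join.

```sql
-- Hypothetical tables: the reference is stored as text on one side
-- and as a number on the other.
DECLARE @Orders    TABLE (CustomerRef varchar(10) NOT NULL);
DECLARE @Customers TABLE (Id int NOT NULL PRIMARY KEY);

INSERT INTO @Customers (Id) VALUES (1), (2);
INSERT INTO @Orders (CustomerRef) VALUES ('1'), ('2'), ('2');

-- Mismatched types: CustomerRef is implicitly converted to int for
-- every row before the comparison can be made.
SELECT o.CustomerRef, c.Id
FROM @Orders o
INNER JOIN @Customers c ON o.CustomerRef = c.Id;

-- Explicit conversion on the other side leaves CustomerRef compared
-- as stored, but now wraps c.Id instead; the lasting fix is to store
-- both columns with the same data type.
SELECT o.CustomerRef, c.Id
FROM @Orders o
INNER JOIN @Customers c ON o.CustomerRef = CAST(c.Id AS varchar(10));
```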

Non-sargability means that an index cannot be used. An example:

ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)

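ISNULL(field, value) means that an index on the field cannot be used - basically "goodbye index, hello table scan". It also means that, well ....

J.Approved = 1

has the same meaning (a NULL is replaced by 0, which never equals 1, so no IS NULL branch is needed here), but it is sargable. Nearly every one of your conditions is written in a non-sargable way - welcome to db hell. Start rewriting.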

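You may want to read more about sargability at https://www.techopedia.com/definition/28838/sargeable

Also make sure that you have indexes on all relevant foreign keys (and on the primary keys they reference) - otherwise, again, welcome table scans.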
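As a rough sketch of what that could look like for the join and lookup columns that appear in the question's query: the index names and the dbo schema are assumptions, some of these indexes may already exist after the tuning advisor run, and the right key order depends on the actual data. Primary keys normally already carry an index.

```sql
-- Index names and the dbo schema are assumptions; the column names
-- come from the joins and NOT EXISTS lookups in the question.
CREATE NONCLUSTERED INDEX IX_Result_JobId
    ON dbo.Result (JobId);

CREATE NONCLUSTERED INDEX IX_Job_UserId
    ON dbo.Job (UserId);

-- Supports the "already emailed for this job" lookup.
CREATE NONCLUSTERED INDEX IX_SMTP_Production_JobId_CandidateId
    ON dbo.SMTP_Production (JobId, CandidateId);

-- Supports the "pending PICKUP for this candidate" lookup.
CREATE NONCLUSTERED INDEX IX_SMTP_Production_CandidateId_EmailStatus
    ON dbo.SMTP_Production (CandidateId, EmailStatus);

-- Supports the unsubscribe lookup (assumes EmailAddress is an indexable type).
CREATE NONCLUSTERED INDEX IX_Unsubscribe_EmailAddress
    ON dbo.Unsubscribe (EmailAddress);
```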

【Comments】:

Thanks for the comments. I applied what was posted earlier and it worked great.