Efficient SQL Server stored procedure: answers

【Question title】: Efficient SQL Server stored procedure
【Posted】: 2013-03-20 18:23:19
【Question】:

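I'm using SQL Server 2008 and running the stored procedure below. It needs to "clean" about 50 million rows out of a 70 million row table by moving them into another table. id_col is an integer (the primary identity key).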

Judging by my last run, it works correctly but is projected to take about 200 days:

SET NOCOUNT ON

    -- define the last ID handled
    DECLARE @LastID integer
    SET @LastID = 0
    declare @tempDate datetime
    set @tempDate = dateadd(dd,-20,getdate())
    -- define the ID to be handled now
    DECLARE @IDToHandle integer
    DECLARE @iCounter integer
    DECLARE @watch1 nvarchar(50)
    DECLARE @watch2 nvarchar(50)
    set @iCounter = 0
    -- select the next  to handle    
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
        and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col

    -- as long as we have s......    
    WHILE @IDToHandle IS NOT NULL
    BEGIN
        IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
        BEGIN
            INSERT INTO SECONDERY_TABLE
            SELECT col1,col2,col3.....
            FROM MAIN_TABLE WHERE id_col = @IDToHandle

            EXEC    [dbo].[DeleteByID] @ID = @IDToHandle -- deletes the rows related to MAIN_TABLE from 2 other tables, and then the row from MAIN_TABLE itself
            set @iCounter = @iCounter +1
        END
        IF (@iCounter % 1000 = 0)
        begin
            set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
            set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
            raiserror ( @watch1, 10,1) with nowait
            raiserror (@watch2, 10,1) with nowait
        end
        -- set the last  handled to the one we just handled
        SET @LastID = @IDToHandle
        SET @IDToHandle = NULL

        -- select the next  to handle    
        SELECT TOP 1 @IDToHandle = id_col
        FROM MAIN_TABLE
        WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
            and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
        ORDER BY id_col
    END

Any ideas or guidance for improving the run time of this procedure are welcome.

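【Question comments】:

Hmm... this looks like a very procedural approach to a problem that should be set-based. You need to stop thinking row by row and instead take advantage of SQL's ability to process sets of data very efficiently.
Do you have any triggers on the tables involved? If you do, those triggers could be making everything take much longer.
It looks like the only thing forcing you to do this row-by-agonizing-row is the DeleteByID stored procedure... Can you include the definition of that stored procedure, so it can be folded into a set-based solution?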
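
To follow up on the trigger question above, a catalog query along these lines could show whether any DML triggers are attached to the tables being modified. This is only a sketch: it assumes the tables live in the dbo schema, which the question does not state, and the tables touched by DeleteByID would need to be added to the list by hand.

```sql
-- List DML triggers defined on the tables touched by the cleanup.
-- Assumption: the tables are in the dbo schema (not stated in the question).
SELECT  t.name                   AS trigger_name,
        OBJECT_NAME(t.parent_id) AS table_name,
        t.is_disabled,
        t.is_instead_of_trigger
FROM    sys.triggers AS t
WHERE   t.parent_id IN (OBJECT_ID(N'dbo.MAIN_TABLE'),
                        OBJECT_ID(N'dbo.SECONDERY_TABLE'));
```

An empty result means no triggers are defined on those two tables.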

Tags: sql sql-server database sql-server-2008 stored-procedures

【Solution 1】:

Yes, try this:

Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
    And someDateCol < @tempDate -- If there are times in these datetime fields, 
                                -- then you may need to modify this condition.
    And some_other_int_col In (1745, 1548, 4785)
    And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                    Where some_int_col = m.id_col)
    And Not Exists (Select * From A_70k_rows_table
                    Where some_int_col = m.id_col)
Select id from @Ids  -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated, 
-- then delete or comment out the select and return lines above so insert runs.

      Begin Transaction
      Delete OT     -- eliminate row-by-row call to second stored proc
      From OtherTable ot
         Join MAIN_TABLE m On m.id_col = ot.FKCol
         Join @Ids i On i.Id = m.id_col 

      Insert SECONDERY_TABLE(col1, col2, etc.)
      Select col1,col2,col3.....
      FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col 

      Delete m   -- eliminate row-by-row call to second stored proc
      FROM MAIN_TABLE m 
      Join @Ids i On i.Id = m.id_col 

      Commit Transaction

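Explanation:

You have a number of filter conditions that are not SARGable, i.e. they force a full table scan on every iteration of the loop instead of letting the query use an existing index. Always try to avoid filter conditions that apply processing logic to a table column's value before comparing it with another value. Wrapping the column in a function takes away the query optimizer's opportunity to use an index on it.
You are executing the inserts one at a time... It is much better to generate the list of PK Ids that need to be processed (all at once), and then perform all the inserts in a single statement.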
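
As a sketch of the SARGability point, the two queries below return the same ids for the date cutoff used in the question, but only the second compares the bare column to a precomputed value. Whether it actually results in an index seek depends on an index on someDateCol existing, which is an assumption here. @cutoff is a name introduced for this illustration.

```sql
DECLARE @tempDate datetime;
SET @tempDate = DATEADD(dd, -20, GETDATE());

-- Non-SARGable: the column is wrapped in DATEDIFF, so the function has to be
-- evaluated for every row and an index on someDateCol cannot be used for a seek.
SELECT id_col
FROM   MAIN_TABLE
WHERE  DATEDIFF(dd, someDateCol, @tempDate) > 0;

-- SARGable equivalent. DATEDIFF(dd, a, b) counts day boundaries crossed, so the
-- condition above is true exactly when someDateCol falls before midnight at the
-- start of @tempDate's day. Compute that midnight once, outside the query.
DECLARE @cutoff datetime;   -- hypothetical helper variable
SET @cutoff = DATEADD(dd, DATEDIFF(dd, 0, @tempDate), 0);

SELECT id_col
FROM   MAIN_TABLE
WHERE  someDateCol < @cutoff;
```

The same reasoning applies to the OR chain on some_other_int_col, which the answer rewrites as an IN list on the bare column.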

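【Comments】:

Hmm, looks good. I'm running checks now and will post back if everything works. Either way, thanks a lot.
OK, I first tried running this against a small amount of data. In a loop, I ran the procedure for every 100000 rows, but it made my DB unavailable. After that I tried every 1000 rows, and within seconds it was still impossible to access SECONDERY_TABLE and the other related tables. Only restarting the service cleared it...
Could I implement something from here: stackoverflow.com/questions/15480699/…
martin, if the process fails after the Begin Transaction, it leaves locks on the tables, which blocks access and explains your symptoms. Until you figure out why the code is failing, comment out the Begin Transaction, or, if it happens again, type RollBack Transaction in the query window to regain access to the locked tables.
If you write the code to work in chunks of 1000 records (or whatever size), make sure each iteration includes a Begin Transaction at the start of the iteration block, and a Commit Transaction at the end, or a Rollback Transaction if the iteration fails.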
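
The advice in the last two comments could be sketched roughly as follows, using TRY...CATCH (available in SQL Server 2008) so that a failed chunk is rolled back and does not leave locks behind. Several names are assumptions made for illustration: #IdsToMove is a hypothetical work table holding the verified id list, @Batch and @BatchSize are introduced here, col1 to col3 stand in for the column list elided in the question, and OtherTable/FKCol are the placeholders from the answer.

```sql
SET NOCOUNT ON;

-- Hypothetical work table: fill it with the verified id list,
-- using the same filter as the @Ids insert in the answer.
CREATE TABLE #IdsToMove (id int PRIMARY KEY NOT NULL);
-- INSERT #IdsToMove (id) SELECT id_col FROM MAIN_TABLE m WHERE ...

DECLARE @BatchSize int;
SET @BatchSize = 1000;                     -- chunk size; tune for your system
DECLARE @msg nvarchar(2048);
DECLARE @Batch TABLE (id int PRIMARY KEY NOT NULL);

WHILE 1 = 1
BEGIN
    DELETE FROM @Batch;

    -- Pick the next chunk of ids that has not been processed yet.
    INSERT @Batch (id)
    SELECT TOP (@BatchSize) id
    FROM   #IdsToMove
    ORDER BY id;

    IF @@ROWCOUNT = 0 BREAK;               -- nothing left to process

    BEGIN TRY
        BEGIN TRANSACTION;                 -- one transaction per chunk

        DELETE ot
        FROM   OtherTable ot
        JOIN   @Batch b ON b.id = ot.FKCol;

        INSERT SECONDERY_TABLE (col1, col2, col3)   -- placeholder columns
        SELECT m.col1, m.col2, m.col3
        FROM   MAIN_TABLE m
        JOIN   @Batch b ON b.id = m.id_col;

        DELETE m
        FROM   MAIN_TABLE m
        JOIN   @Batch b ON b.id = m.id_col;

        -- Mark the chunk as done inside the same transaction.
        DELETE w
        FROM   #IdsToMove w
        JOIN   @Batch b ON b.id = w.id;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Undo the failed chunk so no locks are left behind, then stop.
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
        SET @msg = ERROR_MESSAGE();
        RAISERROR(N'%s', 16, 1, @msg);
        BREAK;
    END CATCH
END
```

Keeping each transaction to a single chunk limits how long locks are held on MAIN_TABLE and SECONDERY_TABLE, and after a failure the remaining ids are still in #IdsToMove, so the loop can be restarted once the cause is fixed.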