【Question title】: Efficient SQL Server stored procedure
【Posted】: 2013-03-20 18:23:19
【Question】:

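I am using SQL Server 2008 and running the following stored procedure, which needs to "clean" a table of about 70 million rows by moving roughly 50 million of them to another table. id_col is an integer (the primary identity key).

Based on my last run, it works correctly, but it is projected to take about 200 days: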

SET NOCOUNT ON

    -- define the last ID handled
    DECLARE @LastID integer
    SET @LastID = 0
    declare @tempDate datetime
    set @tempDate = dateadd(dd,-20,getdate())
    -- define the ID to be handled now
    DECLARE @IDToHandle integer
    DECLARE @iCounter integer
    DECLARE @watch1 nvarchar(50)
    DECLARE @watch2 nvarchar(50)
    set @iCounter = 0
    -- select the next  to handle    
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
        and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col

    -- as long as we have s......    
    WHILE @IDToHandle IS NOT NULL
    BEGIN
        IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
        BEGIN
            INSERT INTO SECONDERY_TABLE
            SELECT col1,col2,col3.....
            FROM MAIN_TABLE WHERE id_col = @IDToHandle

            EXEC    [dbo].[DeleteByID] @ID = @IDToHandle --deletes the related rows from 2 other tables and then the row from MAIN_TABLE
            set @iCounter = @iCounter +1
        END
        IF (@iCounter % 1000 = 0)
        begin
            set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
            set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
            raiserror ( @watch1, 10,1) with nowait
            raiserror (@watch2, 10,1) with nowait
        end
        -- set the last  handled to the one we just handled
        SET @LastID = @IDToHandle
        SET @IDToHandle = NULL

        -- select the next  to handle    
        SELECT TOP 1 @IDToHandle = id_col
        FROM MAIN_TABLE
        WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
            and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
        ORDER BY id_col
    END

Any ideas or guidance on improving this procedure's run time are welcome.

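【Comments】:

  • Hmm... this looks like a very procedural approach to a problem that should be set-based. You need to stop thinking row by row and instead take advantage of SQL's ability to process sets of data very efficiently.
  • Do you have any triggers on the tables involved? If you do, those triggers could be making everything take much longer.
  • It looks like the only thing forcing you to do this row-by-agonizing-row is the DeleteByID stored procedure... Can you include the definition of that stored procedure, so it can be incorporated into a set-based solution?

Tags: sql sql-server database sql-server-2008 stored-procedures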


【Solution 1】:

Yes, try this:

Declare @tempDate datetime
Set @tempDate = dateadd(dd, -20, getdate())  -- same cutoff as in the question

Declare @Ids Table (id int Primary Key Not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
    And someDateCol < @tempDate -- If there are times in these datetime fields,
                                -- then you may need to modify this condition.
    And some_other_int_col In (1745, 1548, 4785)
    And Not Exists (Select * From SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                    Where some_int_col = m.id_col)
    And Not Exists (Select * From A_70k_rows_table
                    Where some_int_col = m.id_col)

Select id From @Ids  -- confirms that the code above generates the correct list of Ids
Return -- stops here (no inserts/deletes) until you have verified that @Ids is correct
-- Once you have verified that @Ids is populated correctly,
-- delete or comment out the Select and Return lines above so the insert and deletes run.

Begin Transaction

Delete ot     -- replaces the row-by-row call to the DeleteByID stored proc
From OtherTable ot   -- placeholder name; repeat for each related table that DeleteByID cleans up
    Join MAIN_TABLE m On m.id_col = ot.FKCol
    Join @Ids i On i.id = m.id_col

Insert SECONDERY_TABLE(col1, col2, etc.)
Select col1,col2,col3.....
From MAIN_TABLE m Join @Ids i On i.id = m.id_col

Delete m   -- removes the rows from MAIN_TABLE after they have been copied
From MAIN_TABLE m
    Join @Ids i On i.id = m.id_col

Commit Transaction

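Explanation.

  1. You have a number of filter conditions that are not SARGable, i.e. they force a full table scan on every iteration of the loop instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column's value before comparing it with another value; doing so removes the query optimizer's opportunity to use an index.

  2. You are executing the inserts one at a time... It is far better to generate the list of PK Ids that need to be processed (all at once), and then do all the inserts at once, in a single statement.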
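
To illustrate point 1, here is a minimal sketch of how the date filter from the question could be written without wrapping the column in a function. It assumes someDateCol and otherDateCol are datetime columns; @cutoff is a hypothetical helper variable, not something from the original procedure. For datetime values, the first predicate should select the same rows as DATEDIFF(dd, someDateCol, @tempDate) > 0, but it leaves someDateCol bare so an index on that column can be considered.

```sql
-- Sketch only: table and column names come from the question;
-- @cutoff is a hypothetical helper variable introduced for illustration.
DECLARE @tempDate datetime;
SET @tempDate = DATEADD(dd, -20, GETDATE());

-- Midnight at the start of @tempDate's day (works on SQL Server 2008).
DECLARE @cutoff datetime;
SET @cutoff = DATEADD(dd, DATEDIFF(dd, 0, @tempDate), 0);

SELECT m.id_col
FROM MAIN_TABLE AS m
WHERE m.someDateCol < @cutoff   -- bare column: replaces DATEDIFF(dd, someDateCol, @tempDate) > 0
  AND m.some_other_int_col IN (1745, 1548, 4785)
  -- Column-to-column comparison: this cannot use an index seek in either form,
  -- but it keeps the day-boundary meaning of DATEDIFF(dd, someDateCol, otherDateCol) < 1.
  AND m.otherDateCol < DATEADD(dd, DATEDIFF(dd, 0, m.someDateCol) + 1, 0);
```

Whether the optimizer actually picks an index still depends on which indexes exist and how selective the filters are, so it is worth comparing the execution plans of both forms.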

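【Comments】:

  • Hmm, looks good. I'm running checks and will post back if everything works. In any case, thank you very much.
  • OK, I first tried running this against a smaller amount of data (in a loop that ran the procedure for 100,000 rows at a time), but it made my database unusable. After that I tried 1,000 rows at a time, and within a few seconds SECONDERY_TABLE and the other related tables still became inaccessible. Only restarting the service cleared it...
  • Can I implement something from here: stackoverflow.com/questions/15480699/…
  • martin, if the procedure fails after Begin Transaction, it leaves locks on the tables, blocking access, which explains your symptoms. Until you figure out why the code is failing, comment out the Begin Transaction or, if it happens again, type Rollback Transaction in the query window to regain access to the locked tables.
  • If you write the code to work in chunks of 1,000 records (or whatever size), make sure each iteration includes a Begin Transaction at the start of the iteration block and a Commit Transaction at the end, or a Rollback Transaction if the iteration fails.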
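
A minimal sketch of the chunked approach described in the last two comments, under some assumptions: #Ids is a hypothetical temp table that already holds the verified list of ids (a temp table is used here instead of the @Ids table variable because changes to a table variable are not undone by a rollback), #Batch and @BatchSize are likewise hypothetical, and OtherTable, FKCol and the col1, col2 column list are the answer's placeholders rather than real names.

```sql
-- Sketch only: #Ids, #Batch and @BatchSize are hypothetical;
-- OtherTable, FKCol and the column list are placeholders from the answer.
DECLARE @BatchSize int;
DECLARE @msg nvarchar(2048);
SET @BatchSize = 1000;

CREATE TABLE #Batch (id int NOT NULL PRIMARY KEY);

WHILE 1 = 1
BEGIN
    TRUNCATE TABLE #Batch;

    -- Pick the next chunk of ids that are still waiting to be processed.
    INSERT #Batch (id)
    SELECT TOP (@BatchSize) id FROM #Ids ORDER BY id;

    IF @@ROWCOUNT = 0 BREAK;   -- nothing left to process

    BEGIN TRY
        BEGIN TRANSACTION;

        DELETE ot
        FROM OtherTable AS ot
        JOIN #Batch AS b ON b.id = ot.FKCol;

        INSERT SECONDERY_TABLE (col1, col2 /* , ... */)
        SELECT m.col1, m.col2 /* , ... */
        FROM MAIN_TABLE AS m
        JOIN #Batch AS b ON b.id = m.id_col;

        DELETE m
        FROM MAIN_TABLE AS m
        JOIN #Batch AS b ON b.id = m.id_col;

        -- Mark the chunk as done; this is rolled back too if the chunk fails.
        DELETE i
        FROM #Ids AS i
        JOIN #Batch AS b ON b.id = i.id;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Release the locks instead of leaving an open transaction behind.
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        SET @msg = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
        BREAK;
    END CATCH
END
```

Because each chunk commits on its own, locks are held only for the duration of one chunk, and a failure stops the loop with the failed chunk rolled back rather than leaving the tables blocked.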