Aggregated product generation at runtime for SQL Server 2008 R2: answers

【Question title】: Aggregated product generation on runtime for SQL Server 2008 R2
【Posted】: 2015-08-06 20:50:25
【Question description】:

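I have a large amount of data, and I need to compute a product aggregate (a running product) over the values. Let me explain with an example.

Here is some sample data: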

/*SampleTable*/
|ID|Date  |Value  |
| 1|201401|25     |
| 1|201402|-30    |
| 1|201403|-15    |
| 1|201404|50     |
| 1|201405|70     |

| 2|201010|1.15   |
| 2|201011|1.79   |
| 2|201012|0.82   |
| 2|201101|1.8    |
| 2|201102|1.67   |

From it I have to produce this table:

/*ResultTable*/
|ID|Date  |Aggregated Value  |
| 1|201312|100               |
| 1|201401|125               |
| 1|201402|87.5              |
| 1|201403|74.375            |
| 1|201404|111.563           |
| 1|201405|189.657           |

| 2|201009|100               |
| 2|201010|101.15            |
| 2|201011|102.960           |
| 2|201012|103.804           |
| 2|201101|105.673           |
| 2|201102|107.438           |
-- Note: The 100 values are separately inserted for each ID at the month before first date
-- of previous table

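Here, for each ID, I have a Value (the third column) given for a corresponding Date (in YYYYMM format). I have to apply the following formula to calculate the Aggregated Value column, grouped by ID:

current_Aggregated_Value = previous_aggregated_value * ((current_value / 100) + 1)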

There is no simple solution for this. To calculate the aggregated value of the current row, I have to take the aggregated value of the previous row, which is itself generated by the same query (except for the 100, which is added manually). Since SQL cannot read that generated value at runtime, I had to implement the product aggregate function described here.

So the 2nd aggregated value (125) is derived as 100 * ((25 / 100) + 1) = 125.
The 3rd aggregated value (87.5) is derived as 125 * ((-30 / 100) + 1) = 87.5.
But as the generated '125' cannot be read at runtime, I had to take the product aggregate of all the previous values: 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) = 87.5.
Similarly, the 4th value (74.375) comes from 100 * ((25 / 100) + 1) * ((-30 / 100) + 1) * ((-15 / 100) + 1) = 74.375.
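
The recurrence and the expanded products above are the same quantity written two ways. As a short derivation (the symbols A_n for the n-th aggregated value and v_k for the k-th Value are notation introduced here, not taken from the question):

```latex
A_0 = 100, \qquad
A_n = A_{n-1}\left(1 + \frac{v_n}{100}\right)
\;\Longrightarrow\;
A_n = 100 \prod_{k=1}^{n}\left(1 + \frac{v_k}{100}\right)

% Example for ID 1, n = 3:
A_3 = 100 \cdot 1.25 \cdot 0.70 \cdot 0.85 = 74.375
```

This is why a running product over all earlier rows can stand in for "the previous aggregated value" when the query cannot refer to its own output.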

A sample query is given below:

INSERT INTO ResultTable (ID, [Date], [Aggregated Value])
SELECT temps.ID, temps.[Date],
    CASE
       WHEN temps.min_val = 0 THEN 0
       WHEN temps.is_negative % 2 = 1 THEN -1 * EXP(temps.abs_multiplier) * 100
       ELSE EXP(temps.abs_multiplier) * 100
    END AS value

FROM
(
   SELECT st1.ID, st1.[Date],
       -- Multiplication by taking all +ve values
       SUM(LOG(ABS(NULLIF(((st2.Value / 100) + 1), 0)))) AS abs_multiplier,
       -- Count of -ve values, final result is -ve if count is odd
       SUM(SIGN(CASE WHEN ((st2.Value / 100) + 1) < 0 THEN 1 ELSE 0 END)) AS is_negative,
       -- If any value in the multipliers is 0 the whole multiplication result will be 0
       MIN(ABS((st2.Value / 100) + 1)) AS min_val
   FROM SampleTable AS st1
   INNER JOIN SampleTable AS st2 ON (st2.ID = st1.ID AND st2.[Date] <= st1.[Date])
   GROUP BY st1.id, st1.[Date]
) AS temps;

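Basically, for each row it takes the product aggregate of the values for all dates up to and including that row in order to calculate the desired value. Well, it is as messy as it sounds and looks, and it is "h-word" slow! But I could not find a better solution for this kind of problem in SQL Server 2008 R2 (unless you can give me one).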
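
For readers unfamiliar with the trick used in the query, it relies on the following identity. This is a sketch in notation introduced here: x_k stands for a factor (Value / 100) + 1 and m for the number of negative factors.

```latex
\prod_{k=1}^{n} x_k =
\begin{cases}
0, & \text{if any } x_k = 0,\\[4pt]
(-1)^{m} \exp\!\left(\sum_{k=1}^{n} \ln\lvert x_k\rvert\right), & \text{otherwise.}
\end{cases}
```

In the query, `abs_multiplier` is the sum of logarithms, `is_negative` plays the role of m, and `min_val = 0` detects the zero case. The `NULLIF` turns a zero factor into NULL so that `LOG` is never applied to 0, and `SUM` ignores NULL inputs.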

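So, I would like to know two things:
1. Can this be done without joining the table to itself?
2. Is there a better way to do a product aggregate on SQL Server 2008 R2? (I know there is a way in Server 2012, but that is not an option for me.)

Sorry for the L-O-N-G question! But thanks in advance!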

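【Question comments】:

It seems that your join returns many more rows than expected (check the AND st2.[Date] <= st1.[Date] condition). There should always be one row per ID, right? Have you checked which part of the execution plan is the slowest?
Actually, to calculate one row I need the aggregated product of all the previous values, which is why I have to use the st2.[Date] <= st1.[Date] part. Let me explain:
For the 2nd value (125), the calculation is 100*((25/100)+1).
For the 3rd value (87.5), the calculation is 125*((-30/100)+1). It is not possible to take the 125 at runtime, so it has to be 100*((25/100)+1) * ((-30/100)+1).
For the 4th value (74.375) it is 100*((25/100)+1) * ((-30/100)+1) * ((-15/100)+1),
and so on... @Jan Zahradník
From the description it seems that row 3 is calculated from row 2 only, not from rows 1 and 2 together. The results also suggest that you use only the previous month's value.
In SQL Server 2012+, you can use the cumulative sum functionality. In SQL Server 2008, however, I think any approach (short of cursors) will have performance similar to what you are doing now.
There are two ways: a simple but slow recursive one, and a trick using LOG and EXP, which is less simple but fast compared with recursion.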
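
To illustrate the SQL Server 2012+ approach mentioned in the comments (not usable on 2008 R2, which the asker is limited to), a minimal sketch could combine a windowed running SUM with the LOG/EXP trick. It assumes the SampleTable layout from the question and that every factor (Value / 100) + 1 is strictly positive, which holds for the sample data. The output column name AggregatedValue is made up for this example.

```sql
-- Sketch for SQL Server 2012 or later only: SUM(...) OVER (... ORDER BY ...)
-- is not available in SQL Server 2008 R2.
-- Assumption: every factor (1 + Value / 100) is > 0, so LOG is defined.
SELECT  ID,
        [Date],
        100 * EXP(SUM(LOG(1 + Value / 100.0))
                  OVER (PARTITION BY ID
                        ORDER BY [Date]
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)) AS AggregatedValue
FROM    SampleTable
ORDER BY ID, [Date];
```

This returns one row per source row and does not generate the seed row of 100 for the month before the first date. Zero or negative factors would need the sign and zero handling shown in the question's query.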

Tags: sql sql-server-2008 aggregate-functions query-performance

【Solution 1】:

I run several reports that use recursion heavily, and the results are usually very acceptable and not slow at all. Try this solution:

-- http://stackoverflow.com/questions/30437219/aggregated-product-generation-on-runtime-for-sql-server-2008-r2

-- Create temp table to hold sample data
Create table #sampleTable
(   ID int
,   YrMnth date not null
,   CurrentValue numeric(13,3)
);

-- Insert sample data into the temp table
-- Date values have an added '01' at the end to make them compatible with the "date" datatype
insert into #sampleTable
values  (1,'20131201',100)
    ,   (1,'20140101',25)
    ,   (1,'20140201',-30)
    ,   (1,'20140301',-15)
    ,   (1,'20140401',50)
    ,   (1,'20140501',70)
    ,   (2,'20100901',100)
    ,   (2,'20101001',1.15)
    ,   (2,'20101101',1.79)
    ,   (2,'20101201',0.82)
    ,   (2,'20110101',1.8)
    ,   (2,'20110201',1.67);

-- Declare recursive CTE which loads the first values for each ID as the anchor
With CTE 
as 
(
Select      0 as lvl
        ,   minID.ID
        ,   minID.YrMnth
        ,   s.CurrentValue
From    #sampleTable s
        inner join (select  ID
                        ,   min(YrMnth) as YrMnth
                    from    #sampleTable
                    group by ID) as minID
            on s.ID = minID.ID
            and s.YrMnth = minID.YrMnth

union all

-- Recursive member: join back to the CTE on the same ID and the following month (+1 month)
-- Note that the cast in the calculation is required to prevent datatype errors between anchor and recursive member
select      cte.lvl + 1 as lvl
        ,   CTE.ID
        ,   S2.YrMnth
        ,   cast(CTE.CurrentValue * ((s2.CurrentValue / 100) + 1) as numeric(13,3))
        --, s2.CurrentValue
from    #sampleTable s2
        inner join CTE
            on s2.ID = CTE.ID
            and S2.YrMnth = dateadd(month,1,cte.YrMnth)

)

-- Select final result set
Select      *
from        CTE
order by    ID
        ,   YrMnth
        ,   lvl;

-- Clean up temp table
drop table #sampleTable;

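I had to add a day part to your date values so that they could be treated as the date data type. This lets the recursive member join on "month + 1". I added the "lvl" column only to check the recursion results, but I left it in because it helps show how many recursions a given record went through. How fast this runs will depend on your total data size, but I am fairly sure it will be faster than your original solution. Note that this solution assumes that the dates for a given ID are consecutive, with no missing months.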
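
If the months for an ID might have gaps, one possible variation (a sketch, not part of the original answer) is to number each ID's rows with ROW_NUMBER() and recurse on the row number instead of on "month + 1". It assumes #sampleTable still exists, so it would have to run before the drop table statement. The names Ordered, rn, and AggregatedValue are introduced here for illustration, and the wider numeric(19,6) is an arbitrary choice to reduce rounding at each step.

```sql
-- Sketch: recurse on a per-ID row number rather than on calendar months.
-- Assumes #sampleTable(ID, YrMnth, CurrentValue) as created in the answer above.
with Ordered as
(
    select  ID
        ,   YrMnth
        ,   CurrentValue
        ,   row_number() over (partition by ID order by YrMnth) as rn
    from    #sampleTable
)
, CTE as
(
    -- Anchor: the first row per ID holds the starting value (100)
    select  ID
        ,   YrMnth
        ,   rn
        ,   cast(CurrentValue as numeric(19,6)) as AggregatedValue
    from    Ordered
    where   rn = 1

    union all

    -- Recursive member: multiply the previous aggregated value by the next factor
    select  o.ID
        ,   o.YrMnth
        ,   o.rn
        ,   cast(c.AggregatedValue * ((o.CurrentValue / 100) + 1) as numeric(19,6))
    from    Ordered o
            inner join CTE c
                on  o.ID = c.ID
                and o.rn = c.rn + 1
)
select  ID
    ,   YrMnth
    ,   AggregatedValue
from    CTE
order by ID
    ,   YrMnth
option (maxrecursion 0);   -- lift the default limit of 100 recursion levels
```

The maxrecursion hint matters once an ID has more than about 100 rows, since the default limit would otherwise stop the query with an error. For large tables it may be worth materializing Ordered into an indexed temp table first, because a CTE referenced from the recursive member can be re-evaluated at every step.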

【Comments】: