【Question title】: How to roll up based on a few criteria in SQL
【Posted】: 2017-02-14 20:34:36
【Question description】:

I have a data table like this:

QuestionID    UserName    UserWeightingForQuestion    AnswerGivenForQuestion    Metric
1             A           1.50                        1                         ToBeCalculated
1             B           1.00                        2                         ToBeCalculated
1             C           1.80                        3                         ToBeCalculated
1             D           1.20                        1                         ToBeCalculated
1             E           1.40                        2                         ToBeCalculated
2             A           1.20                        2                         ToBeCalculated
2             B           1.20                        2                         ToBeCalculated
2             C           1.10                        4                         ToBeCalculated
2             D           1.20                        5                         ToBeCalculated
...

For each question group, I want to fill every cell in the Metric column with a value calculated as defined below:

Metric_For_User_A_For_QuestionID_X = SUM(Weights_With_The_Answer_Similar_To_What_Is_Given_By_User_A_In_QuestionID_Group = X) / DISTINCT(All_Weights_In_One_QuestionID_Group = X)

Specifically,

Metric_For_User_A_For_QuestionID_1 = SUM(1.50+1.20)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_B_For_QuestionID_1 = SUM(1.00+1.40)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_C_For_QuestionID_1 = SUM(1.80)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_D_For_QuestionID_1 = SUM(1.50+1.20)/(1.50+1.00+1.80+1.20+1.40)
Metric_For_User_E_For_QuestionID_1 = SUM(1.00+1.40)/(1.50+1.00+1.80+1.20+1.40)

For QuestionID group = 2, I want to repeat the process above. For example,

Metric_For_User_A_For_QuestionID_2 = SUM(1.20+1.20)/(1.20+1.10)
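
As a sanity check on the definition, here is a minimal Python sketch (not SQL) that computes the metric for the sample rows. It assumes the denominator is the sum of the distinct weights within a question, which is what the QuestionID 2 example implies. The function name `compute_metrics` is made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Sample rows from the question: (QuestionID, UserName, weight, answer)
rows = [
    (1, "A", "1.50", 1), (1, "B", "1.00", 2), (1, "C", "1.80", 3),
    (1, "D", "1.20", 1), (1, "E", "1.40", 2),
    (2, "A", "1.20", 2), (2, "B", "1.20", 2), (2, "C", "1.10", 4),
    (2, "D", "1.20", 5),
]

def compute_metrics(rows):
    """Return {(QuestionID, UserName): metric} for every input row."""
    same_answer = defaultdict(Decimal)   # (QuestionID, answer) -> sum of weights
    distinct_weights = defaultdict(set)  # QuestionID -> set of distinct weights
    for qid, _user, weight, answer in rows:
        w = Decimal(weight)
        same_answer[(qid, answer)] += w
        distinct_weights[qid].add(w)
    # Numerator: weights of users who gave the same answer to this question.
    # Denominator (assumption): sum of the DISTINCT weights in the question.
    return {
        (qid, user): same_answer[(qid, answer)] / sum(distinct_weights[qid])
        for qid, user, _weight, answer in rows
    }

metrics = compute_metrics(rows)
print(round(metrics[(1, "A")], 2))  # (1.50+1.20) / 6.90
print(round(metrics[(2, "A")], 2))  # (1.20+1.20) / (1.20+1.10)
```

Decimal is used instead of float so that equal weights such as 1.20 compare as equal when collected into the set.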

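I am fairly new to SQL, and I believe this can be done with OVER or some kind of aggregate function (?). If this calculation is possible in SQL, could someone with SQL expertise suggest a way to compute it?

The original table has about 70 million rows, and I am using SQL Server. Thank you very much for your suggestions and answers!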

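【Question discussion】:

  • Can you explain the logic? What you are trying to do doesn't make any sense to me.
  • @SeanLange A friend asked me to solve this, so I'm not sure why he wants it done this way (I asked him the same question). I tried to solve it myself, but eventually realized that doing it efficiently is beyond my current SQL skills.

Tags: sql sql-server aggregate-functions aggregates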


【Solution 1】:

You can use the SUM window function to do this.

select t.*,
sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion)
/sum(UserWeightingForQuestion) over(partition by questionID) as metric
from tablename t
  • sum(UserWeightingForQuestion) over(partition by questionID) gets the sum of all UserWeightingForQuestion values for each questionID

  • sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion) sums the UserWeightingForQuestion values of the rows that share the same answer within each questionID
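
To illustrate what the two window sums return, here is a rough Python sketch (an analogy, not how SQL Server evaluates the query). The helper `windowed_sum` is hypothetical. Like a window function, and unlike GROUP BY, it keeps one output value per input row.

```python
from collections import defaultdict
from decimal import Decimal

# QuestionID 1 rows from the question: (QuestionID, UserName, weight, answer)
rows = [
    (1, "A", Decimal("1.50"), 1),
    (1, "B", Decimal("1.00"), 2),
    (1, "C", Decimal("1.80"), 3),
    (1, "D", Decimal("1.20"), 1),
    (1, "E", Decimal("1.40"), 2),
]

def windowed_sum(rows, key):
    """Mimic SUM(weight) OVER (PARTITION BY key): one total per input row."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[key(row)] += row[2]
    # Every row receives the total of the partition it belongs to.
    return [totals[key(row)] for row in rows]

# partition by questionID, AnswerGivenForQuestion
per_answer = windowed_sum(rows, key=lambda r: (r[0], r[3]))
# partition by questionID
per_question = windowed_sum(rows, key=lambda r: r[0])

for row, num, den in zip(rows, per_answer, per_question):
    print(row[1], num, den)
```

Users A and D both answered 1, so both rows get the same numerator, while every row of the question gets the same denominator.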

Edit: To sum only the distinct weights per questionID in the denominator, use

select t.*,
sum(UserWeightingForQuestion) over(partition by questionID,AnswerGivenForQuestion)
/(select sum(distinct UserWeightingForQuestion) from tablename where t.questionID=questionID) as metric
from tablename t
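
The difference between the two denominators only shows up when weights repeat within a question, as they do for QuestionID 2. A small Python check of the arithmetic, using the sample weights from the question (variable names are illustrative only):

```python
from decimal import Decimal

# UserWeightingForQuestion values for QuestionID 2 (users A, B, C, D)
weights_q2 = [Decimal("1.20"), Decimal("1.20"), Decimal("1.10"), Decimal("1.20")]

total_all = sum(weights_q2)            # what the plain window SUM returns
total_distinct = sum(set(weights_q2))  # what SUM(DISTINCT ...) returns

# Users A and B both answered 2, so their shared numerator is 1.20 + 1.20.
numerator_a = Decimal("1.20") + Decimal("1.20")

print(total_all, total_distinct)
print(round(numerator_a / total_all, 2), round(numerator_a / total_distinct, 2))
```

With the distinct denominator the metric can exceed 1, because repeated weights count once in the denominator but every time in the numerator.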

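【Discussion】:

  • As you say, you get "the sum of all UserWeightingForQuestion for each questionID", but the OP asked for the sum of all distinct weights for each question ID.
  • @RBarryYoung .. you are right. I missed that part. See the edit.
  • @vkp Thanks for the concise answer! This is what I needed. :)
【Solution 2】: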
declare @quest table(QuestionID int
                     , UserName varchar(20)
                     , UserWeightingForQuestion decimal(10,2)
                     , AnswerGivenForQuestion int);
insert into @quest values
(1,'A',1.50,1),(1,'B',1.00,2),(1,'C',1.80,3),(1,'D',1.20,1),
(1,'E',1.40,2),(2,'A',1.20,2),(2,'B',1.20,2),(2,'C',1.10,4),(2,'D',1.20,5);

Basically, you make two partitions: one by QuestionID and AnswerGivenForQuestion, and another by QuestionID.

WITH CALC AS
(
    SELECT Q2.QuestionID, Q2.UserName, 
           SUM(UserWeightingForQuestion) OVER (PARTITION BY QuestionID, AnswerGivenForQuestion) AS Weight,
           (SELECT SUM(DISTINCT Q1.UserWeightingForQuestion)
            FROM @quest Q1
            WHERE Q1.QuestionID = Q2.QuestionID) AS AllWeights
    FROM @quest Q2
)
SELECT QuestionID, UserName, Weight, AllWeights, 
       CAST(Weight / AllWeights AS DECIMAL(18,2)) as Metric
FROM CALC
ORDER BY QuestionID, UserName;

+------------+----------+--------+------------+--------+
| QuestionID | UserName | Weight | AllWeights | Metric |
+------------+----------+--------+------------+--------+
|      1     |     A    |  2,70  |    6,90    |  0,39  |
|      1     |     B    |  2,40  |    6,90    |  0,35  |
|      1     |     C    |  1,80  |    6,90    |  0,26  |
|      1     |     D    |  2,70  |    6,90    |  0,39  |
|      1     |     E    |  2,40  |    6,90    |  0,35  |
+------------+----------+--------+------------+--------+
|      2     |     A    |  2,40  |    2,30    |  1,04  |
|      2     |     B    |  2,40  |    2,30    |  1,04  |
|      2     |     C    |  1,10  |    2,30    |  0,48  |
|      2     |     D    |  1,20  |    2,30    |  0,52  |
+------------+----------+--------+------------+--------+

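【Discussion】:

  • Your AllWeights column should be the sum of the distinct values, not of all values.
  • @McNets Thank you very much for writing such a clean (at least to me) and easy-to-understand answer. I accepted vkp's submission above because his answer came earlier.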