SQL/Presto: rank with multiple conditions (one condition is checking if rows are in the same tag) - Answers

【Question title】: SQL/Presto: rank with multiple conditions (one condition is checking if rows are in the same tag)
【Posted】: 2021-07-27 04:30:16
【Question】:

I have the following data (dt):

  group_id    customer_id  tag     score   phase
  1           a             l1     0       2020
  1           b             l2     0       2021
  2           a             l4     1       2019
  2           e             l3     1       2019
  2           d             l3     1       2018
  3           q                    1       2020
  3           w                    1       2019
  3           z             l5     1       2019
  3           x             l5     1       2019
  3           c             l6     1       2019

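I want to:

rank within a group, first by score (a lower score is better);
if 2 customers are in the same group and have the same score and the same (non-null) tag, concatenate their customer_id values;
then rank the resulting rows by phase (an older phase is preferred) to produce the final list.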

So, the expected output is:

  group_id    customer_id   tag     score   phase         rank
  1           a             l1      0       2020          1
  1           b             l2      0       2021          2
  2           a             l4      1       2019          2
  2           e,d           l3      1       2019, 2018    1
  3           q                     1       2020          2
  3           w                     1       2019          1
  3           z,x           l5      1       2019, 2019    1
  3           c             l6      1       2019          1

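I have written the following query, but I am not sure how to incorporate the condition that checks whether 2 customers share a tag, and then add the phase comparison on top of it.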

  SELECT group_id, customer_id, tag, score, phase, 
  RANK() OVER (PARTITION BY group_id ORDER BY score) AS temp_rank
  FROM dt

【Comments】:

e and d are concatenated because they are in the same group and have the same tag. I fixed the mistake.

Tags: sql database presto

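【Solution 1】:

I used ROW_NUMBER() instead of RANK(). The following query returns the output you want. I tested this query with SQL. In an inner subquery, I group by tag and use the STRING_AGG() function to put the matching values into a single column. I then join the result to the main table.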

SELECT t.group_id, t.customer_id, t.tag, t.score, t.phase,
       ROW_NUMBER() OVER (PARTITION BY t.group_id ORDER BY t.score) AS rank
FROM
  (SELECT DISTINCT t1.*, t2.group_id, t2.score
   FROM
     -- t1: one row per tag, with customer_id and phase concatenated
     (SELECT tag,
             STRING_AGG(customer_id, ',') AS customer_id,
             STRING_AGG(phase, ',') AS phase
      FROM dt
      GROUP BY tag) t1
   -- join back to dt to recover group_id and score for each tag
   JOIN dt t2 ON t1.tag = t2.tag) t
ORDER BY t.group_id, t.customer_id

Query result: https://dbfiddle.uk/sql

Result for postgresql: https://dbfiddle.uk/postgresql

【Discussion】:

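【Solution 2】:

For the rows with non-NULL tags, you must use GROUP BY group_id, tag, score and array_join(array_agg()) to concatenate the customer_ids and phases.
Then use UNION ALL to add the rows with NULL tags.
Then use the DENSE_RANK() window function instead of RANK():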

SELECT group_id, customer_id, tag, score, phase,
       DENSE_RANK() OVER (PARTITION BY group_id ORDER BY score, min_phase) temp_rank
FROM (
  SELECT group_id, 
         array_join(array_agg(customer_id), ',') customer_id, 
         tag, 
         score, 
         array_join(array_agg(phase), ',') phase,
         MIN(phase) min_phase
  FROM dt
  WHERE tag IS NOT NULL
  GROUP BY group_id, tag, score
  UNION ALL
  SELECT group_id, customer_id, tag, score, phase, phase
  FROM dt
  WHERE tag IS NULL
) t

See the demo (for PostgreSQL).
Results:

group_id	customer_id	tag	score	phase	temp_rank
1	a	l1	0	2020	1
1	b	l2	0	2021	2
2	e,d	l3	1	2019,2018	1
2	a	l4	1	2019	2
3	w	null	1	2019	1
3	c	l6	1	2019	1
3	z,x	l5	1	2019,2019	1
3	q	null	1	2020	2
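
As a cross-check on the query above, the same merge-then-rank logic can be sketched in plain Python. This is only an illustration, not part of the original answer: the function name `merge_and_rank` and the tuple layout are made up for this sketch, and `None` stands in for a NULL tag. It merges rows that share (group_id, tag, score) when the tag is not NULL, then assigns a dense rank within each group_id by (score, oldest phase).

```python
from itertools import groupby

# Sample rows from the question: (group_id, customer_id, tag, score, phase).
# None stands for a NULL tag.
ROWS = [
    (1, "a", "l1", 0, 2020),
    (1, "b", "l2", 0, 2021),
    (2, "a", "l4", 1, 2019),
    (2, "e", "l3", 1, 2019),
    (2, "d", "l3", 1, 2018),
    (3, "q", None, 1, 2020),
    (3, "w", None, 1, 2019),
    (3, "z", "l5", 1, 2019),
    (3, "x", "l5", 1, 2019),
    (3, "c", "l6", 1, 2019),
]


def merge_and_rank(rows):
    """Merge rows sharing (group_id, tag, score) when tag is not None,
    then dense-rank within each group_id by (score, oldest phase)."""
    merged = {}  # key -> [group_id, customers, tag, score, phases]
    for i, (gid, cust, tag, score, phase) in enumerate(rows):
        # NULL-tag rows get a unique key so they are never merged.
        key = (gid, tag, score) if tag is not None else ("null", i)
        entry = merged.setdefault(key, [gid, [], tag, score, []])
        entry[1].append(cust)
        entry[4].append(phase)

    result = []
    by_group = sorted(merged.values(), key=lambda e: e[0])
    for gid, entries in groupby(by_group, key=lambda e: e[0]):
        entries = list(entries)
        # Distinct sort keys in ascending order give the dense rank.
        keys = sorted({(e[3], min(e[4])) for e in entries})
        for _, custs, tag, score, phases in entries:
            rank = keys.index((score, min(phases))) + 1
            result.append((gid, ",".join(custs), tag, score,
                           ",".join(map(str, phases)), rank))
    return result


for row in merge_and_rank(ROWS):
    print(row)
```

The ranks it produces for the sample data agree with the temp_rank column in the table above; the row order within a group may differ, since the query has no final ORDER BY.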

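【Discussion】:

What does this query do with the rows that have no tag?
@lll GROUP BY group_id, tag, score creates the distinct combinations of the 3 columns. A combination may have null as its tag.
I only want to concatenate when the tag is not null. When I tried the query just now, the rows with a null tag were concatenated as well.
@lll Post sample data with null tags to clarify what you want.
Added a NULL tag case.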
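
The point raised in this thread, that GROUP BY puts all NULL tags of a group into one combination, can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for Presto, and SQLite's group_concat() as a stand-in for array_join(array_agg()); both substitutions are assumptions made so the example runs locally, and the helper name `grouped` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dt (group_id INT, customer_id TEXT, tag TEXT, score INT, phase INT)"
)
conn.executemany(
    "INSERT INTO dt VALUES (?, ?, ?, ?, ?)",
    [
        (3, "q", None, 1, 2020),  # NULL tag
        (3, "w", None, 1, 2019),  # NULL tag
        (3, "z", "l5", 1, 2019),
        (3, "x", "l5", 1, 2019),
    ],
)


def grouped(where_clause=""):
    """Return {tag: sorted customer ids} for GROUP BY group_id, tag, score."""
    sql = (
        "SELECT tag, group_concat(customer_id, ',') "
        f"FROM dt {where_clause} GROUP BY group_id, tag, score"
    )
    # Sort the ids because group_concat does not guarantee an order.
    return {tag: sorted(custs.split(",")) for tag, custs in conn.execute(sql)}


no_filter = grouped()                           # NULL tags are merged together
with_filter = grouped("WHERE tag IS NOT NULL")  # NULL-tag rows are excluded

print(no_filter)
print(with_filter)
```

Without the filter, q and w end up in the same group even though they have no tag. This is why the query in Solution 2 aggregates only the rows WHERE tag IS NOT NULL and brings the NULL-tag rows back unmerged through UNION ALL.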