Window function issue - max over partition

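【Question title】: Window function issue - max over partition
【Posted】: 2014-06-03 09:23:06
【Question description】:

I am trying to rewrite an SQL statement like the one below (which has many subqueries) into a more efficient form, using outer joins and max/count/... over partition. The old statement: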

select a.ID,
     (select max(b.valA) from something b where a.ID = b.ID_T and b.status != 0),
     (select max(b.valB) from something b where a.ID = b.ID_T and b.status != 0),
     (select max(b.valC) from something b where a.ID = b.ID_T and b.status != 0),
     (select max(b.valD) from something b where a.ID = b.ID_T)
from tool a;

The important point here is that max(b.valD) has a different condition. At first I did not notice this difference and wrote something like this:

select distinct a.ID,
      max(b.valA) over (partition by b.ID_T),
      max(b.valB) over (partition by b.ID_T),
      max(b.valC) over (partition by b.ID_T),
      max(b.valD) over (partition by b.ID_T)
from tool a, 
     (select * from something
     where status != 0) b
where a.ID = b.ID_T(+);

Can I apply the b.status != 0 condition somewhere inside max over partition? Or would it be better to add a third table to the join, like this:

select distinct a.ID,
      max(b.valA) over (partition by b.ID_T),
      max(b.valB) over (partition by b.ID_T),
      max(b.valC) over (partition by b.ID_T),
      max(c.valD) over (partition by c.ID_T)
from tool a, 
     (select * from something 
      where status != 0) b, 
     something c
where a.ID = b.ID_T(+)
     and a.ID = c.ID_T(+);

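The problem is that the query selects and joins millions of rows, and my example is only a simplified version of it. Can anyone help me write more efficient SQL?

【Question discussion】:

Tags: sql oracle window-functions

【Solution 1】:

You could try using CASE inside the aggregate: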

select a.ID,
       max(CASE WHEN b.status != 0 THEN b.valA END),
       max(CASE WHEN b.status != 0 THEN b.valB END),
       max(CASE WHEN b.status != 0 THEN b.valC END),
       max(b.valD)
  from tool a
  left join something b ON( b.ID_T = a.ID )
  group by a.ID;

Note that, for better readability, I replaced your implicit join with the "new" (ANSI) join syntax.
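To see why a CASE inside MAX can stand in for a filtered subquery, here is a minimal sketch on made-up data. It assumes Oracle (hence `from dual`); the sample rows and the aliases `max_a` and `max_d` are hypothetical, and only valA and valD are shown to keep it short. A CASE without ELSE yields NULL for rows that fail the condition, and MAX ignores NULLs.

```sql
-- Hypothetical sample rows, not taken from the question.
with tool as (
  select 1 as ID from dual union all
  select 2 from dual
),
something as (
  select 1 as ID_T, 0 as status, 100 as valA, 5 as valD from dual union all
  select 1, 1, 10, 3 from dual
)
select a.ID,
       -- rows with status = 0 become NULL and are skipped by MAX
       max(case when b.status != 0 then b.valA end) as max_a,
       -- no condition: every joined row counts
       max(b.valD) as max_d
  from tool a
  left join something b on b.ID_T = a.ID
 group by a.ID
 order by a.ID;
-- ID 1: max_a = 10, max_d = 5
-- ID 2: max_a and max_d are both NULL (no matching rows)
```

The row with valA = 100 is excluded from max_a because its status is 0, yet its valD = 5 still wins for max_d, which matches the behaviour of the original correlated subqueries.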

【Discussion】:

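Thanks! CASE inside the MAX function solved the problem :) A left join and an implicit join with (+) are equivalent in Oracle PL/SQL. The group by is wrong in this case, but that is not a problem.
Sorry, the GROUP BY is required, but the PARTITION is not. Otherwise you would get one row, with identical values, for every row of something that belongs to a given ID. I edited my answer.
Maybe this simplified version of my statement does not show it, but in my case it is necessary. That is why I use select distinct and more complex conditions in partition by: they differ from column to column.
Now it looks like this: select distinct a.ID, max(CASE WHEN b.status != 0 THEN b.valA END) over (partition by b.ID_T), max(CASE WHEN b.status != 0 THEN b.valB END) over (partition by b.ID_T), max(CASE WHEN b.status != 0 THEN b.valC END) over (partition by b.ID_T), max(b.valD) over (partition by b.ID_T, b.other_which_i_didnt_mentioned) from tool a, something b where a.ID = b.ID_T(+);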
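The difference between the GROUP BY form and the windowed form discussed in these comments can be illustrated with a small sketch. It again assumes Oracle and uses hypothetical sample rows; the alias `max_a` is also an assumption. A window function does not collapse rows, so each joined row is returned with the partition maximum attached, which is why the windowed variant needs DISTINCT.

```sql
-- Hypothetical sample rows: one tool, two matching rows in something.
with tool as (
  select 1 as ID from dual
),
something as (
  select 1 as ID_T, 0 as status, 100 as valA from dual union all
  select 1, 1, 10 from dual
)
select a.ID,
       max(case when b.status != 0 then b.valA end)
         over (partition by b.ID_T) as max_a
  from tool a
  left join something b on b.ID_T = a.ID;
-- Returns two rows, both (1, 10): one per joined row.
-- Adding DISTINCT after SELECT reduces this to a single row (1, 10).
```

With GROUP BY a.ID the same expression (without the OVER clause) would return one row per ID directly, without the extra DISTINCT step.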

【Solution 2】:

Another way is to JOIN against subqueries that are already grouped by ID_T:

select a.ID,
     b.MAX_A,
     b.MAX_B,
     b.MAX_C,
     b2.MAX_D 
from tool a
LEFT JOIN
 (
    SELECT ID_T,max(valA) MAX_A, max(valB) MAX_B, max(valC) MAX_C
    FROM something 
    WHERE status != 0
    GROUP BY ID_T     
  ) b
  ON a.ID=b.ID_T
LEFT JOIN
 (
    SELECT ID_T, max(valD) MAX_D
    FROM something 
    GROUP BY ID_T     
  ) b2
  ON a.ID=b2.ID_T;
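One possible variation, not part of the answer above, combines both suggestions: pre-aggregate something once with conditional aggregation, then join the result to tool. This sketch assumes Oracle and uses hypothetical sample rows; whether it is faster than two grouped subqueries depends on the data and the execution plan, so it should be checked against the real tables.

```sql
-- Hypothetical sample rows, not taken from the question.
with tool as (
  select 1 as ID from dual union all
  select 2 from dual
),
something as (
  select 1 as ID_T, 0 as status, 100 as valA, 200 as valB, 300 as valC, 5 as valD
    from dual union all
  select 1, 1, 10, 20, 30, 3 from dual
)
select a.ID, b.MAX_A, b.MAX_B, b.MAX_C, b.MAX_D
  from tool a
  left join (
         -- a single pass over something, one output row per ID_T
         select ID_T,
                max(case when status != 0 then valA end) as MAX_A,
                max(case when status != 0 then valB end) as MAX_B,
                max(case when status != 0 then valC end) as MAX_C,
                max(valD) as MAX_D
           from something
          group by ID_T
       ) b
    on a.ID = b.ID_T
 order by a.ID;
-- ID 1: MAX_A = 10, MAX_B = 20, MAX_C = 30, MAX_D = 5
-- ID 2: all four maxima are NULL
```

Because the subquery returns at most one row per ID_T, the outer join cannot multiply rows, so neither DISTINCT nor an outer GROUP BY is needed.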

【Discussion】: