In Postgres, how can I get the sub-grouping with the max value for a group? (Answers)

【Question title】: In Postgres, how can I get the sub-grouping with the max value for a group?
【Posted】: 2015-03-15 04:53:57
【Question】:

In Postgres, I have a history table for a subway system with the following structure:

CREATE TABLE stop_history
(
    stop_id character varying,
    route_id character varying,
    next_stop_id character varying
);

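I want to figure out: for each stop and route, what is the most common next stop?

What I need to do is group by stop, route, and next stop, and get the count for each of those groups. Then, for each stop_id and route_id combination, I need the group with the highest count.

How would I write such a Postgres query, and which indexes should I put on this table to maximize performance?

One challenge I ran into is that count(*) and max(count(*)) cannot be used in the where clause.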
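To make that limitation concrete, here is a minimal sketch against the stop_history table above. Postgres rejects aggregates in where and does not allow nesting one aggregate inside another, so the counts have to be filtered in having or computed in a subquery first. The threshold of 1 and the names counts and max_frequency are illustrative assumptions, not part of the original question.

```sql
-- Rejected by Postgres: aggregate functions are not allowed in WHERE.
-- SELECT stop_id, route_id, next_stop_id
-- FROM stop_history
-- WHERE count(*) > 1;

-- Accepted: HAVING filters groups after the aggregate has been computed.
-- (The threshold 1 is only an example value.)
SELECT stop_id, route_id, next_stop_id, count(*) AS frequency
FROM stop_history
GROUP BY stop_id, route_id, next_stop_id
HAVING count(*) > 1;

-- max(count(*)) is also rejected, because aggregate calls cannot be nested.
-- Computing the counts in a subquery first makes the outer max() legal.
SELECT stop_id, route_id, max(frequency) AS max_frequency
FROM (
    SELECT stop_id, route_id, next_stop_id, count(*) AS frequency
    FROM stop_history
    GROUP BY stop_id, route_id, next_stop_id
) AS counts
GROUP BY stop_id, route_id;
```

The last query yields the highest count per stop and route but not which next_stop_id it belongs to, which is the part that still needs solving.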

Here is some sample data:

INSERT INTO stop_history VALUES ('101N', '1', NULL);
INSERT INTO stop_history VALUES ('102N', '1', '101N');
INSERT INTO stop_history VALUES ('103N', '1', '102N');
INSERT INTO stop_history VALUES ('104N', '1', '103N');
INSERT INTO stop_history VALUES ('104N', '1', '103N');
INSERT INTO stop_history VALUES ('104N', '1', '102N');
INSERT INTO stop_history VALUES ('104N', '1', '103N');
INSERT INTO stop_history VALUES ('104N', '1', '102N');
INSERT INTO stop_history VALUES ('101N', 'D', NULL);
INSERT INTO stop_history VALUES ('102N', 'D', '101N');
INSERT INTO stop_history VALUES ('102N', 'D', '101N');
INSERT INTO stop_history VALUES ('102N', 'D', NULL);

The expected output is:

Stop | Route | Most common next stop | Frequency
101N | 1 | NULL | 1
102N | 1 | 101N | 1
103N | 1 | 102N | 1
104N | 1 | 103N | 3
101N | D | NULL | 1
102N | D | 101N | 2

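【Comments on the question】:

Use a HAVING clause for conditions on aggregate functions!
Could you write a query for that :)?
What does your current query look like?
Please add some sample data (as formatted text) and the expected output based on that sample data. Ideally, provide everything as insert statements, including the necessary create table statement.
@a_horse_with_no_name I added the create statement, sample inserts, and the expected output.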

Tags: sql postgresql aggregate-functions

【Solution 1】:

Something like this:

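-- Count each (route, stop, next stop) combination with a window function,
-- then keep only the most frequent row per stop and route.
select distinct on (stop_id, route_id) stop_id,
       route_id,
       coalesce(next_stop_id, 'NULL') as most_common_next_stop,
       count(*) over (partition by route_id, stop_id, coalesce(next_stop_id, 'NULL')) as frequency
from stop_history
order by route_id, stop_id, frequency desc;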

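The window function (count(*) over (...)) computes how often each next_stop_id value occurs for a given route and stop.

The Postgres-specific distinct on () then reduces the result to the single row with the highest frequency for each stop and route. This works because the final order by ... frequency desc puts the most frequent row first, and distinct on keeps only the first row of each group.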
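As a possible variation on the same idea, the counts can be produced with a plain group by instead of a window function, so that distinct on only has to pick among one row per (route, stop, next stop) group. This is a sketch under assumptions rather than part of the original answer: the extra next_stop_id sort key is added here only to make ties deterministic, and next_stop_id is left as a real NULL instead of the string 'NULL'.

```sql
-- Sketch: aggregate first, then keep the top row per (route_id, stop_id).
SELECT DISTINCT ON (route_id, stop_id)
       stop_id,
       route_id,
       next_stop_id,            -- stays NULL where no next stop was recorded
       count(*) AS frequency
FROM stop_history
GROUP BY route_id, stop_id, next_stop_id
-- Highest count first within each stop and route; next_stop_id is an
-- assumed tie-breaker so that equal counts resolve the same way every run.
ORDER BY route_id, stop_id, frequency DESC, next_stop_id;
```

Because group by treats NULL values as one group, rows without a next stop are still counted together without needing coalesce.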

SQLFiddle: http://sqlfiddle.com/#!15/66ff6/1

【Discussion】: