Answers to: Finding the highest n values of each group in MySQL

【Question title】: Finding the highest n values of each group in MySQL
【Posted】: 2011-07-02 07:19:20
【Question】:

I have some data in the following format:

Lane         Series
1            680
1            685
1            688
2            666
2            425
2            775
...

I want to get the highest n series for each lane (for the sake of this example, say 2, but it could be more than that).

So the output should be:

Lane         Series
1            688
1            685
2            775
2            666

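Getting the highest series for each lane is easy, but I can't seem to find a way to get the highest 2 results.

I use the MAX aggregate function with GROUP BY to get the maximum, but there is no "TOP N" function like in SQL Server, and ORDER BY ... LIMIT only returns the highest N results overall, not the highest N per lane.

Since I'm using a Java application that I wrote myself to query the database and choose what N is, I could loop over the lanes and run a separate LIMIT query for each one, but I'd like to learn how to do it in MySQL itself.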

【Question comments】:

This is very simple in SQL Server using Partition/rank. Here is a similar question about how to get the same functionality in MySQL: stackoverflow.com/questions/3333665/mysql-rank-function
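
The partition/rank idea mentioned in the comment above can be sketched with a window function. This is only an illustration under assumptions: it needs a server with window functions (MySQL 8.0 or later), and it is run here through Python's built-in sqlite3 module (SQLite 3.25 or later) purely so the example is self-contained. The function name `top_n_window` and the alias `ranked` are made up for this sketch.

```python
import sqlite3

# Sample data from the question.
ROWS = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]

def top_n_window(n):
    """Return the top n series per lane using ROW_NUMBER() (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table lane_series (lane int, series int)")
    conn.executemany("insert into lane_series values (?, ?)", ROWS)
    query = """
        select lane, series
        from (
            select lane, series,
                   row_number() over (partition by lane order by series desc) as rn
            from lane_series
        ) ranked
        where rn <= ?
        order by lane, series desc
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

print(top_n_window(2))  # [(1, 688), (1, 685), (2, 775), (2, 666)]
```

Because `row_number()` numbers every row, a tie never yields more than n rows per lane; `rank()` or `dense_rank()` would be the alternatives if tied rows should all be kept.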

Tags: mysql greatest-n-per-group

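【Solution 1】:

See my other answer for a MySQL-only, but very fast, solution.

This solution lets you specify any number of top rows per lane and does not use any "funky" MySQL syntax, so it should run on most databases.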

select lane, series
from lane_series ls
group by lane, series
having (
    select count(*) 
    from lane_series
    where lane = ls.lane
    and series > ls.series) < 2 -- Here's where you specify the number of top rows
order by lane, series desc;

Test output:

create table lane_series (lane int, series int);

insert into lane_series values 
(1, 680),
(1, 685),
(1, 688),
(2, 666),
(2, 425),
(2, 775);

select lane, series
from lane_series ls
group by lane, series
having (select count(*) from lane_series where lane = ls.lane and series > ls.series) < 2
order by lane, series desc;

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+
4 rows in set (0.00 sec)
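
Since the asker's application chooses N at run time, the constant 2 in the query above can be supplied as a bound parameter instead. The sketch below assumes Python's sqlite3 module as a stand-in for the real client (the question uses Java and MySQL); `top_n_per_lane` and `make_sample_db` are hypothetical helper names.

```python
import sqlite3

def make_sample_db():
    """Build an in-memory copy of the sample table from the answer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table lane_series (lane int, series int)")
    conn.executemany(
        "insert into lane_series values (?, ?)",
        [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)],
    )
    return conn

def top_n_per_lane(conn, n):
    """Run the correlated-subquery query with n passed as a bound parameter."""
    query = """
        select lane, series
        from lane_series ls
        group by lane, series
        having (
            select count(*)
            from lane_series
            where lane = ls.lane
              and series > ls.series) < ?
        order by lane, series desc
    """
    return conn.execute(query, (n,)).fetchall()

conn = make_sample_db()
print(top_n_per_lane(conn, 2))  # [(1, 688), (1, 685), (2, 775), (2, 666)]
print(top_n_per_lane(conn, 1))  # [(1, 688), (2, 775)]
```

A row is kept when fewer than n rows in the same lane have a strictly larger series, which is why the comparison is `< n` rather than `<= n`.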

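【Comments】:

I tested your example on the sample table and it works exactly as it should. I have a problem, though: it doesn't work on my actual table. I'm pretty sure that's because it takes too long to return the data I'm looking for, since the table currently has 50,000 rows and 8 columns (some of them strings). Do you know why it works on the sample table but not on the real one?
@Adam: OK - see my other answer for a fast solution that works on very large tables.

【Solution 2】:

This solution is the fastest for MySQL and works on very large tables, but it uses "funky" MySQL features, so it won't work on other database flavors.

(Edited to sort before applying the logic)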

set @count:=-1, @lane:=0; 
select lane, series
from (select lane, series from lane_series order by lane, series desc) x
where if(lane != @lane, @count:=-1, 0) is not null
and if(lane != @lane, @lane:=lane, lane) is not null
and (@count:=@count+1) < 2; -- Specify the number of rows at the top of each group here

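To make this query more efficient, define an index on lane and series: CREATE INDEX lane_series_idx on lane_series(lane, series); the query can then be satisfied by a (super fast) index-only scan, so your other text columns won't affect it.

The advantages of this query are:

It only requires one pass over the table (albeit sorted)
It handles ties at any level: for example, if there is a tie for 2nd place, only one of the tied rows is shown, i.e. the number of rows per group is absolute and is never exceeded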
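
The single-pass logic behind the user-variable query can be hard to read in SQL, so here is the same idea as a plain Python sketch. It is an emulation for explanation only, not what MySQL executes internally; `top_n_single_pass`, `current_lane` and `count` are names chosen for this illustration, with the last two standing in for @lane and @count.

```python
def top_n_single_pass(rows, n):
    """Emulate the user-variable query: sort, then count rows within each lane."""
    # Equivalent of: order by lane, series desc
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    result = []
    current_lane = None  # plays the role of @lane
    count = -1           # plays the role of @count
    for lane, series in ordered:
        if lane != current_lane:
            # New group: reset the counter and remember the lane.
            count = -1
            current_lane = lane
        count += 1
        if count < n:
            result.append((lane, series))
    return result

rows = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]
print(top_n_single_pass(rows, 2))  # [(1, 688), (1, 685), (2, 775), (2, 666)]

# A tie for 2nd place still yields exactly two rows for the lane.
print(top_n_single_pass([(1, 688), (1, 685), (1, 685)], 2))  # [(1, 688), (1, 685)]
```

The counter only depends on position within the sorted lane, which is why ties can never push a group past n rows. Note that the MySQL 8.0 manual marks assigning user variables inside statements other than SET as deprecated, so on newer servers this technique is best treated as legacy.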

Here is the test output:

create table lane_series (lane int, series int);

insert into lane_series values (1, 680),(1, 685),(1, 688),(2, 666),(2, 425),(2, 775);

-- Execute above query:

+------+--------+
| lane | series |
+------+--------+
|    1 |    688 |
|    1 |    685 |
|    2 |    775 |
|    2 |    666 |
+------+--------+

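【Comments】:

Great, this works, except that it returns the lowest two values (just look at the output in your example). I tested it with MySQL and it runs fine. Now I just need to adapt it to SQLite so I can use an offline version on another computer at work; I believe its syntax is the same as MySQL's. Thanks for the example, it should be easy to adapt. Thanks again! If you can change it so that it returns the top two values instead of the bottom two, I'll mark your answer as accepted.
@Adam: OK, it's fixed now. I needed to sort before applying the logic, so I used an aliased subquery. Cheers.
Awesome! I'll have to study it to understand how it actually works, but it does work! Thanks again!

【Solution 3】:

This will work if you know you will never have a tie for first place: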

SELECT lane,MAX(series)
FROM scores
GROUP BY lane
UNION 
SELECT s.lane,MAX(s.series)
FROM scores AS s
JOIN (
    SELECT lane,MAX(series) AS series
    FROM scores
    GROUP BY lane
) AS x ON (x.lane = s.lane)
WHERE s.series <> x.series
GROUP BY s.lane;

【Comments】:

This won't work for me, because I sometimes have ties for first place. Thanks anyway!
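
To make the tie problem from this comment concrete, the sketch below runs the UNION query from Solution 3 on made-up data where lane 1 has two rows tied at 688. It uses Python's sqlite3 module only to stay self-contained; the data and the helper name `union_top_two` are assumptions for illustration.

```python
import sqlite3

# Hypothetical data: lane 1 has a tie for first place.
ROWS = [(1, 688), (1, 688), (1, 685)]

QUERY = """
    SELECT lane, MAX(series)
    FROM scores
    GROUP BY lane
    UNION
    SELECT s.lane, MAX(s.series)
    FROM scores AS s
    JOIN (
        SELECT lane, MAX(series) AS series
        FROM scores
        GROUP BY lane
    ) AS x ON (x.lane = s.lane)
    WHERE s.series <> x.series
    GROUP BY s.lane
"""

def union_top_two(rows):
    """Run the UNION query and return its rows, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table scores (lane int, series int)")
    conn.executemany("insert into scores values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return sorted(result, reverse=True)

print(union_top_two(ROWS))  # [(1, 688), (1, 685)]
```

The two highest rows in the table are 688 and 688, but the second branch excludes every row equal to the maximum, so the query reports 688 and 685 instead.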

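【Solution 4】:

I think @Bohemian's generic answer could also be written as a join instead of a subquery, although it probably doesn't make much difference: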

select ls1.lane, ls1.series
from lane_series ls1
left join lane_series ls2
    on ls2.lane = ls1.lane
    and ls2.series > ls1.series -- must be in ON, not WHERE, or the top row of each lane is filtered out
group by ls1.lane, ls1.series
having count(ls2.series) < 2 -- Here's where you specify the number of top rows
order by ls1.lane, ls1.series desc;
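
As a rough check of the claim that the subquery answer can be expressed as a join, the sketch below runs both forms on the sample data and compares the results. It assumes the join compares lane and series inside the ON clause (so that a lane's highest row, which has no larger partner, survives the left join), and it uses Python's sqlite3 module only for self-containment; `run_both` is a hypothetical helper.

```python
import sqlite3

ROWS = [(1, 680), (1, 685), (1, 688), (2, 666), (2, 425), (2, 775)]

SUBQUERY_FORM = """
    select lane, series
    from lane_series ls
    group by lane, series
    having (select count(*) from lane_series
            where lane = ls.lane and series > ls.series) < ?
    order by lane, series desc
"""

JOIN_FORM = """
    select ls1.lane, ls1.series
    from lane_series ls1
    left join lane_series ls2
           on ls2.lane = ls1.lane and ls2.series > ls1.series
    group by ls1.lane, ls1.series
    having count(ls2.series) < ?
    order by ls1.lane, ls1.series desc
"""

def run_both(n):
    """Return (subquery result, join result) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table lane_series (lane int, series int)")
    conn.executemany("insert into lane_series values (?, ?)", ROWS)
    by_subquery = conn.execute(SUBQUERY_FORM, (n,)).fetchall()
    by_join = conn.execute(JOIN_FORM, (n,)).fetchall()
    conn.close()
    return by_subquery, by_join

a, b = run_both(2)
print(a == b, b)  # True [(1, 688), (1, 685), (2, 775), (2, 666)]
```

The two forms agree here because the sample has no duplicate (lane, series) rows. With duplicates, the join multiplies matching rows before counting, so the counts can differ from the subquery version.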

【Comments】: