Selecting table records in a pseudo random order: answers

【Question title】: Selecting table records in a pseudo random order
【Posted】: 2016-03-21 23:00:40
【Question description】:

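In my case I am using an embedded H2 database, but my question is really a generic SQL one.

Consider this table, in which a record may or may not reference another record, and the same record is never referenced from more than one place.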

CREATE TABLE test (id NUMBER, data VARCHAR, reference NUMBER) ;
INSERT INTO test (id, data) 
SELECT x, 'P'||x FROM system_range(0, 9);
UPDATE test SET reference = 2 where id = 4;
UPDATE test SET reference = 4 where id = 6;
UPDATE test SET reference = 1 where id = 7;
UPDATE test SET reference = 8 where id = 9;

SELECT * FROM test ORDER BY id;

ID  DATA    REFERENCE
----------------------------------
0   P0      null 
1   P1      null 
2   P2      null 
3   P3      null 
4   P4      2
5   P5      null 
6   P6      4 
7   P7      1 
8   P8      null 
9   P9      8

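Now I would like an SQL query that selects the test records in random order, with the only restriction that a record is never selected before the record it references.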

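SELECT * FROM test ORDER BY reference, RAND() would work, but it does not seem random enough to me, because it always selects all the records that have no reference first, which reduces the randomness.

Say, a good and valid result set would be the one below.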

ID  DATA    REFERENCE
----------------------------------
8   P8      null 
2   P2      null 
1   P1      null 
4   P4      2
3   P3      null 
9   P9      8 
5   P5      null 
6   P6      4 
0   P0      null
7   P7      1
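
The restriction can be checked mechanically. Below is a minimal Python sketch, assuming the restriction is read the way the sample result shows it: every referenced record appears before the record that references it. The function name is_valid_order and the dictionary layout (id mapped to referenced id) are illustrative choices, not part of the question.

```python
def is_valid_order(order, reference):
    """Return True if every referenced id appears before the id that references it."""
    position = {record_id: index for index, record_id in enumerate(order)}
    return all(
        position[target] < position[record_id]
        for record_id, target in reference.items()
    )


# id -> referenced id, taken from the UPDATE statements in the question
REFERENCE = {4: 2, 6: 4, 7: 1, 9: 8}

# The ids of the sample result set, top to bottom
sample_order = [8, 2, 1, 4, 3, 9, 5, 6, 0, 7]

print(is_valid_order(sample_order, REFERENCE))       # True
print(is_valid_order([6, 4, 2], {4: 2, 6: 4}))       # False: 6 comes before 4
```

A check like this is handy for testing any candidate query: run the query several times and feed each returned id sequence through it.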

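I would prefer a pure SQL solution, but since H2 is very easy to extend, I would not rule out creating a custom function by exposing a Java method of my own.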

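Update: this is not a duplicate of How to request a random row in SQL, because:

In addition to the randomness requirement, I also have the reference restriction. In fact, the complexity of my problem comes from this reference restriction, not from the randomness.
I need to select all the table records, not just one.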

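【Comments】:

Why not read the records into the client and randomize them there?
Probably, if I cannot solve it using SQL alone, that will be my approach. But I do not want to give up on a pure SQL solution just yet.
Possible duplicate of How to request a random row in SQL?
This is not a duplicate, because in addition to that I have the reference restriction.
Your data structure appears to be a forest of n-ary trees. With the reference restriction, you are saying that once a node is selected, you cannot select its parent. That is probably not something you can encode directly in SQL, because it requires keeping track of all previously selected nodes (rows). You would have to load the rows into a set of trees in memory and navigate the structure yourself.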
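
For illustration, the client-side approach suggested in these comments might look roughly like the Python sketch below. It assumes the rows have already been loaded into a list of ids and a dictionary mapping each id to the id it references, and that the references form a forest (no cycles). The name random_valid_order is hypothetical. Each step picks uniformly among the records that are currently allowed, which is not necessarily uniform over all valid orderings.

```python
import random


def random_valid_order(ids, reference, rng=random):
    """Shuffle ids so that a referenced id always precedes the id referencing it."""
    # Invert the mapping: referenced id -> ids that reference it.
    referrers = {}
    for record_id, target in reference.items():
        referrers.setdefault(target, []).append(record_id)

    # Records that reference nothing may be emitted at any time.
    available = [record_id for record_id in ids if record_id not in reference]
    order = []
    while available:
        # Pick one of the currently available records at random.
        index = rng.randrange(len(available))
        available[index], available[-1] = available[-1], available[index]
        chosen = available.pop()
        order.append(chosen)
        # Records referencing the chosen one become available only now.
        available.extend(referrers.get(chosen, []))
    return order


ids = list(range(10))
reference = {4: 2, 6: 4, 7: 1, 9: 8}   # from the question's UPDATE statements
print(random_valid_order(ids, reference))
```

Unlike ORDER BY reference, RAND(), a referencing record such as 4 can appear here as soon as 2 has been emitted, well before the remaining records without a reference.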

Tags: java sql h2

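【Solution 1】:

You should never say never until you have really dug deep. While I was adding my comment for Jim, I actually asked myself whether H2 offers an equivalent of Oracle's hierarchical queries. Sure enough, this is explained in the H2 documentation, in the Advanced section under H2 recursive queries.

So here is a working query that almost meets my requirements: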

WITH link(id, data, reference, sort_val, level, tree_id) AS (
    -- Each tree root starts with a random sorting value of up to half the number of records.
    -- Half the number of records is not strictly needed; it could be a hard-coded value.
    -- I chose half to get a relatively uniform distribution of the tree ids.
    -- The id of the starting row is taken as the tree id.
    SELECT id, data, reference, round(rand()*(select count(*) FROM test)/2) AS sort_val, 0, id FROM test WHERE reference IS NULL

    UNION ALL

    -- Increase the sort value by level for each referencing row
    SELECT test.id, test.data, test.reference, link.sort_val + (level + 1) AS sort_val, level + 1, link.tree_id
       FROM link
       JOIN test ON link.id = test.reference
)
-- sort value, level and tree id are printed here just to make it easier to understand how it works
SELECT id, data, reference, sort_val, level, tree_id
  FROM link
 ORDER BY sort_val;
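
To see why this ordering respects the restriction, the sort value computation can be mimicked outside the database. The Python sketch below is only a simulation of the query's logic, not something H2 runs: simulate_sort_values is a hypothetical name, Python's round() may treat exact halves differently from H2's, and rows with equal sort values keep insertion order here, whereas SQL leaves their order unspecified.

```python
import random


def simulate_sort_values(ids, reference, rng=random):
    """Mimic the query: random start value per root, parent value plus level for referrers."""
    referrers = {}
    for record_id, target in reference.items():
        referrers.setdefault(target, []).append(record_id)

    rows = []  # tuples of (id, sort_val, level, tree_id)
    # Anchor part: rows with reference IS NULL get round(rand() * count / 2).
    frontier = [
        (record_id, round(rng.random() * len(ids) / 2), 0, record_id)
        for record_id in ids
        if record_id not in reference
    ]
    while frontier:
        rows.extend(frontier)
        # Recursive part: sort_val = parent sort_val + (parent level + 1).
        frontier = [
            (child, sort_val + level + 1, level + 1, tree_id)
            for parent, sort_val, level, tree_id in frontier
            for child in referrers.get(parent, [])
        ]
    return sorted(rows, key=lambda row: row[1])  # ORDER BY sort_val


ids = list(range(10))
reference = {4: 2, 6: 4, 7: 1, 9: 8}   # from the question's UPDATE statements
for row in simulate_sort_values(ids, reference):
    print(row)
```

Because every referencing row gets a sort value strictly greater than that of the row it references, sorting by sort_val can never place it first, regardless of how ties between unrelated rows are broken.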

【Comments】: