【Question title】: Google BigQuery SQL: Order two columns independently
【Posted】: 2016-04-30 08:30:02
【Question】:

Suppose I have some data, for example:

grp   v1   v2
---   --   --
 2    5    7
 2    4    9
 3    10   2
 3    11   1

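I want to create new columns that do not depend on the table's row order, so that the two columns are ordered independently of each other: v1 is sorted independently of v2, with both partitioned by grp.

The result (independently ordered, partitioned by grp) would be: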

grp   v1   v2  v1_ordered v2_ordered
---   --   --  ---------- ----------
 2    5    7       4          7
 2    4    9       5          9
 3    10   2      10          1
 3    11   1      11          2

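One way is to create two tables and do a CROSS JOIN. However, I am working with so many rows that this is computationally intractable. Is there a way to do this in a single query without a JOIN?

Basically, I would like to write SQL like the following: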

SELECT
  *,
  v1 OVER (PARTITION BY grp ORDER BY v1 ASC) as v1_ordered,
  v2 OVER (PARTITION BY grp ORDER BY v2 ASC) as v2_ordered
FROM [example_table]

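This breaks the meaning of a table row, but it is a necessary feature for many applications, for example computing the ordered correlation between two fields, CORR(v1_ordered, v2_ordered).

Is this possible?

【Question comments】:

    Tags: sql sorting google-bigquery window-functions database-partitioning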


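    【Solution 1】:

    I think you are headed in the right direction! You just need to use the appropriate window function, ROW_NUMBER() in this case. It should work!

    Adding a working example, as requested by @cgn:
    I don't think there is a way to avoid a JOIN entirely.
    That said, the example below uses just ONE JOIN, versus TWO JOINs in the other answers.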

    SELECT 
      a.grp AS grp, 
      a.v1 AS v1, 
      a.v2 AS v2, 
      a.v1 AS v1_ordered, 
      b.v2 AS v2_ordered 
    FROM (
      SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
      FROM [example_table]
    ) AS a
    JOIN (
      SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
      FROM [example_table]
    ) AS b
    ON a.grp = b.grp AND a.v1_order = b.v2_order 
    

    The result is as expected:

    grp v1  v2  v1_ordered  v2_ordered   
    2    4   9           4           7   
    2    5   7           5           9   
    3   10   2          10           1   
    3   11   1          11           2   
    

    Now you can use CORR() as shown below:

    SELECT grp, CORR(v1_ordered, v2_ordered) AS [corr]
    FROM (
      SELECT 
        a.grp AS grp, 
        a.v1 AS v1, 
        a.v2 AS v2, 
        a.v1 AS v1_ordered, 
        b.v2 AS v2_ordered 
      FROM (
        SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v1) AS v1_order
        FROM [example_table]
      ) AS a
      JOIN (
        SELECT grp, v1, v2, ROW_NUMBER() OVER(PARTITION BY grp ORDER BY v2) AS v2_order
        FROM [example_table]
      ) AS b
      ON a.grp = b.grp AND a.v1_order = b.v2_order
    )
    GROUP BY grp
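
To make the effect of the query above concrete, here is a minimal Python sketch of the same computation on the question's sample rows. It is only an illustration of the logic, not BigQuery code: the helper names `pearson` and `ordered_corr_by_group` are made up for this example, and the Pearson formula is written out by hand on the assumption that CORR() computes the Pearson coefficient.

```python
from collections import defaultdict
from math import sqrt

# Sample rows from the question: (grp, v1, v2).
rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ordered_corr_by_group(rows):
    """Sort v1 and v2 independently within each grp, then correlate them."""
    v1_by_grp, v2_by_grp = defaultdict(list), defaultdict(list)
    for grp, v1, v2 in rows:
        v1_by_grp[grp].append(v1)
        v2_by_grp[grp].append(v2)
    return {
        grp: pearson(sorted(v1_by_grp[grp]), sorted(v2_by_grp[grp]))
        for grp in v1_by_grp
    }


result = ordered_corr_by_group(rows)
```

With only two distinct rows per group, both sorted columns rise together, so each group's correlation comes out as 1. A group whose v1 or v2 values are all equal would divide by zero in this sketch, which it does not guard against.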
    

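    【Comments】:

    • Are you sure? How would ROW_NUMBER() allow an operation like CORR(v1_ordered, v2_ordered)?
    • ROW_NUMBER lets you get the ordering you asked for in your question.
    • I'm not convinced you are right. Can you provide a working example?
    • Looks like the other answers confirm that ROW_NUMBER() is the right direction! Meanwhile, I see you added one more condition, "without a JOIN", which is a game changer. Let's see if that is feasible :o)
    • Added a "working example" to my answer. (I finally got to my computer, so I could give you a more detailed answer.) Hope you now believe in that "direction" :o)
    【Solution 2】:

    This will work for you.

    SQLFiddle Demo in SQL Server

    Note: the row sequence shown in your example is not guaranteed by how the database returns rows. In my case, for v1, I got 4, 5, 10, 11 rather than your 5, 4, 10, 11. However, the ordered output values will be the same as what you want.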

    Select t.grp,t.v1,t.v2,
    v1.v1 as v1_ordered,v2.v2 as v2_ordered
    From
    (
        select t1.*,
        row_number() over (partition by grp
                       Order by v1) v1o
        ,
        row_number() over (partition by grp
                       Order by v2) v2o
        from table1 t1
    ) t
    Inner join
    (
        Select t.*,
        row_number() over (partition by grp
                       Order by v1) v1o
        From table1 t
    ) v1
    On t.grp=v1.grp
    And t.v1o=v1.v1o
    Inner join
    (
        Select t.*,
        row_number() over (partition by grp
                       Order by v2) v2o
        From table1 t
    ) v2
    On t.grp=v2.grp
    And t.v1o=v2.v2o
    

    Output:

    +------+-----+-----+-------------+------------+
    | grp  | v1  | v2  | v1_ordered  | v2_ordered |
    +------+-----+-----+-------------+------------+
    |   2  |  4  |  9  |          4  |          7 |
    |   2  |  5  |  7  |          5  |          9 |
    |   3  | 10  |  2  |         10  |          1 |
    |   3  | 11  |  1  |         11  |          2 |
    +------+-----+-----+-------------+------------+
    

    【Comments】:

      【Solution 3】:

      I am not 100% sure this works in BigQuery, but here is the idea:

      select e.*, ev1.v1, ev2.v2
      from (select e.*,
                   row_number() over (partition by grp order by v1) as seqnum_v1,
                   row_number() over (partition by grp order by v2) as seqnum_v2
            from example e
           ) e join
           (select e.*, row_number() over (partition by grp order by v1) as seqnum_v1
            from example e
           ) ev1
           on ev1.grp = e.grp and ev1.seqnum_v1 = e.seqnum_v1 join
           (select e.*, row_number() over (partition by grp order by v2) as seqnum_v2
            from example e
           ) ev2
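           -- match on the v1 rank so both ordered columns line up by position;
           -- joining on e.seqnum_v2 would just return each row's own v2
           on ev2.grp = e.grp and ev2.seqnum_v2 = e.seqnum_v1;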
      

      The idea is to assign an independent ordering to each column, then join back to the original table to get the actual values.
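
The idea can be sketched outside SQL as well. The following Python snippet is only an illustration under stated assumptions: the function names `rank_within_group` and `order_columns_independently` are hypothetical, ties are broken by Python's stable sort rather than by anything BigQuery guarantees, and a dictionary lookup on (grp, rank) stands in for the JOIN.

```python
from collections import defaultdict

# Sample rows from the question: (grp, v1, v2).
rows = [(2, 5, 7), (2, 4, 9), (3, 10, 2), (3, 11, 1)]


def rank_within_group(rows, key_index):
    """Map (grp, rank) -> row, where rank is the 1-based position of the row
    when its group is sorted by the column at key_index.
    Plays the role of ROW_NUMBER() OVER (PARTITION BY grp ORDER BY ...)."""
    by_grp = defaultdict(list)
    for row in rows:
        by_grp[row[0]].append(row)
    ranked = {}
    for grp, members in by_grp.items():
        ordered = sorted(members, key=lambda r: r[key_index])
        for rank, row in enumerate(ordered, start=1):
            ranked[(grp, rank)] = row
    return ranked


def order_columns_independently(rows):
    """Pair the k-th smallest v1 with the k-th smallest v2 inside each grp."""
    by_v1 = rank_within_group(rows, 1)  # ranks by v1
    by_v2 = rank_within_group(rows, 2)  # ranks by v2
    out = []
    for (grp, rank), row_a in sorted(by_v1.items()):
        row_b = by_v2[(grp, rank)]  # the "join" on grp and matching rank
        # Columns: grp, v1, v2, v1_ordered, v2_ordered
        out.append((grp, row_a[1], row_a[2], row_a[1], row_b[2]))
    return out


result = order_columns_independently(rows)
for line in result:
    print(line)
```

The key point is that both lookups share the same (grp, rank) key, so the two ordered columns line up by position rather than by original row.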

      【Comments】:

      • The general idea is right, but the parentheses are not quite right for BigQuery :)