【Question title】: Return only one result for each partition of data
【Posted】: 2014-08-06 19:29:17
【Question】:

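I would like to do some calculations per partition in BigQuery and then output only 1 row for each partition (rather than one row for every input row in the partition). For example, if I have a table like this: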

Category | Location | Count
A        | 'home'   | 20
A        | 'work'   | 10
A        | 'lab'    | 6
B        | 'home'   | 5
C        | 'lab'    | 15
C        | 'home'   | 25

I would like to get this result:

Category  | TopLocation     | TopCount | SecondLocation | SecondCount
A         | 'home'          | 20       | 'work'         | 10
B         | 'home'          | 5        | NULL           | NULL
C         | 'home'          | 25       | 'lab'          | 15

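I thought I could do this with partitioning, but that ends up producing a row for every value rather than the single row I want, so I then group by category and use FIRST. Is there a better way that avoids generating so many intermediate rows (and, hopefully, avoids the "large results" problem with window functions)?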

SELECT
  category,
  FIRST(TopLocation) TopLocation,
  FIRST(TopCount) TopCount,
  FIRST(SecondLocation) SecondLocation,
  FIRST(SecondCount) SecondCount,
FROM
  (SELECT
      category,
      NTH_VALUE(Location, 1) OVER (PARTITION BY category ORDER BY count DESC) TopLocation,
      NTH_VALUE(Count, 1) OVER (PARTITION BY category ORDER BY count DESC) TopCount,
      NTH_VALUE(Location, 2) OVER (PARTITION BY category ORDER BY count DESC) SecondLocation,
      NTH_VALUE(Count, 2) OVER (PARTITION BY category ORDER BY count DESC) SecondCount
   FROM
      mytable
   )    
GROUP BY
  category
ORDER BY
  category DESC

【Question comments】:

    Tags: sql google-bigquery


    【Solution 1】:

    This should do the job:

    select category, 
        first(if(rank = 1, location, null)) as location_1, first(if(rank = 1, count, null)) as count_1,
        first(if(rank = 2, location, null)) as location_2, first(if(rank = 2, count, null)) as count_2,
        first(if(rank = 3, location, null)) as location_3, first(if(rank = 3, count, null)) as count_3
    from
        (select row_number() over (partition by category order by count desc) as rank, *
         from
            (select 'A' as category, 'home' AS location, 20 as count),
            (select 'A' as category, 'work' AS location, 10 as count),
            (select 'A' as category, 'lab' AS location, 6 as count),
            (select 'B' as category, 'home' AS location, 5 as count),
            (select 'C' as category, 'lab' AS location, 15 as count),
            (select 'C' as category, 'home' AS location, 25 as count)
        )
    group by category order by category
    

    Result:

    Row | category | location_1 | count_1 | location_2 | count_2 | location_3 | count_3
    1   | A        | home       | 20      | work       | 10      | lab        | 6
    3   | B        | home       | 5       | null       | null    | null       | null
    2   | C        | home       | 25      | lab        | 15      | null       | null
    

    However, it may not solve the "large query results" problem with window functions.
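For anyone who wants to try the rank-then-pivot idea without a BigQuery project, here is a minimal sketch that runs the same pattern through Python's built-in `sqlite3` module. SQLite is only a local stand-in (it needs window-function support, i.e. SQLite 3.25 or newer), `MAX(CASE WHEN ...)` is swapped in for the legacy `first(if(...))`, and the column name `cnt` and the helper `top_two_per_category` are names made up for this example. The rows are the sample data from the question.

```python
import sqlite3

# Sample rows copied from the question's table: (category, location, count).
ROWS = [
    ("A", "home", 20),
    ("A", "work", 10),
    ("A", "lab", 6),
    ("B", "home", 5),
    ("C", "lab", 15),
    ("C", "home", 25),
]

# Step 1 (inner query): number the rows inside each category, highest count first.
# Step 2 (outer query): collapse each category to one row by picking the rank-1
# and rank-2 values with conditional aggregation. MAX ignores NULLs, so a
# category with a single row gets NULL for the "second" columns.
QUERY = """
SELECT
  category,
  MAX(CASE WHEN rn = 1 THEN location END) AS top_location,
  MAX(CASE WHEN rn = 1 THEN cnt END)      AS top_count,
  MAX(CASE WHEN rn = 2 THEN location END) AS second_location,
  MAX(CASE WHEN rn = 2 THEN cnt END)      AS second_count
FROM (
  SELECT
    category, location, cnt,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY cnt DESC) AS rn
  FROM mytable
)
GROUP BY category
ORDER BY category
"""


def top_two_per_category(rows):
    """Load rows into an in-memory SQLite table and return one row per category."""
    conn = sqlite3.connect(":memory:")
    try:
        # 'cnt' is a hypothetical column name used to avoid clashing with COUNT().
        conn.execute(
            "CREATE TABLE mytable (category TEXT, location TEXT, cnt INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_two_per_category(ROWS):
        print(row)
```

Note that the inner query still produces one ranked row per input row, so this sketch shows the shape of the output, not a fix for the intermediate-row concern raised in the question.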

    【Comments】:

      【Solution 2】:

      Update: a better solution for #standardSQL


      How about:

      SELECT word, word_count, corpus, rank FROM (
        SELECT word, word_count, corpus,
               RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) rank
        FROM [publicdata:samples.shakespeare] 
        WHERE word_count > 6
      )
      WHERE rank<3
      

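      【Comments】:

      • It still has 3 rows for each corpus, though.
      • You're right. I'm trying to find an alternative, but your query looks well suited to the intended purpose. Note the use of word_count > 6 in my query to filter out the long tail (useful for the "large results" problem).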
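To illustrate the long-tail remark, here is a small sketch of what the pre-filter does: it shrinks the set of rows the window function has to rank, while leaving the top ranks unchanged as long as the threshold sits below the counts of the rows you want back. It again uses Python's `sqlite3` as a local stand-in for BigQuery (SQLite 3.25 or newer assumed). The table `words`, its rows, and the helpers `make_connection` and `top_words` are all invented for illustration and are not the real `shakespeare` sample.

```python
import sqlite3

# Hypothetical data, invented for illustration: (corpus, word, word_count).
ROWS = [
    ("hamlet", "the", 900),
    ("hamlet", "and", 700),
    ("hamlet", "zounds", 1),
    ("hamlet", "fie", 2),
    ("macbeth", "the", 600),
    ("macbeth", "and", 500),
    ("macbeth", "hurlyburly", 1),
]

# Same shape as the query above: filter first, rank inside each corpus,
# then keep only the top ranks.
QUERY = """
SELECT corpus, word, word_count FROM (
  SELECT corpus, word, word_count,
         RANK() OVER (PARTITION BY corpus ORDER BY word_count DESC) AS rnk
  FROM words
  WHERE word_count > ?
)
WHERE rnk < 3
ORDER BY corpus, rnk
"""


def make_connection(rows):
    """Create an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (corpus TEXT, word TEXT, word_count INTEGER)")
    conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)
    return conn


def top_words(conn, min_count):
    """Return (rows fed to the window function, top-ranked rows) for a threshold."""
    window_input = conn.execute(
        "SELECT COUNT(*) FROM words WHERE word_count > ?", (min_count,)
    ).fetchone()[0]
    top = conn.execute(QUERY, (min_count,)).fetchall()
    return window_input, top


if __name__ == "__main__":
    conn = make_connection(ROWS)
    for threshold in (0, 6):
        n, top = top_words(conn, threshold)
        print(f"word_count > {threshold}: {n} rows ranked, top = {top}")
    conn.close()
```

The caveat is that the filter is only safe when it cannot remove a row that belongs in the answer: a corpus whose second-most-frequent word occurs 6 times or fewer would lose that row under `word_count > 6`.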