【Question title】: How to simulate a pivot table with BigQuery?
【Posted】: 2013-10-25 05:28:34
【Question description】:

I need to organize the results of a query into columns, as if it were a pivot table. How can I do that?

【Question comments】:

    Tags: google-bigquery


    【Solution 1】:

    2020 update: fhoffa.x.pivot()


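    Use conditional statements to organize the query results into rows and columns. In the example below, the most revised Wikipedia articles whose titles start with "Google" are organized into columns, and each column shows the revision count when the title meets that column's condition.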

    SELECT
      page_title,
      /* Populate these columns with the revision count when the condition holds, otherwise 0 */
      IF(page_title CONTAINS 'search', INTEGER(total), 0) AS search,
      IF(page_title CONTAINS 'Earth' OR page_title CONTAINS 'Maps', INTEGER(total), 0) AS geo
    FROM
      /* Subselect to return top revised Wikipedia articles containing 'Google'
       * followed by additional text.
       */
      (SELECT
        TOP(title, 5) as page_title,
        COUNT(*) as total
       FROM
         [publicdata:samples.wikipedia]
       WHERE
         REGEXP_MATCH (title, r'^Google.+') AND wp_namespace = 0
      );
    

    Result:

    +---------------+--------+------+
    |  page_title   | search | geo  |
    +---------------+--------+------+
    | Google search |   4261 |    0 |
    | Google Earth  |      0 | 3874 |
    | Google Chrome |      0 |    0 |
    | Google Maps   |      0 | 2617 |
    | Google bomb   |      0 |    0 |
    +---------------+--------+------+
    

    A similar example, without a subquery:

    SELECT SensorType, DATE(DTimestamp), AVG(data) avg
    FROM [data-sensing-lab:io_sensor_data.moscone_io13]
    WHERE DATE(DTimestamp) IN ('2013-05-16', '2013-05-17')
    GROUP BY 1, 2
    ORDER BY 2, 3 DESC;
    

    This produces a 3-column table: sensor type, date, and average data. To "pivot" it and have the dates as columns:

    SELECT
      SensorType,
      AVG(IF(DATE(DTimestamp) = '2013-05-16', data, null)) d16,
      AVG(IF(DATE(DTimestamp) = '2013-05-17', data, null)) d17
    FROM [data-sensing-lab:io_sensor_data.moscone_io13]
    GROUP BY 1
    ORDER BY 2 DESC;
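
The queries above use BigQuery legacy SQL. As a minimal sketch of the same conditional-aggregation idea in standard SQL, the query below pivots a few inline rows instead of the real sensor table. The `readings` rows and the column names `sensor_type`, `reading_date`, and `data` are hypothetical stand-ins, not taken from the original dataset.

```sql
-- Hypothetical inline rows standing in for the sensor table.
WITH readings AS (
  SELECT * FROM UNNEST([
    STRUCT('temperature' AS sensor_type, DATE '2013-05-16' AS reading_date, 20.0 AS data),
    ('temperature', DATE '2013-05-16', 22.0),
    ('temperature', DATE '2013-05-17', 24.0),
    ('humidity',    DATE '2013-05-16', 40.0),
    ('humidity',    DATE '2013-05-17', 50.0)
  ])
)
SELECT
  sensor_type,
  -- IF(...) yields NULL for rows from other dates, and AVG skips NULLs,
  -- so each column only averages the rows for its own date.
  AVG(IF(reading_date = DATE '2013-05-16', data, NULL)) AS d16,
  AVG(IF(reading_date = DATE '2013-05-17', data, NULL)) AS d17
FROM readings
GROUP BY sensor_type
ORDER BY d16 DESC;
```

With these sample rows, the temperature row should show 21.0 for d16 and 24.0 for d17, and the humidity row 40.0 and 50.0.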
    

    【Comments】:

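    • At first it looked odd to use AVG(IF(...)), or MAX(IF(...)) for strings, to "aggregate a single value"... but it works, thanks!!
    • @Ripounet For strings, FIRST(IF(..)) might look better, or GROUP_CONCAT(IF(..)) to keep all the strings.
    • Great, FIRST fits my needs for both numbers and strings.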
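
The string case discussed in these comments can be sketched in standard SQL as follows. FIRST and GROUP_CONCAT are legacy SQL functions; this sketch uses MAX to pick a single string and STRING_AGG, the standard SQL counterpart of GROUP_CONCAT, to keep all of them. The `attrs` rows and the column names `item_id`, `attr_name`, and `attr_value` are hypothetical.

```sql
-- Hypothetical key/value rows to pivot into one column per attribute.
WITH attrs AS (
  SELECT * FROM UNNEST([
    STRUCT(1 AS item_id, 'color' AS attr_name, 'red' AS attr_value),
    (1, 'size',  'L'),
    (2, 'color', 'blue'),
    (2, 'color', 'green')
  ])
)
SELECT
  item_id,
  -- MAX keeps one string per group (the largest in sort order).
  MAX(IF(attr_name = 'color', attr_value, NULL)) AS color_max,
  MAX(IF(attr_name = 'size',  attr_value, NULL)) AS size_max,
  -- STRING_AGG concatenates every non-NULL string in the group.
  STRING_AGG(IF(attr_name = 'color', attr_value, NULL), ',' ORDER BY attr_value) AS colors_all
FROM attrs
GROUP BY item_id
ORDER BY item_id;
```

For item 2, which has two colors and no size, color_max should be 'green', size_max should be NULL, and colors_all should be 'blue,green'.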
    【Solution 2】:

    Same approach and result, but using BigQuery standard SQL:

    -- top revised Wikipedia articles containing 'Google'
    WITH articles AS (
      SELECT title AS page_title,
             COUNT(*) AS total
        FROM `publicdata.samples.wikipedia`
       WHERE REGEXP_CONTAINS(title, r'^Google.+') AND wp_namespace = 0
       GROUP BY title
       ORDER BY total DESC
       LIMIT 5
    )
    
    SELECT page_title,
           -- Populate these columns with the revision count when the condition holds, otherwise 0
           IF(page_title LIKE '%search%', total, 0) AS search,
           IF(page_title LIKE '%Earth%' OR page_title LIKE '%Maps%', total, 0) AS geo
      FROM articles
    ;
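
Current BigQuery standard SQL also offers a PIVOT operator, which can replace hand-written IF columns when the pivoted values are known constants. The sketch below is an illustration under assumptions, not part of the original answer: the `readings` rows and the column names `sensor_type`, `reading_day`, and `data` are hypothetical.

```sql
-- Hypothetical inline rows; reading_day is a STRING here for simplicity.
WITH readings AS (
  SELECT * FROM UNNEST([
    STRUCT('temperature' AS sensor_type, '2013-05-16' AS reading_day, 20.0 AS data),
    ('temperature', '2013-05-16', 22.0),
    ('temperature', '2013-05-17', 24.0),
    ('humidity',    '2013-05-16', 40.0),
    ('humidity',    '2013-05-17', 50.0)
  ])
)
SELECT *
FROM readings
-- One output column per listed value; the aliases give valid column names.
-- The remaining column (sensor_type) becomes the grouping key.
PIVOT (AVG(data) FOR reading_day IN ('2013-05-16' AS d16, '2013-05-17' AS d17))
ORDER BY sensor_type;
```

The values in the IN list must be written out as constants, so a pivot over values that are not known in advance still needs a generated query.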
    

    【Comments】:
