【Question title】: BigQuery - Extract certain columns from a repeated and nested field
【Posted】: 2021-07-16 09:25:41
【Question】:

My table has the following structure:

Field name                 |    Type    |    Mode    |
------------------------------------------------------
message_info               |   RECORD   |  NULLABLE  |
  |-destination            |   RECORD   |  REPEATED  |
      |-address            |   STRING   |  NULLABLE  |
      |-service            |   STRING   |  NULLABLE  |
      |-selector           |   STRING   |  NULLABLE  |
      |-smime_signature    |   STRING   |  NULLABLE  |
      |-smime_decryption   |   STRING   |  NULLABLE  |
      |-smime_parsing      |   STRING   |  NULLABLE  |
      |-smime_extraction   |   STRING   |  NULLABLE  |
 

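I want to preserve the RECORD/REPEATED nature of the destination field, but retrieve only the first three nested fields (address, service, selector), since I don't need the smime fields.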

I tried the following:

SELECT
    STRUCT(
        d.address AS address,
        d.service AS service,
        d.selector AS selector
    ) AS destination
FROM
    `myproject.mydataset.mytable` AS mail,
    UNNEST(mail.message_info.destination) AS d

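However, this does not preserve the REPEATED nature of the message_info.destination field: the query returns one row per destination element. If I add an ARRAY_AGG() call like this: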

SELECT
    ARRAY_AGG(STRUCT(
        d.address AS address,
        d.service AS service,
        d.selector AS selector
    )) AS destination
FROM
    `myproject.mydataset.mytable` AS mail,
    UNNEST(mail.message_info.destination) AS d

I get an error saying that it conflicts with the other non-repeated fields I am selecting: SELECT list expression references mail.event_info.timestamp_usec which is neither grouped nor aggregated

What is the correct way to retrieve these fields?

【Comments】:

    Tags: sql google-bigquery


    【Solution 1】:

    You can use a subquery:

    select mail.*,
           (select array_agg(struct(d.address AS address,
                                    d.service AS service,
                                    d.selector AS selector
                                   )
                            )
            from unnest(mail.message_info.destination) d
           ) as new_destination
    from `myproject.mydataset.mytable` mail;
    

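    This creates new_destination for every row of the original table. When you use array_agg() in the outer query without GROUP BY, it aggregates across all rows, which is why you cannot select other columns alongside it.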
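
To make the per-row behavior concrete, here is a minimal, self-contained sketch of the same correlated-subquery pattern. The `sample` CTE, the `id` column, and all values are hypothetical stand-ins for `myproject.mydataset.mytable`, and only one of the smime fields is included to keep it short.

```sql
-- Hypothetical inline data standing in for `myproject.mydataset.mytable`.
WITH sample AS (
  SELECT
    1 AS id,  -- hypothetical non-repeated column
    STRUCT(
      [
        STRUCT('a@example.com' AS address, 'smtp' AS service,
               's1' AS selector, 'ok' AS smime_signature),
        STRUCT('b@example.com' AS address, 'smtp' AS service,
               's2' AS selector, 'ok' AS smime_signature)
      ] AS destination
    ) AS message_info
  UNION ALL
  SELECT
    2,
    STRUCT(
      [
        STRUCT('c@example.com' AS address, 'gmail' AS service,
               's3' AS selector, 'none' AS smime_signature)
      ] AS destination
    )
)
SELECT
  mail.id,
  -- The aggregation is scoped to this row's array only, so the outer
  -- query needs no GROUP BY and other columns can be selected freely.
  (SELECT ARRAY_AGG(STRUCT(d.address AS address,
                           d.service AS service,
                           d.selector AS selector))
   FROM UNNEST(mail.message_info.destination) AS d) AS new_destination
FROM sample AS mail;
```

Because the outer query never joins against the unnested array, the result should keep one row per row of `sample`, with new_destination as a repeated record holding only the three selected fields.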

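    【Comments】:

      【Solution 2】:

      Consider the approach below.
      While the approach in the other answer changes the schema of the original table (it adds a new_destination column), this one preserves it: message_info.destination stays a repeated record in its original place, reduced to the three wanted fields.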

      select * replace(
          (select as struct * 
             replace(array(select as struct address, service, selector from mi.destination) as destination)
           from unnest([message_info]) mi
          ) as message_info
        ) 
      from `myproject.mydataset.mytable` t
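
The nesting above can be tried without access to the real table. The following is a sketch under stated assumptions: the `sample` CTE, the `message_id` sibling field, and all values are hypothetical, while `event_info.timestamp_usec` is taken from the error message in the question. It follows the same SELECT * REPLACE pattern as the answer.

```sql
-- Hypothetical inline data standing in for `myproject.mydataset.mytable`.
WITH sample AS (
  SELECT
    STRUCT(1626427541000000 AS timestamp_usec) AS event_info,
    STRUCT(
      'msg-1' AS message_id,  -- hypothetical sibling of destination
      [
        STRUCT('a@example.com' AS address, 'smtp' AS service,
               's1' AS selector, 'ok' AS smime_signature)
      ] AS destination
    ) AS message_info
)
SELECT * REPLACE (
  -- Rebuild message_info as a struct, swapping only its destination field.
  (SELECT AS STRUCT * REPLACE (
      -- Rebuild the array, keeping three fields from each element.
      ARRAY(SELECT AS STRUCT address, service, selector
            FROM UNNEST(mi.destination)) AS destination)
   FROM UNNEST([message_info]) AS mi
  ) AS message_info
)
FROM sample;
```

Wrapping message_info in a one-element array and unnesting it is what lets the inner SELECT * expand the struct's fields, so every sibling field (here message_id) is carried over without being listed by name, and top-level columns such as event_info pass through the outer SELECT * untouched.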
      

      【Comments】:
