【Question title】: Pivot multi-level nested fields in BigQuery
【Posted】: 2020-02-29 05:33:50
【Question】:

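My BigQuery table schema:

Following up on this post: bigquery pivoting with nested field, I am trying to flatten this table. I want to unnest the timeseries.data field, i.e. the final number of rows should equal the total length of the timeseries.data arrays. I also want to add annotation.properties.key with a specific value as an additional column, with annotation.properties.value as its value. In this case, that would be the "margin" column. However, the query below gives me the error "Unrecognized name: data", even though I already have unnest(timeseries.data) as data after the last FROM.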

flow_timestamp, channel_name, number_of_digits, timestamp, value, margin
2019-10-31 15:31:15.079674 UTC, channel_1, 4, 2018-02-28T02:00:00, 50, 0.01

Query:

SELECT 
  flow_timestamp, timeseries.channel_name, 

  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ),
  data.timestamp ,data.value

,(with subq as (select * from unnest(data.annotation))
select max(if (properties.key = 'margin', properties.value, null))
from (
select * from unnest(subq.properties)
) as properties
) as margin

FROM my_table
left join unnest(timeseries.data) as data

WHERE DATE(flow_timestamp) between "2019-10-28" and "2019-11-02" 
order by flow_timestamp

【Question comments】:

    Tags: google-bigquery


    【Solution 1】:

    Try the query below:

    #standardSQL
    SELECT 
      flow_timestamp, 
      timeseries.channel_name, 
      ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
        FROM UNNEST(timeseries.channel_properties) AS channel_properties
      ) AS number_of_digits, 
      item.timestamp, 
      item.value, 
      ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL)) 
        FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop
      ) AS margin  
    FROM my_table 
    LEFT JOIN UNNEST(timeseries.data) item
    WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
    ORDER BY flow_timestamp
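
To try the query above without access to the original table, one option is to prepend a CTE with inline sample data. This is only a sketch: the schema is inferred from the field names used in the queries, while the column types and the second element of timeseries.data are assumptions made up for illustration.

```sql
#standardSQL
-- Hypothetical sample data. The nesting follows the field names used in the
-- queries above; the types (STRING keys/values, INT64 value) are assumptions.
WITH my_table AS (
  SELECT
    TIMESTAMP '2019-10-31 15:31:15.079674' AS flow_timestamp,
    STRUCT(
      'channel_1' AS channel_name,
      [STRUCT('number_of_digits' AS key, '4' AS value)] AS channel_properties,
      [
        STRUCT(
          '2018-02-28T02:00:00' AS timestamp,
          50 AS value,
          [STRUCT([STRUCT('margin' AS key, '0.01' AS value)] AS properties)] AS annotation
        ),
        STRUCT(
          '2018-02-28T03:00:00' AS timestamp,
          60 AS value,
          [STRUCT([STRUCT('margin' AS key, '0.02' AS value)] AS properties)] AS annotation
        )
      ] AS data
    ) AS timeseries
)
SELECT
  flow_timestamp,
  timeseries.channel_name,
  -- Pivot one key out of the channel-level key/value array.
  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL))
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ) AS number_of_digits,
  item.timestamp,
  item.value,
  -- Pivot one key out of the two-level annotation -> properties arrays.
  ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL))
    FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) AS prop
  ) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data) AS item
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02'
-- item.timestamp is added only to make the order of the sample rows deterministic.
ORDER BY flow_timestamp, item.timestamp
```

With this sample data the query should return two rows, one per element of timeseries.data. The first row matches the desired output from the question (channel_1, 4, 2018-02-28T02:00:00, 50, 0.01).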
    

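    Below is a less verbose version of the same solution, which relies on implicit aliases for the unnested fields. I usually prefer the version above because it is easier to maintain.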

    #standardSQL
    SELECT 
      flow_timestamp, 
      timeseries.channel_name, 
      ( SELECT MAX(IF(key = 'number_of_digits', value, NULL)) 
        FROM UNNEST(timeseries.channel_properties) AS channel_properties
      ) AS number_of_digits, 
      timestamp, 
      value, 
      ( SELECT MAX(IF(key = 'margin', value, NULL)) 
        FROM UNNEST(annotation), UNNEST(properties) 
      ) AS margin  
    FROM my_table 
    LEFT JOIN UNNEST(timeseries.data)   
    WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
    ORDER BY flow_timestamp
    

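    【Comments】:

    • Thanks for the quick reply. FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop would actually produce more rows than "item" (timeseries.data), right? Because it is a repeated field within item. Why does the final result have the same length as "item"?
    • Because FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop is used only inside the subselect that computes margin, it has no effect on the number of rows in the output. The number of output rows is determined by the number of items, which I think is exactly what you asked for in the question :o)
    • @Yang - did this work for you? I hope so. If it did, please consider voting up and accepting :o)
    • "The number of output rows is determined by the number of items", that is, by the number of unnested items after the last "FROM", i.e. my_table LEFT JOIN UNNEST(timeseries.data), right?
    • Just trying to understand the query better. Thanks.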
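
The row-count point discussed in the comments can be checked with a small sketch. The inline table below is hypothetical (one data item carrying two annotations with two properties each, with made-up keys and values); it compares unnesting inside a scalar subselect with unnesting in the outer FROM clause.

```sql
#standardSQL
-- Hypothetical sample: 1 data item, 2 annotations, 2 properties per annotation.
WITH my_table AS (
  SELECT STRUCT(
    [
      STRUCT(
        50 AS value,
        [
          STRUCT([STRUCT('margin' AS key, '0.01' AS value),
                  STRUCT('unit' AS key, 'kW' AS value)] AS properties),
          STRUCT([STRUCT('source' AS key, 'meter' AS value),
                  STRUCT('quality' AS key, 'good' AS value)] AS properties)
        ] AS annotation
      )
    ] AS data
  ) AS timeseries
)
SELECT
  -- Nested arrays are unnested only inside the scalar subselect,
  -- so the outer query still yields one row per data item.
  ( SELECT COUNT(*)
    FROM (
      SELECT
        ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL))
          FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) AS prop
        ) AS margin
      FROM my_table
      LEFT JOIN UNNEST(timeseries.data) AS item
    )
  ) AS rows_with_subselect,
  -- Nested arrays are unnested in the outer FROM clause,
  -- so every property becomes its own output row.
  ( SELECT COUNT(*)
    FROM my_table,
      UNNEST(timeseries.data) AS item,
      UNNEST(item.annotation) AS annot,
      UNNEST(annot.properties) AS prop
  ) AS rows_with_from_joins
```

For this sample, rows_with_subselect is 1 (one row per data item), while rows_with_from_joins is 4 (1 item × 2 annotations × 2 properties). Only the arrays unnested in the outer FROM clause multiply the output rows.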