【Question title】:In BigQuery, convert stringified array of objects into non-stringified
【Posted】:2020-07-27 02:14:11
【Question】:

I am ingesting .json data into Google BigQuery, and at ingestion the arrays and objects in the .json are all converted into string columns. The data in BigQuery looks like this:

select 1 as id, '[]' as stringCol1, '[]' as stringCol2 union all
select 2 as id, null as stringCol1, null as stringCol2 union all
select 3 as id, "{'game': '22', 'year': 'sophomore'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" as stringCol2 union all
select 4 as id, "{'game': '17', 'year': 'freshman'}" as stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" as stringCol2 union all
select 5 as id, "{'game': '9', 'year': 'senior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" as stringCol2 union all
select 6 as id, "{'game': '234', 'year': 'junior'}" as stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" as stringCol2

The data is a bit messy.

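  • In stringCol1, missing data appears as both null and '[]'. I'd like to create two columns, game and year, from this stringified object.
  • stringCol2 is always an array of two objects with the same keys (teamName and teamAge in this case). It then needs to be converted into 4 columns: teamName1, teamAge1, teamName2, teamAge2.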

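This similar post addresses converting a basic stringified array into a non-stringified array, but the example here is a bit more complex. In particular, the solution from the other post does not work in this case.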

【Comments】:

    Tags: google-bigquery


    【Solution 1】:

    Below is for BigQuery Standard SQL

    #standardSQL
    SELECT id,
      JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
      JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
      JSON_EXTRACT_SCALAR(t1, '$.teamName') AS teamName1,
      JSON_EXTRACT_SCALAR(t1, '$.teamAge') AS teamAge1,
      JSON_EXTRACT_SCALAR(t2, '$.teamName') AS teamName2,
      JSON_EXTRACT_SCALAR(t2, '$.teamAge') AS teamAge2
    FROM `project.dataset.table`,
    UNNEST([STRUCT(
      JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(0)] AS t1, 
      JSON_EXTRACT_ARRAY(stringCol2)[SAFE_OFFSET(1)] AS t2
    )])   
    

    Applied to the sample data from your question (prepend this CTE to the query above)

    WITH `project.dataset.table` AS (
      SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
      SELECT 2 AS id, NULL AS stringCol1, NULL AS stringCol2 UNION ALL
      SELECT 3 AS id, "{'game': '22', 'year': 'sophomore'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]" AS stringCol2 UNION ALL
      SELECT 4 AS id, "{'game': '17', 'year': 'freshman'}" AS stringCol1, "[{'teamName': 'teamA', 'teamAge': 32}, {'teamName': 'teamB', 'teamAge': 33}]" AS stringCol2 UNION ALL
      SELECT 5 AS id, "{'game': '9', 'year': 'senior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 31}, {'teamName': 'teamD', 'teamAge': 17}]" AS stringCol2 UNION ALL
      SELECT 6 AS id, "{'game': '234', 'year': 'junior'}" AS stringCol1, "[{'teamName': 'teamC', 'teamAge': 42}, {'teamName': 'teamD', 'teamAge': 25}]" AS stringCol2
    ) 
    

    the output is

    Row id  game    year        teamName1   teamAge1    teamName2   teamAge2     
    1   1   null    null        null        null        null        null     
    2   2   null    null        null        null        null        null     
    3   3   22      sophomore   teamA       37          teamB       32   
    4   4   17      freshman    teamA       32          teamB       33   
    5   5   9       senior      teamC       31          teamD       17   
    6   6   234     junior      teamC       42          teamD       25      
    

    Many variations of the above are possible, for example to improve readability

    #standardSQL
    SELECT id,
      JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS game,
      JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
      JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
      JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS teamAge1,
      JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
      JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS teamAge2
    FROM `project.dataset.table`,
    UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
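
    Note that JSON_EXTRACT_SCALAR returns STRING values, so game, teamAge1 and teamAge2 in the queries above come back as strings. If numeric columns are wanted, one possible follow-up is to wrap those extractions in SAFE_CAST. The sketch below is only an illustration: the CTE name sample and its three inline rows are assumptions standing in for `project.dataset.table`, and casting game to INT64 assumes it always holds a whole number.

```sql
#standardSQL
-- Illustrative sketch: `sample` is a hypothetical stand-in for the real table.
WITH sample AS (
  SELECT 1 AS id, '[]' AS stringCol1, '[]' AS stringCol2 UNION ALL
  SELECT 2, NULL, NULL UNION ALL
  SELECT 3, "{'game': '22', 'year': 'sophomore'}",
    "[{'teamName': 'teamA', 'teamAge': 37}, {'teamName': 'teamB', 'teamAge': 32}]"
)
SELECT id,
  -- SAFE_CAST yields NULL instead of an error when the value is missing or not numeric
  SAFE_CAST(JSON_EXTRACT_SCALAR(stringCol1, '$.game') AS INT64) AS game,
  JSON_EXTRACT_SCALAR(stringCol1, '$.year') AS year,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamName') AS teamName1,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(0)], '$.teamAge') AS INT64) AS teamAge1,
  JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamName') AS teamName2,
  SAFE_CAST(JSON_EXTRACT_SCALAR(t[SAFE_OFFSET(1)], '$.teamAge') AS INT64) AS teamAge2
FROM sample,
UNNEST([STRUCT(JSON_EXTRACT_ARRAY(stringCol2) AS t)])
```

    Rows whose inputs are '[]' or NULL should still produce NULL in every extracted column, since SAFE_OFFSET returns NULL for an out-of-range index and SAFE_CAST passes NULL through.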
    

    【Comments】:

    • Very helpful, thank you. The json_extract_* functions seem to be a powerful feature in BigQuery