【问题标题】:Athena/Presto : Unnest 2 arrays with left joinAthena/Presto:使用左连接取消嵌套 2 个数组
【发布时间】:2021-06-01 22:36:34
【问题描述】:

所以我有 2 个 Json 数组需要取消嵌套,并根据 json 结构中的键加入。 理论上很容易,但如果没有“左连接嵌套”功能,一切都会变得一团糟。

通过对结果进行分组,我已经实现了我想要的;但我也担心它会进行 2 次交叉连接,从而有效地生成数千个多余的行(在实时环境中),然后再将它们过滤掉。

因此,我的问题实际上是在寻找一种更有效的策略来执行相同的逻辑。我很清楚我的 Presto 经验和知识还处于起步阶段!

感谢您的指导!

工作原理:

基本逻辑: 'left' 数组中的每个项目都有一个 $.id 值。 对于 一些 'left' 项目,将有一个匹配的具有 $.a.id 值的右项目

示例:

  1. 下面的第一个 SQL 和结果显示了设置,如果不是所需的结果。
  2. 第二组显示我当前的解决方案。

(1) 交叉连接的原始结果

with cte as (
    Select 
        123 as record_id,
        '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
        '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
)
select 
    record_id,
    l.i as "left",
    r.i as "right",
    json_extract(l.i, '$.id') as left_id,
    json_extract(r.i, '$.a.id') as right_id
from 
    cte,
    unnest(cast (json_parse("left") as array(json))) as l(i),    -- left array
    unnest(cast (json_parse("right") as array(json))) as r(i)    -- right array

输出:

record_id left right left_id right_id
123 {"id":"01","key1":["val1"]} {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} "01" "02"
123 {"id":"01","key1":["val1"]} {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} "01" "01"
123 {"id":"02","key1":["val2"]} {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} "02" "02"
123 {"id":"02","key1":["val2"]} {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} "02" "01"
123 {"id":"03","key1":["val3"]} {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} "03" "02"
123 {"id":"03","key1":["val3"]} {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} "03" "01"

(2) 目前的解决方案

select 
  record_id,
  l.i as "left",
  max( if(json_extract(l.i, '$.id') = json_extract(r.i, '$.a.id'),json_format(r.i),null) )as match
from 
  cte,
   unnest(cast (json_parse("left") as array(json))) as l(i),    -- left array
   unnest(cast (json_parse("right") as array(json))) as r(i)    -- right array
group by 
  record_id,
  l.i 
record_id left match
123 {"id":"01","key1":["val1"]} {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}}
123 {"id":"02","key1":["val2"]} {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}}
123 {"id":"03","key1":["val3"]}

【问题讨论】:

    标签: sql left-join presto amazon-athena unnest


    【解决方案1】:

    在 CTE 和左连接 CTE 中取消嵌套两个数组,在这种情况下,您将消除交叉连接,但代码有点长:

    with cte as (
        Select 
            123 as record_id,
            '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
            '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
    ), 
    
    "left" as (
    select 
        record_id,
        l.i as "left",
        json_extract(l.i, '$.id') as left_id
    from 
        cte,
        unnest(cast (json_parse("left") as array(json))) as l(i)    -- left array
    ), 
    
    "right" as (
      select 
        record_id,
        r.i as "right",
        json_extract(r.i, '$.a.id') as right_id
    from 
        cte,
        unnest(cast (json_parse("right") as array(json))) as r(i)    -- right array
    )
    
    select 
        l.record_id,
        l."left",
        r."right",
        l.left_id,
        r.right_id
    from 
        "left" l left join "right" r on l.record_id=r.record_id and l.left_id=r.right_id
    

    结果:

    record_id left right left_id right_id
    123 {"id":"01","key1":["val1"]} {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} "01" "01"
    123 {"id":"02","key1":["val2"]} {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} "02" "02"
    123 {"id":"03","key1":["val3"]} \N "03" \N

    【讨论】:

      猜你喜欢
      • 2021-02-24
      • 2020-11-11
      • 2021-08-29
      • 1970-01-01
      • 1970-01-01
      • 2021-06-21
      • 2021-12-15
      • 2012-01-21
      • 2021-02-01
      相关资源
      最近更新 更多