【Posted】: 2021-06-01 22:36:34
【Question】:
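I have 2 JSON arrays that need to be unnested and then joined on a key inside the JSON structure. In theory this is easy, but without a "left join unnest" feature it all gets very messy.
By grouping the results I have achieved what I want, but I am concerned that it performs 2 cross joins, effectively generating thousands of superfluous rows (in the live environment) only to filter them out again.
So my question is really about finding a more efficient strategy for the same logic. I am well aware that my Presto experience and knowledge are still in their infancy!
Thanks for any guidance!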
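How it works:
Basic logic: every item in the 'left' array has a $.id value. For some 'left' items, there will be a matching 'right' item that carries the same value at $.a.id.
Examples:
- The first SQL statement and result below show the setup, though not the desired result.
- The second set shows my current solution.
(1) Raw result of the cross join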
with cte as (
    select
        123 as record_id,
        '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
        '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
)
select
    record_id,
    l.i as "left",
    r.i as "right",
    json_extract(l.i, '$.id') as left_id,
    json_extract(r.i, '$.a.id') as right_id
from
    cte,
    unnest(cast(json_parse("left") as array(json))) as l(i),  -- left array
    unnest(cast(json_parse("right") as array(json))) as r(i)  -- right array
Output:
| record_id | left | right | left_id | right_id |
|---|---|---|---|---|
| 123 | {"id":"01","key1":["val1"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "01" | "02" |
| 123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "01" | "01" |
| 123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "02" | "02" |
| 123 | {"id":"02","key1":["val2"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "02" | "01" |
| 123 | {"id":"03","key1":["val3"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "03" | "02" |
| 123 | {"id":"03","key1":["val3"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "03" | "01" |
(2) Current solution
-- uses the same cte as in (1)
select
    record_id,
    l.i as "left",
    -- keep the right item only where the ids match; max() collapses the non-matching rows to null
    max(if(json_extract(l.i, '$.id') = json_extract(r.i, '$.a.id'), json_format(r.i), null)) as match
from
    cte,
    unnest(cast(json_parse("left") as array(json))) as l(i),  -- left array
    unnest(cast(json_parse("right") as array(json))) as r(i)  -- right array
group by
    record_id,
    l.i
| record_id | left | match |
|---|---|---|
| 123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} |
| 123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} |
| 123 | {"id":"03","key1":["val3"]} | |
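
For comparison, here is a minimal sketch of one possible alternative to the cross join plus group by: unnest each array in its own CTE, then use an ordinary LEFT JOIN on the extracted id. The CTE names `left_items` and `right_items` and the column names `item` and `id` are made up for illustration. It assumes `json_extract_scalar` is acceptable for comparing the ids as plain strings, and that each id occurs at most once in the 'right' array; with duplicate ids this returns one row per match, whereas the max() version collapses them into a single row. Whether it is actually cheaper depends on the engine's query plan, so it would need to be checked with EXPLAIN on real data.

```sql
with cte as (
    select
        123 as record_id,
        '[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
        '[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
),
-- one row per element of the left array, with its join key as varchar
left_items as (
    select
        record_id,
        t.i as item,
        json_extract_scalar(t.i, '$.id') as id
    from cte
    cross join unnest(cast(json_parse("left") as array(json))) as t(i)
),
-- one row per element of the right array, with its join key as varchar
right_items as (
    select
        record_id,
        t.i as item,
        json_extract_scalar(t.i, '$.a.id') as id
    from cte
    cross join unnest(cast(json_parse("right") as array(json))) as t(i)
)
select
    li.record_id,
    li.item as "left",
    ri.item as "match"  -- null when no right item shares the id
from left_items li
left join right_items ri
    on ri.record_id = li.record_id
    and ri.id = li.id
```

On the sample data this is expected to return the same three rows as the current solution, with a null match for id "03". Because the join condition is a plain equality on `record_id` and `id`, the engine is free to treat it as an equi-join instead of enumerating every left/right pair first.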
【Comments】:
Tags: sql left-join presto amazon-athena unnest