【Question title】: Athena/Presto find key with the max value in JSON object
【Posted】: 2021-09-05 22:51:16
【Question description】:
I have a column in Athena (type string) containing JSON like this:
{
"key1": 1.1,
"key2":2.2,
"key3": 3.3
}
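How can I write a query that returns, for each row, the JSON key with the highest value (key3 in this example) together with its value (3.3)?
Note: I don't know the key names in advance (and there may be several of them).
【Question discussion】:
Tags: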
json
amazon-athena
presto
【Solution 1】:
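You can cast your JSON to MAP(VARCHAR, DOUBLE) and work with the map. For example (this uses the map_entries function to turn the map into an array of rows, the reduce array function, and relies on the default row field naming convention, field0/field1):
WITH dataset AS (
    SELECT *
    FROM (VALUES
        (JSON '{
            "key1": 1.1,
            "key2": 2.2,
            "key3": 3.3
        }'),
        (JSON '{
            "key0": 1.1,
            "key1": 4.4,
            "key2": 3.3
        }')) AS t (json))
SELECT row.field0 AS key, row.field1 AS value
FROM
    (SELECT reduce(
        -- map -> array of (key, value) rows; DOUBLE keeps the decimal part of values such as 1.1
        map_entries(CAST(json AS MAP(VARCHAR, DOUBLE))),
        -- initial state: an empty (key, value) pair
        ROW (null, null),
        -- keep whichever row has the larger value; while agg.field1 is NULL the comparison is NULL, so curr wins
        (agg, curr) -> IF (agg.field1 > curr.field1, agg, curr),
        -- output function: return the final state unchanged
        s -> s) AS row
    FROM dataset)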
【解决方案2】:
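So I found a way to do it, but it seems rather complicated; if anyone has a better solution, I'd appreciate it. Assuming there is a column named id and the JSON is stored in a separate column: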
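with d as (
    select id,
        -- parse the JSON string into a map; DOUBLE values sort numerically (VARCHAR would sort '10' before '9')
        CAST(json_extract(json_col, '$') AS MAP(VARCHAR, DOUBLE)) as s
    from TABLE_NAME
),
d2 as (
    -- one row per (id, key), with that key's value in its own column
    select *,
        element_at(s, key) AS value
    from d
    cross join unnest(map_keys(s)) AS sx(key)
),
d3 as (
    -- rank the keys within each id, highest value first
    select id, key, value,
        rank() over (partition by id order by value desc) as rnk
    from d2
)
-- rank() keeps every key that ties for the maximum
select id, key, value from d3 where rnk = 1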
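Basically: first cast the JSON object to a map, then unnest the map keys with a cross join and look up each key's value into a separate column, then rank the rows within each id by value (descending), and finally keep only the rows with rank = 1.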
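If only the top key per id is needed, a shorter variant may work: UNNEST the map directly into key/value columns and aggregate with Presto's max_by instead of ranking with a window function. This is a sketch that has not been checked against a particular Athena engine version; like the query above it assumes an id column and a string column json_col whose values are all numeric. The inline VALUES stand in for TABLE_NAME.

```sql
WITH dataset AS (
    SELECT *
    FROM (VALUES
        (1, '{"key1": 1.1, "key2": 2.2, "key3": 3.3}'),
        (2, '{"key0": 1.1, "key1": 4.4, "key2": 3.3}')
    ) AS t (id, json_col)
)
SELECT id,
       -- key of the entry with the largest value
       max_by(key, value) AS key,
       max(value) AS value
FROM dataset
-- unnesting a map yields one (key, value) row per entry
CROSS JOIN UNNEST(CAST(json_parse(json_col) AS MAP(VARCHAR, DOUBLE))) AS u (key, value)
GROUP BY id
ORDER BY id
```

With the sample rows this should return key3/3.3 for id 1 and key1/4.4 for id 2. Unlike rank(), max_by returns a single key per id, so when two keys tie for the maximum only one of them is returned, and which one is not specified.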