Redshift: get JSON keys

【Question title】: Redshift get Json Keys
【Posted】: 2020-10-19 22:13:47
【Question】:

I have a table field whose value looks like this:

{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}

I'd like to know whether there is a way to get the "0fc8a2a1-e334-43b8-9311-ce46da9cd32c" and "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25" values.

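【Comments】:

Redshift's JSON support is very limited. I suspect that what you're asking for isn't possible.
What do you want returned? You have one row of JSON and you want two answers. Do you want a comma-separated list? An array? (Redshift doesn't support array types.) What do you do with this result; is it passed to another (non-SQL) tool?
@BillWeiner If I could get them in a comma-separated list, that would be fine. But in the end I need each of them in a separate row.
If the JSON is this simple, then a regexp_replace function that strips the leading/trailing {} and then collapses everything between ': {' and '}' will give you the comma-separated string you want. You can then use some cross joining to turn it into rows (see the "listunagg" threads on stackoverflow).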
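
The regexp_replace idea from the last comment can be tried outside the database first. The following is a minimal Python sketch, not Redshift SQL; the helper name keys_csv is hypothetical, and it assumes exactly one level of nesting, no braces inside string values, and keys without whitespace.

```python
import re

def keys_csv(flat_json):
    # Collapse every ': { ... }' value so that only the top-level keys remain
    without_values = re.sub(r':\s*\{[^{}]*\}', '', flat_json)
    # Strip the outer braces, the quotes and all whitespace
    return re.sub(r'[{}"\s]', '', without_values)

sample = '''{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'''

print(keys_csv(sample))
```

The same two substitutions would have to be rewritten with Redshift's own regex dialect and string escaping before use in a query.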

Tags: sql json amazon-redshift

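【Solution 1】:

Redshift is not good with JSON, and especially not with arbitrary JSON keys (as mentioned by @GMB). Nested data structures are not a good fit either.

So you actually have two problems:

Extracting the JSON keys. I see 2 options here:

Use a Python UDF
Use a regex

Unnesting the set of keys into rows. There is a trick for unnesting data into rows (see the CROSS JOIN and the seq table in the queries below); it is described in this SO answer.

1. Solution with a Python UDF

JSON parsing can be implemented in Python and registered as a user-defined function: https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html

Function: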

create or replace function f_py_json_keys (a varchar(65535))
    returns varchar(65535)
    stable
as $$
    import json
    return ",".join(json.loads(a).keys())
$$ language plpythonu;
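
The UDF body is ordinary Python, so its logic can be checked locally before registering it. Below is a minimal sketch; the function name json_keys and the sample payload are hypothetical, and sorted() is an addition that is not in the UDF above. It is there because Redshift Python UDFs run on Python 2.7, where dictionary key order is not guaranteed.

```python
import json

def json_keys(a):
    # Same logic as the UDF body, plus sorted() for a deterministic order
    return ",".join(sorted(json.loads(a).keys()))

# Hypothetical payload with two top-level keys
payload = '{"b-key": {"name": "Variant 1"}, "a-key": {"type": "control"}}'
print(json_keys(payload))
```

Because the keys are joined with commas and split again later, this approach assumes that no key contains a comma.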

Query:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select f_py_json_keys(json) as keys,
           regexp_count(keys, ',') + 1 as number_of_occurrences
    from input
)
select
    split_part(keys, ',', idx) as id
from input_with_occurences cross join seq
where idx <= number_of_occurrences
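
To make the unnesting trick easier to follow, here is a rough Python sketch of what the cross join with seq does. The split_part helper is a hypothetical stand-in for the SQL function of the same name, not a library call.

```python
def split_part(s, delimiter, part):
    # 1-based index; returns an empty string when the part does not exist
    parts = s.split(delimiter)
    return parts[part - 1] if 1 <= part <= len(parts) else ""

keys = "0fc8a2a1-e334-43b8-9311-ce46da9cd32c,4344d89b-7f0d-4453-b2c5-d0d4a39d7d25"
seq = [1, 2, 3, 4, 5]                         # the hard-coded seq(idx) rows
number_of_occurrences = keys.count(",") + 1   # regexp_count(keys, ',') + 1

# CROSS JOIN seq ... WHERE idx <= number_of_occurrences
rows = [split_part(keys, ",", idx) for idx in seq if idx <= number_of_occurrences]
print(rows)
```

Since seq holds only the values 1 to 5, the query as written returns at most five keys per row; a longer sequence would be needed for larger objects.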

2. Solution using REGEX magic

Redshift has some regex functions. Here is a working example that does the job for the payload you specified:

with input(json) as (
    select '{
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c": {
        "alert": "345",
        "channel": "ios_push",
        "name": "Variant 1"
    },
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25": {
        "channel": "ios_push",
        "name": "Control Group",
        "type": "control"
    }
}'::varchar
), seq(idx) as (
    select 1 UNION ALL
    select 2 UNION ALL
    select 3 UNION ALL
    select 4 UNION ALL
    select 5
), input_with_occurences as (
    select *,
           regexp_count(json,
                        '\\{?\\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\":\\s\\{[\\w\\s\\":,]+\\}') as number_of_occurrences
    from input
)
select
       REGEXP_SUBSTR(json, '\\{?\\"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\\":\\s\\{[\\w\\s\\":,]+\\}', 1, idx, 'e') as id
       from input_with_occurences cross join seq
        where idx <= number_of_occurrences

The result is:

+------------------------------------+
|id                                  |
+------------------------------------+
|0fc8a2a1-e334-43b8-9311-ce46da9cd32c|
|4344d89b-7f0d-4453-b2c5-d0d4a39d7d25|
+------------------------------------+
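
The pattern requires exactly one whitespace character between the colon and the opening brace, so it is sensitive to how the JSON is formatted. The sketch below checks this with Python's re module rather than Redshift's regex engine, so treat it as an approximation. The names STRICT and UUID are hypothetical, and the compact form is only an assumption about what unformatted data might look like.

```python
import json
import re

# The pattern from the query, with the SQL string escaping removed
STRICT = (r'\{?"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"'
          r':\s\{[\w\s":,]+\}')
# A looser pattern that matches only the UUID itself
UUID = r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'

doc = {
    "0fc8a2a1-e334-43b8-9311-ce46da9cd32c":
        {"alert": "345", "channel": "ios_push", "name": "Variant 1"},
    "4344d89b-7f0d-4453-b2c5-d0d4a39d7d25":
        {"channel": "ios_push", "name": "Control Group", "type": "control"},
}
pretty = json.dumps(doc, indent=4)                # '": {' with one space
compact = json.dumps(doc, separators=(",", ":"))  # '":{' with no space

pretty_keys = re.findall(STRICT, pretty)    # both keys
compact_keys = re.findall(STRICT, compact)  # no match: \s finds nothing
loose_keys = re.findall(UUID, compact)      # both keys again
print(pretty_keys, compact_keys, loose_keys)
```

The looser pattern matches any UUID-shaped string, including one that appears inside a nested value, so it is only safe when the nested values never contain UUIDs.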

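【Comments】:

Running the "REGEX" version exactly as written works perfectly. But it doesn't work against my real database... I ran some tests, and it seems to fail because in my data the JSON field isn't "prettified"; it's all on one line. I guess this regex needs nicely formatted JSON... Is there a workaround? PS: thanks in advance, your answer is amazing.
In the end, after a few attempts, I got it working with this simplified regex: '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'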

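【Solution 2】:

Although the question is a year old, I found a way to do this with native Redshift functionality.

Redshift has a SUPER column type that stores native JSON: https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html

It can be queried with PartiQL:

https://docs.aws.amazon.com/redshift/latest/dg/query-super.html

https://partiql.org/tutorial.html

Assuming a table named MyTable with a column named data of type SUPER that stores the JSON, I came up with the following query: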

SELECT
  key
FROM
  MyTable AS t,
  UNPIVOT t.data AS value AT key;

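For JSON arrays, the syntax x AS y AT z means "foreach y in x", where z is the index of element y within array x.
When unpivoting a JSON object, x and y mean the same thing, but z is the key rather than the index (when you think about it, an array can also be represented as an object whose keys are the indexes: `{0: 'a', 1: 'b', ... }`).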
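
As a rough analogy only (this is Python, not PartiQL), the x AS y AT z reading corresponds to iterating with enumerate() over a list or with items() over a dict. The helper name iterate_as_at is hypothetical.

```python
def iterate_as_at(x):
    # Returns (y, z) pairs: z is the key for an object, the index for an array
    if isinstance(x, dict):
        return [(value, key) for key, value in x.items()]
    return [(value, index) for index, value in enumerate(x)]

obj_pairs = iterate_as_at({"a": 1, "b": 2})   # object: z is the key
arr_pairs = iterate_as_at(["a", "b"])         # array: z is the index

keys = [z for _, z in obj_pairs]
print(keys, arr_pairs)
```

Selecting only z from the object case is what the query above does when it returns key.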

The query in action:

create temp table if not exists MyTable(
  data SUPER
);
insert into MyTable VALUES (json_parse('{"a": 1, "b":2}'));

select key from MyTable as t, unpivot t.data as value at key;
