【Question title】: In Redshift SQL, how do I create a new column for each unique value in one column?
【Posted】: 2021-07-03 09:51:22
【Question description】:

I have a table that looks like this:

Key Value
1 A
1 B
2 A
2 B
2 C
2 D

I want to transform it into:

Key Value1 Value2 Value3 Value4
1 A B - -
2 A B C D

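【Question discussion】:

  • Are there only 4 possible values, i.e. A-D? Please share what you have tried.
  • No, there are many values, possibly in the thousands.

Tags: sql amazon-web-services amazon-redshift transform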


【Solution 1】:

If you know how many distinct values there are, you can do this with conditional aggregation in a GROUP BY query:

SELECT Key,
MAX(CASE WHEN Value = 'A' THEN Value ELSE '-' END) AS Value1,
MAX(CASE WHEN Value = 'B' THEN Value ELSE '-' END) AS Value2,
MAX(CASE WHEN Value = 'C' THEN Value ELSE '-' END) AS Value3,
MAX(CASE WHEN Value = 'D' THEN Value ELSE '-' END) AS Value4
FROM
TABLE_NAME
GROUP BY Key

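If the number of values is unknown, no single generic query can do this: a query has to name a fixed set of output columns, and the number of columns a table can hold is limited. See https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html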
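When the list of values is known but too long to type by hand, one option is to generate the statement with a small script. The following is a minimal sketch in Python, not part of the original answer: the helper name `build_case_pivot` and its parameters are hypothetical, and it assumes the distinct values have already been fetched (for example with a `SELECT DISTINCT Value` query). The 1,600 limit is the per-table column limit mentioned in this thread.

```python
# Hypothetical helper (an illustration, not the answerer's code): builds the
# CASE-based pivot statement from a list of values that is already known.
REDSHIFT_MAX_COLUMNS = 1600  # per-table column limit mentioned in this thread


def build_case_pivot(table, key_col, value_col, values, filler="-"):
    """Return a SELECT with one output column per entry in `values`."""
    # The key column itself also occupies one output column.
    if len(values) + 1 > REDSHIFT_MAX_COLUMNS:
        raise ValueError("too many columns for a single Redshift table")

    select_items = [key_col]
    for i, value in enumerate(values, start=1):
        literal = value.replace("'", "''")  # escape single quotes in the literal
        select_items.append(
            f"MAX(CASE WHEN {value_col} = '{literal}' "
            f"THEN {value_col} ELSE '{filler}' END) AS Value{i}"
        )

    lines = [
        "SELECT " + ",\n       ".join(select_items),
        f"FROM {table}",
        f"GROUP BY {key_col}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_case_pivot("TABLE_NAME", "Key", "Value", ["A", "B", "C", "D"]))
```

The generated text has the same shape as the hand-written query above. Table and column names are interpolated as-is, so this sketch assumes they come from a trusted source.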

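【Discussion】:

    【Solution 2】:

    Another way to approach this is to assign values to columns in some order (here, alphabetical). This is what it looks like after extending the data (key = 3 added, and the output widened to 20 possible columns).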

    CREATE TABLE t1 
    AS
    SELECT 1 AS KEY,'A' AS Value UNION SELECT 1, 'B' 
        UNION SELECT 2, 'A' UNION SELECT 2, 'B' UNION SELECT 2, 'C' UNION SELECT 2, 'D' 
        UNION SELECT 3, 'X' UNION SELECT 3, 'Y' UNION SELECT 3, 'Z' 
    ;
    
    select key,
        max(decode(rn, 1, value, '-')) as v1, 
        max(decode(rn, 2, value, '-')) as v2,
        max(decode(rn, 3, value, '-')) as v3,
        max(decode(rn, 4, value, '-')) as v4,
        max(decode(rn, 5, value, '-')) as v5,
        max(decode(rn, 6, value, '-')) as v6,
        max(decode(rn, 7, value, '-')) as v7,
        max(decode(rn, 8, value, '-')) as v8, 
        max(decode(rn, 9, value, '-')) as v9,
        max(decode(rn, 10, value, '-')) as v10,
        max(decode(rn, 11, value, '-')) as v11,
        max(decode(rn, 12, value, '-')) as v12,
        max(decode(rn, 13, value, '-')) as v13,
        max(decode(rn, 14, value, '-')) as v14,
        max(decode(rn, 15, value, '-')) as v15, 
        max(decode(rn, 16, value, '-')) as v16,
        max(decode(rn, 17, value, '-')) as v17,
        max(decode(rn, 18, value, '-')) as v18,
        max(decode(rn, 19, value, '-')) as v19,
        max(decode(rn, 20, value, '-')) as v20
    from (
        select key, value, row_number() over (partition by key order by value) as rn
        from t1 )
    group by key
    order by key;
    

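    Note that I used the DECODE() function rather than CASE because it reads better. You mentioned Redshift, so this will work there, but you may need to switch back to CASE for other databases.

    The result of this approach looks like this: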

    key v1  v2  v3  v4  v5  v6  v7  v8  v9  v10 v11 v12 v13 v14 v15 v16 v17 v18 v19 v20
    1   A   B   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
    2   A   B   C   D   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
    3   X   Y   Z   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
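The twenty near-identical `decode` lines do not have to be typed by hand. As a rough sketch (the function name `build_rank_pivot` and its parameters are hypothetical, not something from the answer), a short Python script can emit the same query for any column count that stays under the 1,600-column limit mentioned in this thread:

```python
# Hypothetical generator for the row_number()/decode() pivot shown above.
# It only builds the SQL text; running it against Redshift is left to the caller.
MAX_COLUMNS = 1600  # Redshift per-table column limit mentioned in this thread


def build_rank_pivot(table, key_col, value_col, n_cols, filler="-"):
    """Return the pivot query with n_cols value columns named v1..vN."""
    # One column is taken by the key, so at most MAX_COLUMNS - 1 value columns.
    if not 1 <= n_cols < MAX_COLUMNS:
        raise ValueError(f"n_cols must be between 1 and {MAX_COLUMNS - 1}")

    decode_lines = [
        f"    max(decode(rn, {i}, {value_col}, '{filler}')) as v{i}"
        for i in range(1, n_cols + 1)
    ]
    lines = [
        f"select {key_col},",
        ",\n".join(decode_lines),
        "from (",
        f"    select {key_col}, {value_col},",
        f"           row_number() over (partition by {key_col} order by {value_col}) as rn",
        f"    from {table} )",
        f"group by {key_col}",
        f"order by {key_col};",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rank_pivot("t1", "key", "value", 20))
```

A reasonable value for `n_cols` would be the largest number of values any single key has, which can be found with a separate count query before generating the statement.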
    

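    【Discussion】:

    • There are thousands of values, so hand-coding is not feasible. The best option I have found is to use listagg and then find a way to split the list into new columns dynamically. Is that possible? Thank you very much for this solution.
    • Redshift has a limit of 1,600 columns per table. Listagg() is limited by the maximum varchar length of 64K (65,535 bytes). It may be time to spell out what your solution is trying to achieve. Very wide tables and very long strings usually do not make for efficient processes.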