【Question Title】: Convert an array into a Map
【Posted】: 2019-04-11 03:44:42
【Question】:

I have a table with a column like

[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]

The format is array<struct<key:string,value:array<string>>>.

I want to convert that column into the following format:

{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}

with the type map<string,array<string>>.

I have tried exploding the array, but that doesn't work. Any ideas on how I can do this in Hive?

【Comments】:

  • I think you would have to write a custom UDF for this. You could use Spark.
  • Your output JSON is invalid. Don't you think a map<string,array<string>> should be represented as {"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}?

Tags: sql hadoop hive hiveql


【Solution 1】:

This is not possible without an external library. Have a look at brickhouse, or create your own UDAF.


Note: the code below provides snippets to reproduce the problem and to solve what Hive's built-in functions can solve, i.e. a map<string,string>, not a map<string,array<string>>.

-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table 
SELECT 
    1 AS id,
    ARRAY(
        named_struct("key","e",  "value", ARRAY("253","203","204")),
        named_struct("key","st", "value", ARRAY("mi")),
        named_struct("key","k2", "value", ARRAY("1", "2"))
    ) AS input;

SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id  |                                                 input                                                 |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1   | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]  |
+-----+-------------------------------------------------------------------------------------------------------+--+

By exploding the array and using the STRUCT's fields, we can split out the keys and values.

SELECT id, exploded_input.key, exploded_input.value
FROM (
    SELECT id, exploded_input
    FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;

+-----+------+----------------------+--+
| id  | key  |        value         |
+-----+------+----------------------+--+
| 1   | e    | ["253","203","204"]  |
| 1   | st   | ["mi"]               |
| 1   | k2   | ["1","2"]            |
+-----+------+----------------------+--+

The idea is to use your UDAF to "collect" a map while aggregating on id.
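
A minimal sketch of that route, assuming the brickhouse jar is available (the jar path below is a placeholder): brickhouse's two-argument collect UDAF aggregates key/value pairs into a map, preserving the array values.

-- assumes brickhouse is on the classpath; the jar path is a placeholder
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT id, collect(exploded_input.key, exploded_input.value) AS mapped_output
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
GROUP BY id;
-- expected result type: map<string,array<string>>, e.g.
-- {"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}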


What Hive can solve with built-in functions is producing a map<string,string>: turn each row into a string with a special delimiter, aggregate those rows with another special delimiter, and apply the built-in str_to_map function on those delimiters.

SELECT
    id,
    str_to_map(

        -- outputs:  e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
        concat_ws('#', collect_list(list_to_string)), 
        '#', -- first delimiter
        ':'  -- second delimiter
    )  mapped_output
FROM ( 
    SELECT 
        id, 
        -- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
        CONCAT(exploded_input.key,':' , CONCAT_WS(',', exploded_input.value)) as list_to_string
    FROM (
        SELECT id, exploded_input
        FROM test_table LATERAL VIEW explode(input) d AS exploded_input
    ) x
) y
GROUP BY id;

This outputs a string-to-string map, for example:

+-----+-------------------------------------------+--+
| id  |               mapped_output               |
+-----+-------------------------------------------+--+
| 1   | {"e":"253,203,204","st":"mi","k2":"1,2"}  |
+-----+-------------------------------------------+--+
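
If the map<string,string> result is acceptable, the array values can be recovered on read with the built-in split function. A minimal sketch, assuming the previous query's result is stored in a hypothetical table mapped_table:

-- mapped_table is a hypothetical materialization of the query above
SELECT id, split(mapped_output['e'], ',') AS e_values  -- ["253","203","204"]
FROM mapped_table;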

【Discussion】:

    【Solution 2】:
    WITH input_set AS (
        SELECT array(
            named_struct('key','e','value',array('253','203','204')),
            named_struct('key','st','value',array('mi')),
            named_struct('key','k2','value',array('1','2'))
        ) AS input_array
    ), break_input_set AS (
        SELECT y.col_num AS y_col_num, y.col_value AS y_col_value
        FROM input_set LATERAL VIEW posexplode(input_set.input_array) y AS col_num, col_value
    ), create_map AS (
        SELECT map(y_col_value.key, y_col_value.value) AS final_map
        FROM break_input_set
    )
    SELECT * FROM create_map;
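
    Note that map() here is applied per exploded row rather than inside an aggregation, so this returns one single-entry map per array element (three rows), not one merged map per id. The expected output would be:

    +----------------------------+
    |         final_map          |
    +----------------------------+
    | {"e":["253","203","204"]}  |
    | {"st":["mi"]}              |
    | {"k2":["1","2"]}           |
    +----------------------------+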
    

    【Discussion】:

      【Solution 3】:
      // rename the variable to avoid shadowing the global Array object
      var input = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];

      var obj = {};
      for (var i = 0; i < input.length; i++) {
        obj[input[i].key] = input[i].value; // map each key to its value array
      }
      

      obj will then have the desired format.

      【Discussion】:
