[Posted]: 2023-03-06 12:49:01
[Question]:
I have a table named exampleTable with two columns, each holding an array of strings.
col1 (Array<string>)    col2 (Array<string>)
["a", "b", "c"]         ["x", "y", "z"]
["aa", "bb", "cc"]      ["xx", "yy", "zz"]
My goal is to produce a table like this:
col1 col2
"a" "x"
"b" "y"
"c" "z"
"aa" "xx"
"bb" "yy"
"cc" "zz"
I thought of using LATERAL VIEW like this:
SELECT myCol1, myCol2 FROM exampleTable
LATERAL VIEW explode(col1) myTable1 AS myCol1
LATERAL VIEW explode(col2) myTable2 AS myCol2;
But that produces the following instead:
col1 col2
"a" "x"
"a" "y"
"a" "z"
"a" "xx"
"a" "yy"
"a" "zz"
"b" "x"
"b" "y"
"b" "z"
"b" "xx"
"b" "yy"
"b" "zz"
"c" "x"
"c" "y"
"c" "z"
"c" "xx"
"c" "yy"
"c" "zz"
"aa" "x"
"aa" "y"
"aa" "z"
"aa" "xx"
"aa" "yy"
"aa" "zz"
"bb" "x"
"bb" "y"
"bb" "z"
"bb" "xx"
"bb" "yy"
"bb" "zz"
"cc" "x"
"cc" "y"
"cc" "z"
"cc" "xx"
"cc" "yy"
"cc" "zz"
How can I solve this? Thanks in advance.
[Discussion]:
-
Were you able to solve this?
-
Yes, I did. I ended up writing my own UDF.
-
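The only option seems to be writing a custom UDF or a simple custom mapper script (using Hive's TRANSFORM feature). The UDF takes several arrays and returns an array of arrays, where each sub-array holds the elements found at the same index in every input. For example, given three arguments, arg1: [title1, title2, title3], arg2: [artist1, artist2, artist3], arg3: [album1, album2, album3], it would return [[title1, artist1, album1], [title2, artist2, album2], [title3, artist3, album3]]. We can then explode that array and pick out the individual indices of each sub-array to get the answer.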
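The index-wise grouping described in that comment can be sketched in plain Python. This is only an illustration of the logic, not the UDF the asker actually wrote: the names zip_arrays and explode_rows are hypothetical, and rejecting arrays of unequal length is an assumption, since the thread does not say how that case should be handled.

```python
from typing import List, Sequence, Tuple


def zip_arrays(*arrays: Sequence[str]) -> List[List[str]]:
    """Hypothetical stand-in for the custom UDF: group elements by index."""
    if not arrays:
        return []
    length = len(arrays[0])
    # Assumption: inputs of unequal length are treated as an error.
    if any(len(a) != length for a in arrays):
        raise ValueError("all arrays must have the same length")
    return [list(group) for group in zip(*arrays)]


def explode_rows(
    table: Sequence[Tuple[Sequence[str], Sequence[str]]]
) -> List[Tuple[str, str]]:
    """Mimic exploding the zipped array: one output row per index, per input row."""
    out = []
    for col1, col2 in table:
        for pair in zip_arrays(col1, col2):
            # Pick the individual indices out of each sub-array.
            out.append((pair[0], pair[1]))
    return out


# The two rows of exampleTable from the question.
example_table = [
    (["a", "b", "c"], ["x", "y", "z"]),
    (["aa", "bb", "cc"], ["xx", "yy", "zz"]),
]

for row in explode_rows(example_table):
    print(row)
```

Because each element is paired only with the element at the same position in the same row, the result has one output row per array index rather than every combination of col1 and col2 values.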