【Posted】:2020-08-12 23:47:15
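【Question】:
I want to build a nested data structure that turns a DataFrame containing nested Maps and simple values into a single DataFrame row whose contents are wrapped in an array.
The result should transform this DataFrame: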
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |records |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|123 |[USA -> [1475600496 -> 25.000000000000000000], ITA -> [1475600500 -> 18.000000000000000000, 1475600516 -> 19.000000000000000000], JPN -> [1475600508 -> 27.000000000000000000]]|
|256 |[USA -> [1475600508 -> 40.000000000000000000, 1475600500 -> 47.000000000000000000], NOR -> [1475600496 -> 30.000000000000000000]] |
|118 |[USA -> [1475600500 -> 50.000000000000000000]] |
+-------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
into:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|valueAndRecords |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[[123, [USA -> [1475600496 -> 25.000000000000000000], ITA -> [1475600500 -> 18.000000000000000000, 1475600516 -> 19.000000000000000000], JPN -> [1475600508 -> 27.000000000000000000]], [256, [USA -> [1475600508 -> 40.000000000000000000, 1475600500 -> 47.000000000000000000], NOR -> [1475600496 -> 30.000000000000000000]]], [118, [USA -> [1475600500 -> 50.000000000000000000]]]]|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I can combine the two columns into a struct with the line below, but that does not wrap the result in an array. How can this be done?
df.withColumn("valueAndRecords", struct("value", "records")).select("valueAndRecords")
【Comments】:
-
Does your input DataFrame have a key column? Or do you want all rows in the DF to become a single row?
-
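Please read How to ask a good question and provide a minimal reproducible example. It is currently unclear what exactly you are trying to achieve and what the current error is. Is there an error?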
-
@C.S.ReddyGadipally, the three rows in my DataFrame do share the same key, but my example did not show that step :P. @Shu provided an answer that uses collect_list inside agg().
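The answer itself is not reproduced here, so the following is only a minimal sketch of what using collect_list inside agg() could look like for the example in the question. The object name `WrapRowsSketch`, the helper `wrapRows`, and the small sample data (with `Double` values instead of the decimals shown above) are assumptions made for illustration. `struct`, `collect_list`, and `DataFrame.agg` are standard functions in `org.apache.spark.sql`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, struct}

object WrapRowsSketch {
  // Hypothetical helper: packs (value, records) into a struct per row, then
  // aggregates all rows into one row holding an array of those structs.
  def wrapRows(df: DataFrame): DataFrame =
    df.agg(collect_list(struct("value", "records")).as("valueAndRecords"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("wrap-rows-sketch")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the question's input: a simple value plus a nested Map.
    val df = Seq(
      (123, Map("USA" -> Map(1475600496L -> 25.0), "JPN" -> Map(1475600508L -> 27.0))),
      (256, Map("NOR" -> Map(1475600496L -> 30.0))),
      (118, Map("USA" -> Map(1475600500L -> 50.0)))
    ).toDF("value", "records")

    // One row, one column, containing an array of three structs.
    wrapRows(df).show(truncate = false)

    spark.stop()
  }
}
```

If the rows share a key column, as the comment above suggests, the same aggregation can presumably be placed after a `groupBy` on that key to get one array per key instead of a single row. Note that collect_list does not guarantee the order of the collected elements.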
Tags: scala apache-spark apache-spark-sql