Here is how to do this with groupBy, an aggregation, and toJSON:
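import org.apache.spark.sql.functions.{collect_list, struct}

// Collect each group's cost rows into an array of structs
val result = df.groupBy("id", "identifier")
  .agg(collect_list(struct("actual_cost", "cost_incurred", "timestamp")).as("cost"))

// toJSON serializes every row to one JSON string (single "value" column)
val resultDf = result.toJSON
resultDf.show(false)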
Result:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":2,"identifier":"xyz987","cost":[{"actual_cost":12,"cost_incurred":34,"timestamp":"2021-04-16T19:25:27"},{"actual_cost":92,"cost_incurred":87,"timestamp":"2021-04-16T19:32:43"}]}|
|{"id":1,"identifier":"abc123","cost":[{"actual_cost":24,"cost_incurred":21,"timestamp":"2021-04-16T19:07:00"},{"actual_cost":37,"cost_incurred":39,"timestamp":"2021-04-16T19:26:30"}]}|
|{"id":3,"identifier":"abc567","cost":[{"actual_cost":87,"cost_incurred":85,"timestamp":"2021-04-16T19:13:00"}]} |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
If you want everything in a single row, as one JSON array, then:
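// `result` is the grouped DataFrame before .toJSON is applied
// (needs org.apache.spark.sql.functions.{col, collect_list, struct, to_json})
result.agg(to_json(collect_list(struct(result.columns.map(col): _*))).as("hits"))
  .show(false)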
Result:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|hits |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[{"id":2,"identifier":"xyz987","cost":[{"actual_cost":12,"cost_incurred":34,"timestamp":"2021-04-16T19:25:27"},{"actual_cost":92,"cost_incurred":87,"timestamp":"2021-04-16T19:32:43"}]},{"id":1,"identifier":"abc123","cost":[{"actual_cost":24,"cost_incurred":21,"timestamp":"2021-04-16T19:07:00"},{"actual_cost":37,"cost_incurred":39,"timestamp":"2021-04-16T19:26:30"}]},{"id":3,"identifier":"abc567","cost":[{"actual_cost":87,"cost_incurred":85,"timestamp":"2021-04-16T19:13:00"}]}]|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
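For anyone who wants to try both variants end to end, here is a minimal script-style sketch (for spark-shell or a Scala script). The input rows are an assumption reconstructed from the output above, with `timestamp` kept as a plain string; the app name and local master are placeholders too. Note that `collect_list` does not guarantee element order, so the rows and array entries may come out in a different order than shown.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, struct, to_json}

// Local session for experimenting; in spark-shell this reuses the existing one
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("group-to-json") // placeholder name
  .getOrCreate()
import spark.implicits._

// Assumed input, reconstructed from the output shown in the answer
val df = Seq(
  (1, "abc123", 24, 21, "2021-04-16T19:07:00"),
  (1, "abc123", 37, 39, "2021-04-16T19:26:30"),
  (2, "xyz987", 12, 34, "2021-04-16T19:25:27"),
  (2, "xyz987", 92, 87, "2021-04-16T19:32:43"),
  (3, "abc567", 87, 85, "2021-04-16T19:13:00")
).toDF("id", "identifier", "actual_cost", "cost_incurred", "timestamp")

// One row per (id, identifier) with the cost rows nested as an array of structs
val result = df.groupBy("id", "identifier")
  .agg(collect_list(struct("actual_cost", "cost_incurred", "timestamp")).as("cost"))

// Variant 1: one JSON string per group
result.toJSON.show(false)

// Variant 2: all groups folded into a single JSON array in one row
val hits = result.agg(
  to_json(collect_list(struct(result.columns.map(col): _*))).as("hits")
)
hits.show(false)
```

Keeping the grouped DataFrame in its own `result` value is what lets both variants share it: once `.toJSON` is applied, only a single `value` column is left, so `result.columns.map(col)` would no longer see `id`, `identifier`, and `cost`.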