【Posted】: 2020-08-13 08:02:15
【Question】:
I have a DataFrame with 4 columns. I want to group by 2 of the columns and collect the remaining columns as lists.
Example: I have a DF like this:
+---+-------+--------+-----------+
|id |fName |lName |dob |
+---+-------+--------+-----------+
|1 |Akash |Sethi |23-05-1995 |
|2 |Kunal |Kapoor |14-10-1992 |
|3 |Rishabh|Verma |11-08-1994 |
|2 |Sonu |Mehrotra|14-10-1992 |
+---+-------+--------+-----------+
I want output like this:
+---+-----------+--------------+-------------------+
|id |dob        |fName         |lName              |
+---+-----------+--------------+-------------------+
|1  |23-05-1995 |[Akash]       |[Sethi]            |
|2  |14-10-1992 |[Kunal, Sonu] |[Kapoor, Mehrotra] |
|3  |11-08-1994 |[Rishabh]     |[Verma]            |
+---+-----------+--------------+-------------------+
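A minimal sketch of one way to do this in Spark with Scala, assuming the columns are named as in the sample DF: group by `id` and `dob`, then aggregate `fName` and `lName` with the built-in `collect_list` function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("groupby-collect-list")
  .getOrCreate()
import spark.implicits._

// Recreate the sample DataFrame from the question
val df = Seq(
  (1, "Akash",   "Sethi",    "23-05-1995"),
  (2, "Kunal",   "Kapoor",   "14-10-1992"),
  (3, "Rishabh", "Verma",    "11-08-1994"),
  (2, "Sonu",    "Mehrotra", "14-10-1992")
).toDF("id", "fName", "lName", "dob")

// Group on the two key columns and collect the others into lists
val result = df
  .groupBy("id", "dob")
  .agg(
    collect_list("fName").as("fName"),
    collect_list("lName").as("lName")
  )
  .orderBy("id")

result.show(false)
```

Note that `collect_list` does not guarantee element order within each list; if the order of `fName` must line up with `lName`, one common approach is to collect a struct of both columns instead (e.g. `collect_list(struct("fName", "lName"))`) and split it afterwards.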
【Discussion】:
Tags: scala dataframe apache-spark group-by aggregate