【Posted】: 2017-08-01 14:19:02
【Question】:
Suppose I have a case class like this:
case class Person(name: String = null, id: Integer = null, rank: Integer = null)
and I have a dataset: Dataset[Person]
Suppose the dataset contains 5 Person objects:
Dataset[ Person(name = "Jack",id = 100, rank = null),
Person(name = "Mary",id = 400, rank = null),
Person(name = "Tom",id = 199, rank = null),
Person(name = "Linda", id = 55, rank = null),
Person(name = "Wendy", id = 30, rank = null)]
After sorting the dataset by id, I want to fill in the rank field, in Scala, so that the dataset becomes:
Dataset[ Person(name = "Wendy", id = 30, rank = 1),
Person(name = "Linda", id = 55, rank = 2),
Person(name = "Jack", id = 100, rank = 3),
Person(name = "Tom", id = 199, rank = 4),
Person(name = "Mary", id = 400, rank = 5)]
Thanks in advance!
【Comments】:
-
What is the rule for the ranking? Can you order the dataset in some way? Otherwise I don't think this makes much sense.
-
Hi @RaphaelRoth, thanks for the feedback. Yes, the ranking would follow the ordering of a field, say the Person.id field.
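-
Given that the rank is simply the 1-based position after ordering by id, one possible approach is a window function. The following is only a minimal sketch, not a confirmed answer from this thread: it assumes Person carries an id field (as the sample data implies), and the helper name rankById, the object name RankById, and the local SparkSession settings are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// Assumption: Person also has the id field that the sample data uses.
case class Person(name: String = null, id: Integer = null, rank: Integer = null)

object RankById {
  // Hypothetical helper: overwrites rank with each row's 1-based position
  // when the rows are ordered by id (ascending).
  def rankById(people: Dataset[Person]): Dataset[Person] = {
    // No partitionBy here, so Spark moves all rows into a single partition
    // in order to number them.
    val byId = Window.orderBy(col("id"))
    people
      .withColumn("rank", row_number().over(byId)) // replaces the existing rank column
      .as[Person](Encoders.product[Person])        // back from DataFrame to Dataset[Person]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rank-by-id")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(
      Person("Jack", 100),
      Person("Mary", 400),
      Person("Tom", 199),
      Person("Linda", 55),
      Person("Wendy", 30)
    ).toDS()

    rankById(people).orderBy("rank").show()
    spark.stop()
  }
}
```

row_number assigns distinct consecutive numbers even when two ids are equal; if ties should share a rank, rank() or dense_rank() from the same functions object could be swapped in. Because the window has no partitioning, this is probably only suitable for datasets that fit comfortably in one partition; for larger data, sorting and then using RDD.zipWithIndex may be worth considering instead.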
Tags: scala apache-spark apache-spark-sql