【Posted】: 2015-07-22 08:01:41
【Question】:
After mapping my RDD to
((_id_1, section_id_1), (_id_1, section_id_2), (_id_2, section_3), (_id_2, section_4))
I want to reduce it by key to
((_id_1, Set(section_id_1, section_id_2)), (_id_2, Set(section_3, section_4)))
import org.bson.BSONObject

val collectionReduce = collection_filtered.map(item => {
  val extras = item._2.get("extras")
  var section_id = ""
  var extras_id = ""
  if (extras != null) {
    // "extras" is a nested BSON document holding the section guid and the id
    val extras_parse = extras.asInstanceOf[BSONObject]
    section_id = extras_parse.get("guid").toString
    extras_id = extras_parse.get("id").toString
  }
  // Emit (key, single-element set) so the sets can be combined per key
  (extras_id, Set(section_id))
}).groupByKey().collect()
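My output is
((_id_1, (Set(section_1), Set(section_2))), (_id_2, (Set(section_3), Set(section_4))))
so each key ends up with a collection of one-element sets instead of a single merged set. How can I fix this?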
【Discussion】:
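One possible approach, sketched under assumptions rather than taken from the original poster's project: groupByKey only collects the values for each key, so one-element sets stay separate, whereas reduceByKey with set union (++) merges them into one set per key. The object name MergeSectionsByKey, the helper mergeSections, and the sample pairs are hypothetical stand-ins for the (extras_id, section_id) pairs built in the question. On Spark versions before 1.3 the pair-RDD methods may also need import org.apache.spark.SparkContext._.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object MergeSectionsByKey {

  // Hypothetical helper: wrap each value in a one-element Set, then
  // union the sets that share a key.
  def mergeSections(pairs: RDD[(String, String)]): RDD[(String, Set[String])] =
    pairs
      .mapValues(sectionId => Set(sectionId))
      .reduceByKey(_ ++ _)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("merge-sections").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Sample data standing in for the (extras_id, section_id) pairs.
      val pairs = sc.parallelize(Seq(
        ("_id_1", "section_id_1"),
        ("_id_1", "section_id_2"),
        ("_id_2", "section_3"),
        ("_id_2", "section_4")
      ))

      // One (key, Set[...]) entry per key; the order of entries is not guaranteed.
      mergeSections(pairs).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Applied to the code in the question, this would mean replacing .groupByKey() with .reduceByKey(_ ++ _) and keeping the rest. Keeping groupByKey and mapping plain section_id strings, followed by .mapValues(_.toSet), should give the same result, though reduceByKey can combine values on each partition before the shuffle.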
Tags: scala mapreduce apache-spark