mapPartitionsWithIndex
def mapPartitionsWithIndex[U](f: (Int, Iterator[T]) => Iterator[U], preservesPartitioning: Boolean = false)(implicit arg0: ClassTag[U]): RDD[U]

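This function works like mapPartitions, except that it also passes the index of the partition (partid in the code below) to the supplied function.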
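To make the calling convention concrete, here is a minimal Spark-free sketch of the semantics: the function is invoked once per partition and receives the partition index together with an iterator over that partition's elements. The helper name mapPartitionsWithIndexLocal and the nested-Seq representation of partitions are assumptions made for illustration only; they are not part of Spark.

```scala
// Hypothetical local model of mapPartitionsWithIndex; not Spark's implementation.
// Each inner Seq stands in for one partition of an RDD.
def mapPartitionsWithIndexLocal[T, U](partitions: Seq[Seq[T]])(
    f: (Int, Iterator[T]) => Iterator[U]): Seq[U] =
  partitions.zipWithIndex.flatMap { case (part, idx) =>
    // f runs once per partition, not once per element
    f(idx, part.iterator).toList
  }

// Three hand-written "partitions"
val parts = Seq(Seq(1, 2), Seq(3, 4, 5), Seq(6, 7, 8))

// Tag every element with the index of the partition it lives in
val tagged = mapPartitionsWithIndexLocal(parts)((idx, it) => it.map(x => (idx, x)))
```

In this model, tagged pairs each element with its partition index, which is the same information the Spark example below collects into one list per partition.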

val rdd = sc.parallelize(1 to 8, 3)
rdd.mapPartitionsWithIndex {
  (partid, iter) => {
    // Maps the partition name to the elements held in that partition
    val part_map = scala.collection.mutable.Map[String, List[Int]]()
    val part_name = "part_" + partid
    part_map(part_name) = List[Int]()
    while (iter.hasNext) {
      part_map(part_name) :+= iter.next() // :+= appends an element to the end of the list
    }
    part_map.iterator
  }
}.collect

 

Output:

res0: Array[(String, List[Int])] = Array((part_0,List(1, 2)), (part_1,List(3, 4, 5)), (part_2,List(6, 7, 8)))
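The partition index is also handy for quick diagnostics. As a sketch, the snippet below counts how many elements each partition holds. It assumes Spark is on the classpath and uses SparkContext.getOrCreate so that it reuses an existing context (such as sc in spark-shell) or creates a local one; the application name and the local[2] master are illustrative choices.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running context if there is one, otherwise start a local one
val conf = new SparkConf().setAppName("partition-sizes").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

val rdd = sc.parallelize(1 to 8, 3)

// Emit one (partitionIndex, elementCount) pair per partition
val sizes = rdd.mapPartitionsWithIndex { (idx, iter) =>
  Iterator((idx, iter.size))
}.collect()
```

For the 8-element, 3-partition RDD above, sizes should line up with the list lengths in the output shown earlier: 2, 3 and 3 elements for partitions 0, 1 and 2.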

 

Reposted from: https://blog.csdn.net/jasonwang_/article/details/80369222
