[Posted]: 2017-07-05 10:12:02
[Problem description]:
We are building a simple Spark Streaming application that joins an HBase RDD with an incoming DStream. Sample code:
val indexState = sc.newAPIHadoopRDD(
    conf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result]
  ).map { case (rowkey, v) => /* some logic */ }

val result = dStream.transform { rdd =>
  rdd.leftOuterJoin(indexState)
}
This works fine, but once we enable checkpointing on the StreamingContext and let the application recover from a previously created checkpoint, it always throws a NullPointerException on startup:
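For reference, checkpointing is wired up along the usual `StreamingContext.getOrCreate` pattern; the sketch below is simplified, and the checkpoint directory and batch interval are illustrative placeholders, not our exact code:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative setup; the checkpoint path and batch interval are placeholders.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sc, Seconds(10))
  ssc.checkpoint("hdfs:///tmp/streaming-checkpoint")
  // ... DStream setup, including the leftOuterJoin shown above ...
  ssc
}

// On a fresh start this calls createContext(); on restart it rebuilds the
// context (and the DStream graph) from the data in the checkpoint directory.
val ssc = StreamingContext.getOrCreate("hdfs:///tmp/streaming-checkpoint", createContext _)
ssc.start()
ssc.awaitTermination()
```

The failure occurs on the recovery path, i.e. when `getOrCreate` finds an existing checkpoint and restores from it rather than calling the factory function.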
ERROR streaming.StreamingContext: Error starting the context, marking it as stopped
java.lang.NullPointerException
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:119)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
Has anyone run into the same problem? Versions:
- Spark 1.6.x
- Hadoop 2.7.x
Thanks!
[Comments]:
- When you say "a previously created checkpoint", does that mean the streaming job was stopped and then resubmitted?
Tags: hadoop apache-spark hbase spark-streaming