【Posted】: 2016-10-06 16:41:25
【Question】:
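I am trying to use a Zeppelin notebook to get data from DynamoDB with Apache Spark in order to build quick reports.
A count runs fine, but beyond that I cannot run anything else. For example,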
orders.take(1).foreach(println)
fails with the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 5.0 (TID 5) had a not serializable result: org.apache.hadoop.io.Text
Serialization stack:
- object not serializable (class: org.apache.hadoop.io.Text, value: )
- field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
- object (class scala.Tuple2, (,{<<A rec from DynamoDB as JSON>>}))
- element of array (index: 0)
- array (class [Lscala.Tuple2;, size 7)
How can I fix this? I tried casting the result, but that failed as well:
asInstanceOf[Tuple2[Text, DynamoDBItemWritable]]
The same happens with a filter:
orders.filter(_._1 != null)
My goal is to convert the RDD to a DataFrame and register it as a temporary table, so that I can run ad hoc queries against it.
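For reference, here is a minimal sketch of one commonly suggested workaround, not a confirmed fix for this setup. The error says that org.apache.hadoop.io.Text is not serializable, so the idea is to copy each record into plain Scala types on the executors before any result is shipped back to the driver. The sketch assumes that orders is an RDD[(Text, DynamoDBItemWritable)] created via sc.hadoopRDD with the emr-dynamodb-hadoop connector. The helper name toPlainRecords is hypothetical, and attribute values are simply stringified rather than decoded by type.

```scala
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import org.apache.hadoop.io.Text
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

// Hypothetical helper: converts Hadoop Writables into plain, serializable Scala values.
// Assumption: `orders` was built with sc.hadoopRDD(..., classOf[Text], classOf[DynamoDBItemWritable]).
def toPlainRecords(orders: RDD[(Text, DynamoDBItemWritable)]): RDD[(String, Map[String, String])] =
  orders.map { case (key, item) =>
    // getItem returns a java.util.Map of attribute name -> AttributeValue.
    // Each value is stringified here; a real report would likely read typed fields instead.
    val attributes = item.getItem.asScala
      .map { case (name, value) => name -> String.valueOf(value) }
      .toMap
    (key.toString, attributes)
  }
```

With that in place, something like toPlainRecords(orders).take(1).foreach(println) only moves String and Map values to the driver. Copying inside map also sidesteps the fact that Hadoop record readers may reuse the same Writable instance for every record.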
【Comments】:
Tags: scala amazon-web-services apache-spark amazon-dynamodb apache-zeppelin