【Posted】: 2021-01-19 05:19:57
【Problem Description】:
This question is a follow-up to How to store custom objects in Dataset?
Spark version: 3.0.1
A non-nested custom type can be encoded without trouble:
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class AnObj(val a: Int, val b: String)
implicit val myEncoder: Encoder[AnObj] = Encoders.kryo[AnObj]
val d = spark.createDataset(Seq(new AnObj(1, "a")))
d.printSchema
root
|-- value: binary (nullable = true)
However, if the custom type is nested inside a product type (i.e. a case class), it fails at runtime with:
java.lang.UnsupportedOperationException: No Encoder found for InnerObj
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class InnerObj(val a: Int, val b: String)
case class MyObj(i: Int, j: InnerObj)
implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))
// throws at runtime: java.lang.UnsupportedOperationException: No Encoder found for InnerObj
How can we create a Dataset containing a nested custom type?
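One possible workaround, sketched here as an assumption rather than a definitive answer: instead of providing a Kryo encoder only for the inner class, Kryo-encode the entire outer case class. This bypasses the field-by-field Product encoder derivation that fails on InnerObj, at the cost of collapsing the schema to a single binary column. The sketch assumes an active SparkSession named `spark` (as in spark-shell).

```scala
import org.apache.spark.sql.{Encoder, Encoders}

class InnerObj(val a: Int, val b: String)
case class MyObj(i: Int, j: InnerObj)

// Serialize the whole outer object with Kryo, so Spark never tries
// to derive a per-field encoder for the nested InnerObj.
implicit val myObjEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

val d = spark.createDataset(Seq(MyObj(1, new InnerObj(0, "a"))))
d.printSchema
// schema is a single binary column, as in the non-nested example
```

The trade-off is the same one the non-nested Kryo example above already shows: the Dataset loses its columnar structure, so `i` and `j` are no longer queryable as separate columns.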
【Discussion】:
Tags: apache-spark apache-spark-sql apache-spark-dataset kryo