【Posted】: 2017-07-13 07:30:39
【Question】:
We have a Kafka HDFS connector writing to HDFS in the default Avro format. Sample output:
Obj^A^B^Vavro.schema"["null","string"]^@$ͳø{¾Ã^X:uV^K^H5^F°^F^B^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}^B^B{"severity":"notice","message":"Test message","facility":"kern","syslog-tag":"sawmill_test:","timestamp":"2017-01-31T20:15:00+00:00"}$ͳø{¾Ã^X:uV^K^H5
We tried to read it with:
import com.databricks.spark.avro._
val df = spark.read.avro("..path to avro file")
We get the following error:
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType: [ "null", "string" ]
    at com.databricks.spark.avro.DefaultSource.inferSchema(DefaultSource.scala:93)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
    at scala.Option.orElse(Option.scala:289)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
    at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
    at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
Please help.
Spark version: 2.11
Spark-avro version: 2.11-3.2.0
Kafka version: 0.10.2.1
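The schema named in the exception is the one stored in the Avro file header. One way to check what a file carries, independently of spark-avro, is to read the header with the Avro Java API. The following is only a minimal sketch, assuming the Avro library is on the classpath. `InspectAvroSchema`, `writeSample`, and `writerSchema` are hypothetical names, and the temp file is just a stand-in for the connector output, built with the same top-level schema:

```scala
import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{DataFileReader, DataFileWriter}
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter}

object InspectAvroSchema {
  // Hypothetical helper: writes a temp Avro container file whose top-level
  // schema is the union ["null","string"], standing in for the connector output.
  def writeSample(): File = {
    val schema = new Schema.Parser().parse("""["null","string"]""")
    val file = File.createTempFile("sample", ".avro")
    file.deleteOnExit()
    val writer = new DataFileWriter[AnyRef](new GenericDatumWriter[AnyRef](schema))
    try {
      writer.create(schema, file)
      // Each datum is a plain string that happens to contain JSON text.
      writer.append("""{"severity":"notice","message":"Test message"}""")
    } finally writer.close()
    file
  }

  // Returns the writer schema stored in the header of an Avro container file.
  def writerSchema(file: File): Schema = {
    val reader = new DataFileReader[AnyRef](file, new GenericDatumReader[AnyRef]())
    try reader.getSchema finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val schema = writerSchema(writeSample())
    println(schema.getType)                       // UNION
    println(schema.getType == Schema.Type.RECORD) // false
  }
}
```

If the real file reports `UNION` rather than `RECORD` at the top level, that would be consistent with the exception: a `StructType` needs named fields, and a bare union of `null` and `string` has none.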
【Discussion】:
Tags: avro apache-kafka-connect databricks spark-avro