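【Posted】:2019-05-05 20:47:14
【Question】:
I am using databricks spark-avro to convert a DataFrame schema into an Avro schema. The returned Avro schema has no default values, and this causes problems when I try to create a GenericRecord from that schema. Can anyone help with the correct way to use this feature?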
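import com.databricks.spark.avro.SchemaConverters;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> sellableDs = sparkSession.sql("sql query");
SchemaBuilder.RecordBuilder<Schema> rb = SchemaBuilder.record("testrecord").namespace("test_namespace");
Schema sc = SchemaConverters.convertStructToAvro(sellableDs.schema(), rb, "test_namespace");
System.out.println(sc.toString());
System.out.println(sc.getFields().get(0).toString());
// Pass the schema into the lambda as a JSON string and re-parse it there.
String schemaString = sc.toString();
sellableDs.foreach(
    (ForeachFunction<Row>) row -> {
        Schema scEx = new Schema.Parser().parse(schemaString);
        GenericRecord gr = new GenericData.Record(scEx);
        System.out.println("Generic record Created");
        int fieldSize = scEx.getFields().size();
        for (int i = 0; i < fieldSize; i++) {
            Schema.Field field = scEx.getFields().get(i);
            System.out.println("field: " + field.toString() + "::" + "value:" + row.get(i));
            // Key on field.name(): Field.toString() yields a description such as
            // "key type:UNION pos:0", which put() does not recognise as a field.
            gr.put(field.name(), row.get(i));
        }
    }
);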
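One thing worth checking when filling a GenericRecord is the key passed to put: it has to be the field's name() (or its position), not the text returned by Schema.Field.toString(). Below is a minimal sketch without Spark, assuming only the Avro library is on the classpath. The class name FillRecordSketch, the fill helper, and the sample values are hypothetical; the schema JSON is the converted schema quoted in this post.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

class FillRecordSketch {
    // The converted schema from this post, as a JSON string.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"testrecord\",\"namespace\":\"test_namespace\","
        + "\"fields\":[{\"name\":\"key\",\"type\":[\"int\",\"null\"]},"
        + "{\"name\":\"value\",\"type\":[\"double\",\"null\"]}]}";

    // Hypothetical helper: values[i] stands in for row.get(i).
    static GenericRecord fill(Schema schema, Object[] values) {
        GenericRecord gr = new GenericData.Record(schema);
        for (Schema.Field field : schema.getFields()) {
            // field.name() is the lookup key; field.pos() is its index in the record.
            gr.put(field.name(), values[field.pos()]);
        }
        return gr;
    }

    public static void main(String[] args) {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        Schema.Field first = schema.getFields().get(0);
        // Print both forms to see how the name differs from the toString() text.
        System.out.println("name():     " + first.name());
        System.out.println("toString(): " + first.toString());
        System.out.println(fill(schema, new Object[] {1, 2.5}));
    }
}
```

Note that GenericData.Record does not need default values on the fields; defaults only come into play with builders such as GenericRecordBuilder or during schema resolution when reading.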
This is the DataFrame schema:
StructType(StructField(key,IntegerType,true), StructField(value,DoubleType,true))
This is the converted Avro schema:
{"type":"record","name":"testrecord","namespace":"test_namespace","fields":[{"name":"key","type":["int","null"]},{"name":"value","type":["double","null"]}]}
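If default values are actually required (for example, to use GenericRecordBuilder), one option is to build the schema by hand instead of converting it. The Avro specification has traditionally required a union field's default to match the first branch of the union, so a null default goes with ["null","int"] rather than ["int","null"]. The sketch below is an assumption about what would suit this case, not the behaviour of spark-avro's converter; the class name NullableDefaultsSketch and its helper methods are hypothetical. It relies on SchemaBuilder's optionalInt and optionalDouble shortcuts, which produce a null-first union with a null default.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

class NullableDefaultsSketch {
    // Hand-built counterpart of the converted schema, with null defaults.
    static Schema buildSchema() {
        return SchemaBuilder.record("testrecord").namespace("test_namespace")
            .fields()
            .optionalInt("key")       // union of null and int, default null
            .optionalDouble("value")  // union of null and double, default null
            .endRecord();
    }

    // Sets only "key"; "value" is left for the builder to fill from its default.
    static GenericRecord buildRecord(Schema schema, Integer key) {
        return new GenericRecordBuilder(schema)
            .set("key", key)
            .build();
    }

    public static void main(String[] args) {
        Schema schema = buildSchema();
        System.out.println(schema.toString(true));
        System.out.println(buildRecord(schema, 1));
    }
}
```

With the converted schema from this post, which has no defaults, GenericRecordBuilder.build() would be expected to fail for any field that is not set explicitly, whereas new GenericData.Record(schema) simply leaves unset fields as null.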
【Discussion】:
Tags: apache-spark-sql schema avro databricks spark-avro