Given this file /tmp/test.avsc:
{
  "type": "record",
  "namespace": "com.example",
  "name": "FullName",
  "fields": [
    { "name": "first", "type": "string" },
    { "name": "last", "type": "string" }
  ]
}
and a DataFrame like this:
from pyspark.sql.types import StructType, StructField, StringType

df = spark.createDataFrame(
    [{"first": "john", "last": "parker"}],
    StructType([StructField("first", StringType()), StructField("last", StringType())]),
)
which gives:
+-----+------+
|first| last|
+-----+------+
| john|parker|
+-----+------+
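If you want a quick sanity check that the fields in the .avsc file line up with the DataFrame columns before writing, you can parse the schema with the standard library. This is just a small sketch with the schema string inlined (rather than read from /tmp/test.avsc) so it runs standalone:

```python
import json

# Same record schema as /tmp/test.avsc, inlined for a self-contained check
avsc = """
{
  "type": "record",
  "namespace": "com.example",
  "name": "FullName",
  "fields": [
    { "name": "first", "type": "string" },
    { "name": "last", "type": "string" }
  ]
}
"""

schema = json.loads(avsc)
field_names = [f["name"] for f in schema["fields"]]
print(field_names)  # should match df.columns, i.e. ['first', 'last']
```

A mismatch in field names or order between the Avro schema and the DataFrame is a common cause of write failures, so a check like this can save a round of debugging.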
you can force the schema on write like this:
with open("/tmp/test.avsc", "r") as f:
    jsonFormatSchema = f.read()
df.write.format("avro").options(avroSchema=jsonFormatSchema).save("/tmp/avro")
and similarly force the schema on read:
spark.read.format('avro').options(avroSchema=jsonFormatSchema).load("/tmp/avro")
More information is available here (which, by the way, has plenty of Python examples): https://spark.apache.org/docs/latest/sql-data-sources-avro.html