【Title】: Avro schema evolution
【Posted】: 2013-02-27 08:00:04
【Question】:

I have two questions:

  1. Is it possible to use the same reader to parse records written with two compatible schemas? For example, Schema V2 has only one extra optional field compared with Schema V1, and I want the reader to understand both. I think the answer here is no, but if it is yes, how do I do it?

  2. I tried writing a record with Schema V1 and reading it with Schema V2, but I get the following error:

    org.apache.avro.AvroTypeException: Found foo, expecting foo

I am using avro-1.7.3 and:

   writer = new GenericDatumWriter<GenericData.Record>(SchemaV1);
   reader = new GenericDatumReader<GenericData.Record>(SchemaV2, SchemaV1);

Below are samples of the two schemas (I also tried adding a namespace, but no luck).

Schema V1:

{
"name": "foo",
"type": "record",
"fields": [{
    "name": "products",
    "type": {
        "type": "array",
        "items": {
            "name": "product",
            "type": "record",
            "fields": [{
                "name": "a1",
                "type": "string"
            }, {
                "name": "a2",
                "type": {"type": "fixed", "name": "a3", "size": 1}
            }, {
                "name": "a4",
                "type": "int"
            }, {
                "name": "a5",
                "type": "int"
            }]
        }
    }
}]
}

Schema V2:

{
"name": "foo",
"type": "record",
"fields": [{
    "name": "products",
    "type": {
        "type": "array",
        "items": {
            "name": "product",
            "type": "record",
            "fields": [{
                "name": "a1",
                "type": "string"
            }, {
                "name": "a2",
                "type": {"type": "fixed", "name": "a3", "size": 1}
            }, {
                "name": "a4",
                "type": "int"
            }, {
                "name": "a5",
                "type": "int"
            }]
        }
    }
},
{
            "name": "purchases",
            "type": ["null",{
                    "type": "array",
                    "items": {
                            "name": "purchase",
                            "type": "record",
                            "fields": [{
                                    "name": "a1",
                                    "type": "int"
                            }, {
                                    "name": "a2",
                                    "type": "int"
                            }]
                    }
            }]
}]
} 

Thanks in advance.

【Discussion】:

    Tags: avro


    【Solution 1】:

    I ran into the same problem. It may be an Avro bug, but you can probably work around it by adding "default": null to the "purchases" field.

    See my blog for details: http://ben-tech.blogspot.com/2013/05/avro-schema-evolution.html
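Concretely, the workaround changes only the "purchases" field of Schema V2, adding "default": null after the union type (the rest of the schema stays as in the question):

```json
{
    "name": "purchases",
    "type": ["null", {
        "type": "array",
        "items": {
            "name": "purchase",
            "type": "record",
            "fields": [
                {"name": "a1", "type": "int"},
                {"name": "a2", "type": "int"}
            ]
        }
    }],
    "default": null
}
```

With the default in place, a resolving reader can fill in null for purchases when decoding data written with Schema V1.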

    【Discussion】:

    • Defaults are a must when using schema evolution. If you do not supply a default value for a field that exists in the reader schema but not in the writer schema, Avro cannot figure out how to populate that new field in the parsed structure.
    【Solution 2】:

    You can do the opposite: write data with Schema V2 and read it with Schema V1. At write time every field is written out to the file, and it is fine if the reader simply ignores some of them. But if the writer wrote fewer fields than the reader expects, the reader has no value (and no default) for the extra fields, so it raises an error.
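This asymmetry can be checked with a small sketch against the Avro Java generic API (toy one- and two-field schemas, hypothetical class name): writing with the larger schema and reading with the smaller one succeeds, because the resolving reader skips the field it does not know.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class BackwardReadDemo {

    // Toy schemas (not the ones from the question): V2 adds a field that V1 lacks.
    static final String V1 = "{\"name\":\"foo\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"a1\",\"type\":\"string\"}]}";
    static final String V2 = "{\"name\":\"foo\",\"type\":\"record\",\"fields\":["
            + "{\"name\":\"a1\",\"type\":\"string\"},"
            + "{\"name\":\"a2\",\"type\":\"int\"}]}";

    /** Writes one record with the larger schema V2, reads it back with the smaller V1. */
    static String writeV2ReadV1() throws Exception {
        Schema v1 = new Schema.Parser().parse(V1);
        Schema v2 = new Schema.Parser().parse(V2);

        GenericRecord rec = new GenericData.Record(v2);
        rec.put("a1", "hello");
        rec.put("a2", 42);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(v2).write(rec, enc);
        enc.flush();

        // Writer schema first, reader schema second: the unknown field a2 is skipped.
        GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<GenericRecord>(v2, v1);
        GenericRecord decoded = reader.read(null,
                DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        return decoded.get("a1").toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeV2ReadV1()); // prints "hello"
    }
}
```

Swapping the two schema arguments reproduces the question's situation (written with fewer fields than the reader expects), which fails unless the extra reader field has a default.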

    【Discussion】:

      【Solution 3】:

      The best approach is to maintain schemas via a schema mapping, as the Confluent Avro Schema Registry does.

      Key takeaways:

      1.  Unlike Thrift, Avro-serialized objects do not carry their schema with them.
      2.  Since no schema is stored in the serialized byte array, you have to provide the schema with which it was written.
      3.  The Confluent Schema Registry provides a service for maintaining schema versions.
      4.  Confluent provides a cached schema client, which checks its cache before sending a request over the network.
      5.  The JSON schema in an ".avsc" file is different from the schema held inside an Avro object.
      6.  All Avro objects extend GenericRecord.
      7.  During serialization: based on the schema of the Avro object, a schema id is requested from the Confluent Schema Registry.
      8.  The schema id, an integer, is converted to bytes and prepended to the serialized Avro object.
      9.  During deserialization: the first 4 bytes are removed from the byte array and converted back to an integer (the schema id).
      10. The schema is requested from the Confluent Schema Registry, and the byte array is deserialized using it.
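Steps 8 and 9 above, the 4-byte schema-id framing, can be sketched with only the Java standard library (class and method names here are hypothetical; note the actual Confluent wire format also prepends a magic byte 0x0 before the id):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SchemaIdFraming {

    /** Step 8: prepend a 4-byte big-endian schema id to a serialized Avro payload. */
    static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(4 + avroPayload.length)
                .putInt(schemaId)
                .put(avroPayload)
                .array();
    }

    /** Step 9: read the schema id back from the first 4 bytes. */
    static int schemaIdOf(byte[] framed) {
        return ByteBuffer.wrap(framed, 0, 4).getInt();
    }

    /** Step 9 (continued): the remainder is the Avro payload to deserialize. */
    static byte[] payloadOf(byte[] framed) {
        return Arrays.copyOfRange(framed, 4, framed.length);
    }

    public static void main(String[] args) {
        byte[] payload = {10, 20, 30};          // stand-in for a serialized Avro record
        byte[] framed = frame(7, payload);
        System.out.println(schemaIdOf(framed)); // prints 7
        System.out.println(Arrays.equals(payloadOf(framed), payload)); // prints true
    }
}
```

The consumer then looks up schema id 7 in the registry and uses the returned schema as the writer schema for deserialization (step 10).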
      

      http://bytepadding.com/big-data/spark/avro/avro-serialization-de-serialization-using-confluent-schema-registry/

      【Discussion】:
