【Question title】: Creating an Avro schema for a simple JSON
【Posted】: 2014-03-04 09:54:01
【Question description】:

I am trying to build an Avro schema for the following JSON:

{
  "id":1234,
  "my_name_field": "my_name",
  "extra_data": {
      "my_long_value": 1234567890,
      "my_message_string": "Hello World!",
      "my_int_value":  777,
      "some_new_field": 1
  }
}

The "id" and "my_name_field" fields are known in advance, but the fields inside "extra_data" change dynamically and are unknown.

The Avro schema I came up with is:

{
    "name":"my_record",
    "type":"record",
    "fields":[
        {"name":"id", "type":"int", "default":0},
        {"name":"my_name_field", "type":"string", "default":"NoName"},
        {"name":"extra_data", "type":{"type":"map", "values":["null","int","long","string"]}}
    ]
}

My first idea was to declare "extra_data" as a map, but this does not work:

{ "name":"extra_data", "type":{"type":"map", "values":["null","int","long","string"]} }

I get:

AvroTypeException: Expected start-union. Got VALUE_NUMBER_INT
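
For context, this message is what Avro's JSON decoder reports when it meets a bare value where the schema declares a union. In Avro's JSON encoding, a non-null union value is written as a one-entry object naming the branch, for example `{"my_int_value": {"int": 777}}`, while the event above carries plain numbers and strings. The sketch below shows that reshaping in plain Java. `UnionJsonWrapper` and its methods are hypothetical names, not part of Avro, and the mapping covers only the `int`, `long` and `string` branches used in this schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Avro: rewrites plain map values into the
// tagged form that Avro's JSON encoding uses for non-null union branches.
class UnionJsonWrapper {

    // Wraps one value as {"<branch name>": value}; null stays null.
    static Object wrapValue(Object value) {
        if (value == null) {
            return null;
        }
        String branch;
        if (value instanceof Integer) {
            branch = "int";
        } else if (value instanceof Long) {
            branch = "long";
        } else if (value instanceof String) {
            branch = "string";
        } else {
            throw new IllegalArgumentException("No union branch for " + value.getClass());
        }
        Map<String, Object> tagged = new LinkedHashMap<>();
        tagged.put(branch, value);
        return tagged;
    }

    // Applies wrapValue to every entry, keeping the original key order.
    static Map<String, Object> wrap(Map<String, Object> plain) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : plain.entrySet()) {
            result.put(entry.getKey(), wrapValue(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> extra = new LinkedHashMap<>();
        extra.put("my_long_value", 1234567890L);
        extra.put("my_message_string", "Hello World!");
        extra.put("my_int_value", 777);
        // Prints the map in Java's toString format, not as JSON.
        System.out.println(wrap(extra));
    }
}
```

The wrapped map would still need to be serialized back to JSON before being handed to the decoder; that step is left out here.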

Apache provides some good examples at https://cwiki.apache.org/confluence/display/Hive/AvroSerDe, but none of them seems to do the job.

This is the unit test I run to check it:

public class AvroTest {

    @Test
    public void readRecord() throws IOException {

        String event="{\"id\":1234,\"my_name_field\":\"my_name\",\"extra_data\":{\"my_long_value\":1234567890,\"my_message_string\":\"Hello World!\",\"my_int_value\":777,\"some_new_field\":1}}";

        SchemaRegistry<Schema> registry = new com.linkedin.camus.schema.MySchemaRegistry();
        DecoderFactory decoderFactory = DecoderFactory.get();

        ObjectMapper mapper = new ObjectMapper();
        GenericDatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>();
        Schema schema = registry.getLatestSchemaByTopic("record_topic").getSchema();
        reader.setSchema(schema);

        HashMap hashMap = mapper.readValue(event, HashMap.class);
        // Note: the event above has no "now" key, so get("now") returns null here.
        long now = Long.valueOf(hashMap.get("now").toString())*1000;
        GenericData.Record read = reader.read(null, decoderFactory.jsonDecoder(schema, event));
    }
}

Any help would be much appreciated. Thanks.

【Question discussion】:

    Tags: json hadoop hive avro


    【Solution 1】:

    If the list of extra data fields really is unknown, using several optional value fields may help, like this:

    {
        "name":"my_record",
        "type":"record",
        "fields":[
            {"name":"id", "type":"int", "default":0},
            {"name":"my_name_field", "type":"string", "default":"NoName"},
            {"name":"extra_data", "type": {"type": "array", "items": {
                "name": "extra_data_entry", "type":"record", "fields": [
                    {"name":"extra_data_field_name", "type": "string"},
                    {"name":"extra_data_field_type", "type": "string"},
                    {"name":"extra_data_field_value_string", "type": ["null", "string"]},
                    {"name":"extra_data_field_value_int", "type": ["null", "int"]},
                    {"name":"extra_data_field_value_long", "type": ["null", "long"]}
                ]}
            }}
        ]
    }
    

    You can then pick the extra_data_field_value_* value according to that entry's extra_data_field_type.

    【Discussion】:
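
    The selection step can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `ExtraDataEntry` is a hypothetical class that mirrors the extra_data_entry record (a real reader would take these fields from an Avro record instead), and the type tags "string", "int" and "long" are assumed values for extra_data_field_type, which the schema itself does not prescribe.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java mirror of the extra_data_entry record.
class ExtraDataEntry {
    final String fieldName;    // extra_data_field_name
    final String fieldType;    // extra_data_field_type; assumed "string", "int" or "long"
    final String valueString;  // extra_data_field_value_string, may be null
    final Integer valueInt;    // extra_data_field_value_int, may be null
    final Long valueLong;      // extra_data_field_value_long, may be null

    ExtraDataEntry(String fieldName, String fieldType,
                   String valueString, Integer valueInt, Long valueLong) {
        this.fieldName = fieldName;
        this.fieldType = fieldType;
        this.valueString = valueString;
        this.valueInt = valueInt;
        this.valueLong = valueLong;
    }

    // Returns the value slot selected by the type tag.
    Object value() {
        switch (fieldType) {
            case "string": return valueString;
            case "int":    return valueInt;
            case "long":   return valueLong;
            default:
                throw new IllegalArgumentException("Unknown field type: " + fieldType);
        }
    }

    // Rebuilds a name -> value map from the list of entries.
    static Map<String, Object> toMap(List<ExtraDataEntry> entries) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ExtraDataEntry entry : entries) {
            result.put(entry.fieldName, entry.value());
        }
        return result;
    }

    public static void main(String[] args) {
        List<ExtraDataEntry> entries = Arrays.asList(
            new ExtraDataEntry("my_long_value", "long", null, null, 1234567890L),
            new ExtraDataEntry("my_message_string", "string", "Hello World!", null, null),
            new ExtraDataEntry("my_int_value", "int", null, 777, null));
        System.out.println(toMap(entries));
    }
}
```

    Note that the three value fields in the schema have no "default", so each entry would have to set the unused ones to null explicitly when writing.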
