Avro schema for a JSON array: answers

【Question title】: Avro schema for Json array
【Posted】: 2016-03-16 15:43:44
【Question】:

Suppose I have the following JSON:

[
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2},
   ...
]

What is the appropriate Avro schema for this array of objects?

【Question comments】:

Tags: json serialization avro

【Answer 1】:

[Short answer]
An appropriate Avro schema for this array of objects looks like this:

const avro = require('avsc'); // npm install avsc
const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});
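To make explicit what this schema demands of each element, here is a minimal dependency-free sketch that checks a parsed JSON array against it. It is not part of avsc; the function name conformsToSchema is hypothetical, and it only handles the two primitive types used here (Avro int is a 32-bit signed integer, string is a Unicode string).

```javascript
// Hypothetical helper, not part of avsc: checks whether a parsed JSON value matches
// { type: 'array', items: record(id: int, text: string, user_id: int) }.
const INT_MIN = -(2 ** 31);
const INT_MAX = 2 ** 31 - 1;

// Avro "int" is a 32-bit signed integer.
function isAvroInt(v) {
  return Number.isInteger(v) && v >= INT_MIN && v <= INT_MAX;
}

// One checker per declared field; extra properties on an item are not checked.
const fieldCheckers = {
  id: isAvroInt,
  text: (v) => typeof v === 'string',
  user_id: isAvroInt,
};

function conformsToSchema(value) {
  if (!Array.isArray(value)) return false;
  return value.every((item) =>
    item !== null &&
    typeof item === 'object' &&
    Object.entries(fieldCheckers).every(([name, check]) => check(item[name]))
  );
}

console.log(conformsToSchema([{ id: 1, text: 'some text', user_id: 1 }])); // true
console.log(conformsToSchema([{ id: 1, text: 'some text', user_id: 2 ** 31 }])); // false: outside the int range
```

Because Avro int is limited to 32 bits, ids that may exceed 2147483647 would need type 'long' instead.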

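[Long answer]
We can let Avro help us build the schema above from a sample data object.
Let's use the npm package "avsc", a "pure JavaScript implementation of the Avro specification".
Since Avro can infer the schema of a value, we can use the following trick to derive a schema from the given data. Printing the array schema does not show the nested record fields (they appear as [Object] because console.log only prints objects down to a limited depth), so we ask twice: first for the top-level structure (the array), then for an array element: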

// don't forget to install avsc
// npm install avsc
//
const avro = require('avsc');

// avro can infer a value's schema
const type = avro.Type.forValue([
   {"id":1,"text":"some text","user_id":1}
]);

const type2 = avro.Type.forValue(
   {"id":1,"text":"some text","user_id":1}
);


console.log(type.getSchema());
console.log(type2.getSchema());

Output:

{ type: 'array',
  items: { type: 'record', fields: [ [Object], [Object], [Object] ] } }
{ type: 'record',
  fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ] }
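The [Object] entries in the first line of output come from Node.js, not from Avro: console.log formats objects with util.inspect, whose default depth is 2. A minimal sketch, assuming a literal copy of the inferred array schema shown above (in the real script this object would come from type.getSchema()), prints the whole nested schema in one go:

```javascript
const util = require('util');

// Literal copy of the inferred array schema printed above; in the original
// script this object would be returned by type.getSchema().
const schema = {
  type: 'array',
  items: {
    type: 'record',
    fields: [
      { name: 'id', type: 'int' },
      { name: 'text', type: 'string' },
      { name: 'user_id', type: 'int' },
    ],
  },
};

// Default depth: objects nested more than 2 levels deep are abbreviated as [Object].
console.log(util.inspect(schema));

// depth: null removes the limit, so the record fields are printed in full.
console.log(util.inspect(schema, { depth: null }));

// JSON.stringify has no depth limit either, and its output can be saved as a schema file.
const schemaJson = JSON.stringify(schema, null, 2);
console.log(schemaJson);
```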

Now let's write the proper schema, use it to serialize the objects, and then deserialize them again!

const avro = require('avsc');
const type = avro.Type.forSchema({
  type: 'array',
  items: { type: 'record', fields:
   [ { name: 'id', type: 'int' },
     { name: 'text', type: 'string' },
     { name: 'user_id', type: 'int' } ]
  }
});
const buf = type.toBuffer([
   {"id":1,"text":"some text","user_id":1},
   {"id":1,"text":"some text","user_id":2}]); // Encoded buffer.

const val = type.fromBuffer(buf);
console.log("deserialized object: ", JSON.stringify(val, null, 4));  // pretty print deserialized result

var fs = require('fs');
var full_filename = "/tmp/avro_buf.dat";
fs.writeFile(full_filename, buf, function(err) {
    if(err) {
        return console.log(err);
    }

    console.log("The file was saved to '" + full_filename + "'");
});

Output:

deserialized object:  [
    {
        "id": 1,
        "text": "some text",
        "user_id": 1
    },
    {
        "id": 1,
        "text": "some text",
        "user_id": 2
    }
]
The file was saved to '/tmp/avro_buf.dat'

We can even admire the compact binary representation produced by the exercise above:

hexdump -C /tmp/avro_buf.dat
00000000  04 02 12 73 6f 6d 65 20  74 65 78 74 02 02 12 73  |...some text...s|
00000010  6f 6d 65 20 74 65 78 74  04 00                    |ome text..|
0000001a

Nice, isn't it? :-)
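To see where each of the 26 (0x1a) bytes comes from, here is a minimal, dependency-free sketch of the Avro binary encoding for this particular schema. It is not avsc's implementation, and the function names are hypothetical. Following the Avro specification, an int is written as a zig-zag variable-length integer (1 becomes 02, 2 becomes 04), a string as its zig-zag-encoded byte length (9 becomes 12) followed by the UTF-8 bytes, a record as its field values in declaration order, and an array as a block count (2 becomes 04) followed by the items and a terminating 00.

```javascript
// Hypothetical helpers (not the avsc API) illustrating the Avro binary encoding
// of array<record{ id: int, text: string, user_id: int }>.

// Avro int: zig-zag mapping followed by a variable-length (7 bits per byte) encoding.
function encodeInt(n) {
  let z = ((n << 1) ^ (n >> 31)) >>> 0; // zig-zag: 0 -> 0, -1 -> 1, 1 -> 2, 2 -> 4
  const bytes = [];
  while (z > 0x7f) {
    bytes.push((z & 0x7f) | 0x80); // low 7 bits with the continuation bit set
    z >>>= 7;
  }
  bytes.push(z);
  return bytes;
}

// Avro string: byte length followed by the UTF-8 bytes (lengths below 2^31 in this sketch).
function encodeString(s) {
  const utf8 = Buffer.from(s, 'utf8');
  return [...encodeInt(utf8.length), ...utf8];
}

function encodeRecords(records) {
  const bytes = [];
  if (records.length > 0) {
    bytes.push(...encodeInt(records.length)); // a single block containing every item
    for (const r of records) {
      // Fields are written in schema order, with no field names or separators.
      bytes.push(...encodeInt(r.id), ...encodeString(r.text), ...encodeInt(r.user_id));
    }
  }
  bytes.push(0); // a block count of 0 ends the array
  return Buffer.from(bytes);
}

// Inverse of encodeInt; also reads the small longs used for lengths and block counts.
function decodeInt(buf, pos) {
  let z = 0;
  let shift = 0;
  let b;
  do {
    b = buf[pos++];
    z += (b & 0x7f) * 2 ** shift; // plain arithmetic avoids 32-bit overflow
    shift += 7;
  } while (b & 0x80);
  return [z % 2 === 0 ? z / 2 : -(z + 1) / 2, pos]; // undo zig-zag
}

function decodeString(buf, pos) {
  let len;
  [len, pos] = decodeInt(buf, pos);
  return [buf.toString('utf8', pos, pos + len), pos + len];
}

function decodeRecords(buf) {
  const records = [];
  let count;
  let pos = 0;
  [count, pos] = decodeInt(buf, pos);
  while (count !== 0) {
    if (count < 0) {
      // A negative count is followed by the block's size in bytes, which we skip.
      count = -count;
      [, pos] = decodeInt(buf, pos);
    }
    for (let i = 0; i < count; i++) {
      let id, text, userId;
      [id, pos] = decodeInt(buf, pos);
      [text, pos] = decodeString(buf, pos);
      [userId, pos] = decodeInt(buf, pos);
      records.push({ id, text, user_id: userId });
    }
    [count, pos] = decodeInt(buf, pos);
  }
  return records;
}

const buf = encodeRecords([
  { id: 1, text: 'some text', user_id: 1 },
  { id: 1, text: 'some text', user_id: 2 },
]);
console.log(buf.length); // 26, i.e. 0x1a as in the hexdump
console.log(buf.toString('hex')); // 040212736f6d652074657874020212736f6d6520746578740400
console.log(decodeRecords(buf));
```

The bytes match the hexdump because avsc follows the same specification. Note that this raw encoding carries no schema, so a reader must already know it; Avro object container files, by contrast, embed the schema in a file header.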

【Comments】:

【Answer 2】:

For your question, the correct schema is:

{
  "name": "Name",
  "type": "array",
  "namespace": "com.hi.avro.model",
  "items": {
    "name": "NameDetails",
    "type": "record",
    "fields": [
      {
        "name": "id",
        "type": "int"
      },
      {
        "name": "text",
        "type": "string"
      },
      {
        "name": "user_id",
        "type": "int"
      }
    ]
  }
}

【Comments】: