[Title]: How to Avro binary encode a JSON string using Apache Avro?
[Posted]: 2014-03-25 12:31:38
[Question]:

I am trying to Avro binary encode my JSON string. Below is my JSON string, and I have written a simple method to do the conversion, but I am not sure whether the way I am doing it is correct.

public static void main(String args[]) throws Exception{
try{
    Schema schema = new Parser().parse((TestExample.class.getResourceAsStream("/3233.avsc")));
    String json="{"+
        "  \"location\" : {"+
        "    \"devices\":["+
        "      {"+
        "        \"did\":\"9abd09-439bcd-629a8f\","+
        "        \"dt\":\"browser\","+
        "        \"usl\":{"+
        "          \"pos\":{"+
        "            \"source\":\"GPS\","+
        "            \"lat\":90.0,"+
        "            \"long\":101.0,"+
        "            \"acc\":100"+
        "          },"+
        "          \"addSource\":\"LL\","+
        "          \"add\":["+
        "            {"+
        "              \"val\":\"2123\","+
        "              \"type\" : \"NUM\""+
        "            },"+
        "            {"+
        "              \"val\":\"Harris ST\","+
        "              \"type\" : \"ST\""+
        "            }"+
        "          ],"+
        "          \"ei\":{"+
        "            \"ibm\":true,"+
        "            \"sr\":10,"+
        "            \"ienz\":true,"+
        "            \"enz\":100,"+
        "            \"enr\":10"+
        "          },"+
        "          \"lm\":1390598086120"+
        "        }"+
        "      }"+
        "    ],"+
        "    \"ver\" : \"1.0\""+
        "  }"+
        "}";

    byte[] avroByteArray = fromJsonToAvro(json,schema);

} catch (Exception ex) {
    // log the exception
}
}

The following method converts my JSON string to Avro binary encoding:

private static byte[] fromJsonToAvro(String json, Schema schema) throws Exception {

    // Decode the JSON text into a generic datum, validated against the schema
    InputStream input = new ByteArrayInputStream(json.getBytes("UTF-8"));
    DataInputStream din = new DataInputStream(input);

    Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

    DatumReader<Object> reader = new GenericDatumReader<Object>(schema);
    Object datum = reader.read(null, decoder);

    // Re-serialize the datum with a binary encoder
    GenericDatumWriter<Object> w = new GenericDatumWriter<Object>(schema);
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

    Encoder e = EncoderFactory.get().binaryEncoder(outputStream, null);

    w.write(datum, e);
    e.flush();

    return outputStream.toByteArray();
}

Can anyone take a look and let me know whether the way I am trying to binary encode my JSON string is correct?

[Comments]:

  • For what it's worth, the Apache Avro spec
  • It's not clear what converting JSON "to Avro" means, since per the spec, Avro's notation is just a particular set of constraints on the JSON string format.
  • In any case, Apache appears to provide a set of utilities, so it's not clear why you need to write your own.
  • Hmm.. not sure I understand correctly.. I have a JSON string and I am supposed to encode it into Avro binary.. How should I go about it? Is the way I am doing it incorrect?
  • @HotLicks I find myself with the same problem; would you mind pointing out where to find those Apache utilities with the equivalent functionality/methods?

Tags: java json binary bytearray avro


[Solution 1]:

When you know the schema ({schema_file}.avsc) of your JSON file, you can use avro-tools to convert the JSON file ({input_file}.json) into an Avro file ({output_file}.avro), like this:

java -jar the/path/of/avro-tools-1.8.1.jar fromjson {input_file}.json   --schema-file {schema_file}.avsc > {output_file}.avro

By the way, the content of the {schema_file}.avsc file looks like this:

{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
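
A matching {input_file}.json would contain one JSON object per line; illustrative values below (note that in Avro's JSON encoding, a non-null value of a union such as ["int", "null"] must be wrapped with its branch type):

```json
{"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
```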

Download avro-tools-1.8.1

Download other versions of avro-tools

[Comments]:

    [Solution 2]:

    To complement Keegan's answer, this discussion may be useful:

    http://mail-archives.apache.org/mod_mbox/avro-user/201209.mbox/%3CCALEq1Z8s1sfaAVB7YE2rpZ=v3q1V_h7Vm39h0HsOzxJ+qfQRSg@mail.gmail.com%3E

    The gist is that there is a special JSON schema, and you can use JsonReader/Writer to get to and from it. The JSON schema you should use is defined here:

    https://github.com/apache/avro/blob/trunk/share/schemas/org/apache/avro/data/Json.avsc

    [Comments]:

      [Solution 3]:

      I think the OP is correct. This writes just the Avro records themselves, without the schema that would be present if this were an Avro data file.

      Here are a couple of examples from Avro itself (useful if you are working with files):
      • From JSON to Avro: DataFileWriteTool
      • From Avro to JSON: DataFileReadTool
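
      Both tools are exposed through the avro-tools jar as the fromjson and tojson commands; an illustrative invocation (the jar version and file names here are placeholders):

      ```shell
      # JSON -> Avro container file (DataFileWriteTool)
      java -jar avro-tools-1.7.7.jar fromjson --schema-file person.avsc person.json > person.avro

      # Avro container file -> JSON (DataFileReadTool)
      java -jar avro-tools-1.7.7.jar tojson person.avro
      ```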

      Here is a complete example going both ways.

      @Grapes([
          @Grab(group='org.apache.avro', module='avro', version='1.7.7')
      ])
      
      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import java.io.DataInputStream;
      import java.io.EOFException;
      import java.io.IOException;
      import java.io.InputStream;
      
      import org.apache.avro.Schema;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.DatumReader;
      import org.apache.avro.io.DatumWriter;
      import org.apache.avro.io.Decoder;
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.io.Encoder;
      import org.apache.avro.io.EncoderFactory;
      import org.apache.avro.io.JsonEncoder;
      
      String schema = '''{
        "type":"record",
        "namespace":"foo",
        "name":"Person",
        "fields":[
          {
            "name":"name",
            "type":"string"
          },
          {
            "name":"age",
            "type":"int"
          }
        ]
      }'''
      String json = "{" +
        "\"name\":\"Frank\"," +
        "\"age\":47" +
      "}"
      
      assert avroToJson(jsonToAvro(json, schema), schema) == json
      
      
      public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
          InputStream input = null;
          GenericDatumWriter<GenericRecord> writer = null;
          Encoder encoder = null;
          ByteArrayOutputStream output = null;
          try {
              Schema schema = new Schema.Parser().parse(schemaStr);
              DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
              input = new ByteArrayInputStream(json.getBytes());
              output = new ByteArrayOutputStream();
              DataInputStream din = new DataInputStream(input);
              writer = new GenericDatumWriter<GenericRecord>(schema);
              Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
              encoder = EncoderFactory.get().binaryEncoder(output, null);
              GenericRecord datum;
              while (true) {
                  try {
                      datum = reader.read(null, decoder);
                  } catch (EOFException eofe) {
                      break;
                  }
                  writer.write(datum, encoder);
              }
              encoder.flush();
              return output.toByteArray();
          } finally {
              try { if (input != null) input.close(); } catch (Exception e) { }
          }
      }
      
      public static String avroToJson(byte[] avro, String schemaStr) throws IOException {
          boolean pretty = false;
          GenericDatumReader<GenericRecord> reader = null;
          JsonEncoder encoder = null;
          ByteArrayOutputStream output = null;
          try {
              Schema schema = new Schema.Parser().parse(schemaStr);
              reader = new GenericDatumReader<GenericRecord>(schema);
              InputStream input = new ByteArrayInputStream(avro);
              output = new ByteArrayOutputStream();
              DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
              encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
              Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
              GenericRecord datum;
              while (true) {
                  try {
                      datum = reader.read(null, decoder);
                  } catch (EOFException eofe) {
                      break;
                  }
                  writer.write(datum, encoder);
              }
              encoder.flush();
              output.flush();
              return new String(output.toByteArray());
          } finally {
              try { if (output != null) output.close(); } catch (Exception e) { }
          }
      }
      

      For completeness, here is an example if you are working with streams (Avro calls these container files) rather than records. Note that when you go back from Avro to JSON, you do not need to pass the schema, because it is present in the stream.

      @Grapes([
          @Grab(group='org.apache.avro', module='avro', version='1.7.7')
      ])
      
      // writes Avro as a http://avro.apache.org/docs/current/spec.html#Object+Container+Files rather than a sequence of records
      
      import java.io.ByteArrayInputStream;
      import java.io.ByteArrayOutputStream;
      import java.io.DataInputStream;
      import java.io.EOFException;
      import java.io.IOException;
      import java.io.InputStream;
      
      import org.apache.avro.Schema;
      import org.apache.avro.file.DataFileStream;
      import org.apache.avro.file.DataFileWriter;
      import org.apache.avro.generic.GenericDatumReader;
      import org.apache.avro.generic.GenericDatumWriter;
      import org.apache.avro.generic.GenericRecord;
      import org.apache.avro.io.DatumReader;
      import org.apache.avro.io.DatumWriter;
      import org.apache.avro.io.Decoder;
      import org.apache.avro.io.DecoderFactory;
      import org.apache.avro.io.Encoder;
      import org.apache.avro.io.EncoderFactory;
      import org.apache.avro.io.JsonEncoder;
      
      
      String schema = '''{
        "type":"record",
        "namespace":"foo",
        "name":"Person",
        "fields":[
          {
            "name":"name",
            "type":"string"
          },
          {
            "name":"age",
            "type":"int"
          }
        ]
      }'''
      String json = "{" +
        "\"name\":\"Frank\"," +
        "\"age\":47" +
      "}"
      
      assert avroToJson(jsonToAvro(json, schema)) == json
      
      
      public static byte[] jsonToAvro(String json, String schemaStr) throws IOException {
          InputStream input = null;
          DataFileWriter<GenericRecord> writer = null;
          Encoder encoder = null;
          ByteArrayOutputStream output = null;
          try {
              Schema schema = new Schema.Parser().parse(schemaStr);
              DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
              input = new ByteArrayInputStream(json.getBytes());
              output = new ByteArrayOutputStream();
              DataInputStream din = new DataInputStream(input);
              writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>());
              writer.create(schema, output);
              Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);
              GenericRecord datum;
              while (true) {
                  try {
                      datum = reader.read(null, decoder);
                  } catch (EOFException eofe) {
                      break;
                  }
                  writer.append(datum);
              }
              writer.flush();
              return output.toByteArray();
          } finally {
              try { if (input != null) input.close(); } catch (Exception e) { }
          }
      }
      
      public static String avroToJson(byte[] avro) throws IOException {
          boolean pretty = false;
          GenericDatumReader<GenericRecord> reader = null;
          JsonEncoder encoder = null;
          ByteArrayOutputStream output = null;
          try {
              reader = new GenericDatumReader<GenericRecord>();
              InputStream input = new ByteArrayInputStream(avro);
              DataFileStream<GenericRecord> streamReader = new DataFileStream<GenericRecord>(input, reader);
              output = new ByteArrayOutputStream();
              Schema schema = streamReader.getSchema();
              DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
              encoder = EncoderFactory.get().jsonEncoder(schema, output, pretty);
              for (GenericRecord datum : streamReader) {
                  writer.write(datum, encoder);
              }
              encoder.flush();
              output.flush();
              return new String(output.toByteArray());
          } finally {
              try { if (output != null) output.close(); } catch (Exception e) { }
          }
      }
      
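
      As an aside, if you want to sanity-check the bytes that binaryEncoder produces: Avro's binary encoding writes int and long values as zigzag-mapped variable-length integers, and strings as such a length followed by the UTF-8 bytes. Below is a minimal stdlib sketch of that varint primitive, purely to illustrate the wire format; it is not part of the Avro API:

      ```java
      import java.io.ByteArrayOutputStream;

      public class AvroVarint {
          // Encode a long the way Avro's binary encoding does:
          // zigzag-map it (so small magnitudes get small codes),
          // then emit 7 bits per byte, high bit set on all but the last byte.
          public static byte[] encodeLong(long n) {
              n = (n << 1) ^ (n >> 63); // zigzag: 0,-1,1,-2,2,... -> 0,1,2,3,4,...
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              while ((n & ~0x7FL) != 0) {
                  out.write((int) ((n & 0x7F) | 0x80));
                  n >>>= 7;
              }
              out.write((int) n);
              return out.toByteArray();
          }

          public static void main(String[] args) {
              // 47 (the "age" value above) zigzags to 94, which fits in one byte
              System.out.println(encodeLong(47).length); // prints 1
          }
      }
      ```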

      [Comments]:
