【Posted】: 2014-03-05 10:15:45
【Question】:
I'm trying to build an Avro schema (for use with Hadoop) for the following JSON:
{
"name_tag":"Guy",
"known_nested_structure" : {
"fieldA" : ["value1"],
"fieldB" : ["value1","value2"],
"fieldC" : [],
"fieldD" : ["value1"]
},
"another_field" : "hi"
}
My first attempt was this Avro schema (shown together with the Hive command):
CREATE EXTERNAL TABLE IF NOT EXISTS record_table
PARTITIONED BY (YEAR INT, MONTH INT, DAY INT, HOUR INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs://localhost/data/output/records_data/hourly'
TBLPROPERTIES ('avro.schema.literal'='{
"name": "myRecord",
"type": "record",
"fields": [
{"name":"name_tag", "type":"string","default": ""},
{
"name": "known_nested_structure",
"type": "record",
"fields": [
{"name":"fieldA", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldB", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldC", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldD", "type":{"type":"array","items":"string"},"default":null}
],
"default":null
},
{"name": "another_field","type":"string","default": ""}
]
}');
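Hive output for the command:
OK
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from deserializer
literal string from deserializer
year int
month int
day int
hour int
Time taken: 0.128 seconds
But for some reason, this Avro schema works: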
{
"name": "myRecord",
"type": "record",
"fields": [
{"name":"name_tag", "type":"string","default": null},
{
"name": "known_nested_structure",
"type": {
"name": "known_nested_structure",
"type": "record",
"fields": [
{"name":"fieldA", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldB", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldC", "type":{"type":"array","items":"string"},"default":null},
{"name":"fieldD", "type":{"type":"array","items":"string"},"default":null}
],
"default":null
}
},
{"name": "another_field","type": "string","default": null}
]
}
Result:
OK
name_tag string from deserializer
known_nested_structure struct<fielda:array<string>,fieldb:array<string>,fieldc:array<string>,fieldd:array<string>> from deserializer
another_field string from deserializer
year int
month int
day int
hour int
Time taken: 0.123 seconds
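Why doesn't the first Avro schema work? Why can't I declare the record directly as the field, instead of nesting a record named known_nested_structure inside the field known_nested_structure as in my second schema example?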
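To make the structural difference between the two schemas concrete, here is a minimal Python sketch. It relies on one rule from the Avro specification: a field's "type" attribute must itself be a schema, meaning a primitive type name, the name of an already defined type, a union, or a full schema object. The helper find_problems is hypothetical and written only for this illustration; it is not how Avro or Hive actually validates a schema, and it ignores other rules such as default values.

```python
# Hypothetical helper for illustration only; not part of Avro or Hive.
# It checks one rule: every field "type" must resolve to a schema.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def find_problems(schema, named=None, path="$"):
    """Return messages for type references that cannot be resolved."""
    named = set() if named is None else named
    problems = []
    if isinstance(schema, str):
        # A bare string must name a primitive or an already defined named type.
        if schema not in PRIMITIVES and schema not in named:
            problems.append(f"{path}: '{schema}' is neither a primitive nor a defined type")
    elif isinstance(schema, list):
        # A JSON array is a union; check each branch.
        for i, branch in enumerate(schema):
            problems += find_problems(branch, named, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            named.add(schema.get("name"))
            for field in schema.get("fields", []):
                problems += find_problems(field.get("type"), named, f"{path}.{field.get('name')}")
        elif kind in ("enum", "fixed"):
            named.add(schema.get("name"))
        elif kind == "array":
            problems += find_problems(schema.get("items"), named, f"{path}.items")
        elif kind == "map":
            problems += find_problems(schema.get("values"), named, f"{path}.values")
        else:
            # e.g. {"type": "string"} or {"type": {...}}: resolve the inner type.
            problems += find_problems(kind, named, path)
    else:
        problems.append(f"{path}: missing or malformed schema")
    return problems


# The two schemas from the question, rebuilt as Python dicts.
string_array = {"type": "array", "items": "string"}
nested_fields = [
    {"name": n, "type": string_array, "default": None}
    for n in ("fieldA", "fieldB", "fieldC", "fieldD")
]

# First schema: "type": "record" sits directly on the field, as a bare string.
first_schema = {"name": "myRecord", "type": "record", "fields": [
    {"name": "name_tag", "type": "string", "default": ""},
    {"name": "known_nested_structure", "type": "record",
     "fields": nested_fields, "default": None},
    {"name": "another_field", "type": "string", "default": ""},
]}

# Second schema: the field's "type" is a complete record schema object.
second_schema = {"name": "myRecord", "type": "record", "fields": [
    {"name": "name_tag", "type": "string", "default": None},
    {"name": "known_nested_structure", "type": {
        "name": "known_nested_structure", "type": "record",
        "fields": nested_fields, "default": None}},
    {"name": "another_field", "type": "string", "default": None},
]}

print(find_problems(first_schema))
print(find_problems(second_schema))
```

Under this check, the only field flagged in the first schema is known_nested_structure, because the bare string "record" names neither a primitive nor a defined type; the second schema passes because the record definition is wrapped in the object given as the field's "type".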
Thanks,
Guy
【Comments】: