Posted: 2016-07-12 23:37:44
Question:
I want to parse a nested Avro file and load it into a Hive table (the Hive table may be a nested table).
My Avro schema looks like this:
{
  "type" : "record",
  "name" : "NTTObject",
  "namespace" : "com.test.ntt",
  "fields" : [ {
    "name" : "header",
    "type" : {
      "type" : "map",
      "values" : {
        "type" : "string",
        "avro.java.string" : "String"
      },
      "avro.java.string" : "String"
    },
    "default" : { }
  }, {
    "name" : "body",
    "type" : {
      "type" : "string",
      "avro.java.string" : "String"
    },
    "default" : ""
  } ]
}
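The schema describes a record with two fields: `header`, a map from string to string that defaults to `{}`, and `body`, a string that defaults to `""`. As a hedged sketch, the class below mirrors that shape in plain Java and builds a Hive DDL statement using Hive's usual Avro type mapping (an Avro `map` of strings becomes `MAP<STRING,STRING>`, an Avro `string` becomes `STRING`). The class name `NttObjectShape`, the table name and the HDFS location are hypothetical placeholders, and `STORED AS AVRO` assumes Hive 0.14 or later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java mirror of the NTTObject record; not generated by Avro.
final class NttObjectShape {
    final Map<String, String> header; // Avro "header": map of string -> string, default {}
    final String body;                // Avro "body": string, default ""

    // No-arg constructor applies the schema defaults.
    NttObjectShape() {
        this(new HashMap<String, String>(), "");
    }

    NttObjectShape(Map<String, String> header, String body) {
        this.header = Collections.unmodifiableMap(new HashMap<String, String>(header));
        this.body = body;
    }

    // Builds a Hive DDL whose two columns mirror the two Avro fields.
    // The table name and location passed in are placeholders.
    static String hiveDdl(String table, String location) {
        return "CREATE EXTERNAL TABLE " + table + " (\n"
                + "  header MAP<STRING,STRING>,\n"
                + "  body STRING\n"
                + ")\n"
                + "STORED AS AVRO\n"
                + "LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        NttObjectShape defaults = new NttObjectShape();
        System.out.println(defaults.header.isEmpty() && defaults.body.isEmpty()); // true
        System.out.println(hiveDdl("ntt_object", "/hypothetical/hdfs/path"));
    }
}
```

If `body` does carry the StatDataRequest JSON (an assumption; the comments below question whether the sample matches the schema), that nested content would reach Hive as a single STRING value and would still need parsing, for example with Hive's `get_json_object`.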
The sample data looks like this:
{"objectKey":"trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8"} {"StatDataRequest":{"protocolVersion":"1","platform":"Android","format":"Detailed","deviceid":"0384bdr311-32w5b-49aa-a814-379256f80ca8","stats":{"clientStat":[{"contentActionStat":{"progid":"56aa31a135d1c95d77f70b533289dfc3","gen1re":"Sports/Auto/Racing/High-Def/Events/Series/Live","rating":"0","vendor":"1 1 877U3 50B","vod":"false","ppv":"false","series":"true","title":"Test Prix, Practice","description":"\"Test Prix, Practice\"","recordDate":"2016-05-26T12:00:00Z","channel":"220","channel_name":"NBCSHD","TMSID":"ABCD5544671291","channel_minor":"0","hd":"false","contentAction":"Streaming_Started","clientMode":"UNKNOWN","timestamp":"2016-05-27T03:00:28.686Z","errorReason":"36100530"}},{"contentActionStat":{"progid":"56aa31a135d1c95d77f70b533289dfc3","gen1re":"Sports/Auto/Racing/High-Def/Events/Series/Live","rating":"0","vendor":"1 1 875E3 50B","vod":"false","ppv":"false","series":"true","title":"Test Prix, Practice","description":"\"Test Prix, Practice\"","recordDate":"2016-05-26T12:00:00Z","channel":"220","channel_name":"NBCSHD","TMSID":"ABCD5544671291","channel_minor":"0","hd":"false","contentAction":"Streaming_Stopped","clientMode":"UNKNOWN","durationSeconds":"3172","timestamp":"2016-05-27T03:53:20.077Z","errorReason":"36100530"}}]}}}
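The sample line is not a single JSON document: it is two top-level JSON objects separated by a space, the first holding `objectKey` and the second holding `StatDataRequest`. Splitting at the first `}` (as the suggested snippet further down does) works only while the first object contains no nested braces. A more defensive sketch, using only the standard library, tracks brace depth and ignores braces that appear inside string literals; the class name `ConcatenatedJsonSplitter` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits text holding several concatenated top-level JSON
// objects (e.g. "{...} {...}") into one string per object.
final class ConcatenatedJsonSplitter {
    static List<String> split(String text) {
        List<String> objects = new ArrayList<String>();
        int depth = 0;            // current brace nesting level
        int start = -1;           // index of the '{' that opened the current object
        boolean inString = false; // inside a "..." literal?
        boolean escaped = false;  // previous char was a backslash inside a literal?
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inString) {
                // Inside a string literal, braces are data, not structure.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    objects.add(text.substring(start, i + 1));
                }
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        String line = "{\"objectKey\":\"trx/Phone/x\"} {\"StatDataRequest\":{\"a\":\"}\"}}";
        for (String obj : split(line)) {
            System.out.println(obj);
        }
    }
}
```

Each returned string can then be handed to whatever JSON parser is in use.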
Expected output for the sample data above (I am using the pipe (|) as the column delimiter):
trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8|1|Android|Detailed|0384bdr311-32w5b-49aa-a814-379256f80ca8|56aa31a135d1c95d77f70b533289dfc3|Sports/Auto/Racing/High-Def/Events/Series/Live|0|1 1 877U3 50B|false|false|true|Test Prix, Practice|\"Test Prix, Practice\"|2016-05-26T12:00:00Z|220|NBCSHD|ABCD5544671291|0|false|Streaming_Started|UNKNOWN||2016-05-27T03:00:28.686Z|36100530
trx/Phone/2016-05-12/15-12-18/0384bdr311-32w5b-49aa-a814-379256f80ca8|1|Android|Detailed|0384bdr311-32w5b-49aa-a814-379256f80ca8|56aa31a135d1c95d77f70b533289dfc3|Sports/Auto/Racing/High-Def/Events/Series/Live|0|1 1 875E3 50B|false|false|true|Test Prix, Practice|\"Test Prix, Practice\"|2016-05-26T12:00:00Z|220|NBCSHD|ABCD5544671291|0|false|Streaming_Stopped|UNKNOWN|3172|2016-05-27T03:53:20.077Z|36100530
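The expected rows use a fixed column order and leave a field empty when a record lacks that key: the first record has no `durationSeconds`, hence `UNKNOWN||2016-05-27T03:00:28.686Z`. As a minimal standard-library sketch, assuming each `contentActionStat` has already been parsed into a `Map<String, String>`, a row can be built by looking up every column in order with an empty-string default and joining with `|`. The class name `PipeRow` and the shortened column list in `main` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: renders one pipe-delimited row from already-parsed fields.
final class PipeRow {
    // Looks up each column in order; a missing key becomes an empty field,
    // so every row has exactly columns.size() fields.
    static String build(List<String> columns, Map<String, String> fields) {
        List<String> values = new ArrayList<String>(columns.size());
        for (String column : columns) {
            values.add(fields.getOrDefault(column, ""));
        }
        // String.join puts '|' only between values, so there is no trailing '|'.
        return String.join("|", values);
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("clientMode", "durationSeconds", "timestamp");
        Map<String, String> first = new HashMap<String, String>();
        first.put("clientMode", "UNKNOWN");
        first.put("timestamp", "2016-05-27T03:00:28.686Z");
        System.out.println(build(columns, first)); // UNKNOWN||2016-05-27T03:00:28.686Z
    }
}
```

Keeping the column list separate from the data means the column order is defined once, and it must match the column order of the Hive table the rows are loaded into.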
Any small code example in Java or Scala would be helpful.
@SANN3
Suggested code snippet:
import java.util.ArrayList;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;

public class GenieGo_AVRO_Parsing {
    public static void main(String[] args) {
        // Input JSON
        String jsonStr = "{\"objectKey\":\"trx/Android/2016-05-27/15-03-59/c496555a-940d-46eb-bc6a-21ae265ddf27\"} {\"StatDataRequest\":{\"protocolVersion\":\"1\",\"platform\":\"Android\",\"format\":\"Detailed\",\"deviceid\":\"c496555a-940d-46eb-bc6a-21ae265ddf27\",\"stats\":{\"clientStat\":[{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:44:43.511Z\"}},{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Finish\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"263\",\"timestamp\":\"2016-05-26T02:49:06.347Z\"}},{\"contentActionStat\":{\"progid\":\"481080bd93a0710e496335d9acceb6add1695e7b\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 70\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Wienerschnitzel\",\"description\":\"Wienerschnitzel CEO Cynthia "
                + "Galardi-Culpepper.\",\"recordDate\":\"2016-05-23T01:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600112\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Cancel\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:49:06.349Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Started\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:49:16.382Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Finish\",\"clientMode\":\"UNKNOWN\",\"durationSeconds\":\"254\",\"timestamp\":\"2016-05-26T02:53:30.368Z\"}},{\"contentActionStat\":{\"progid\":\"dcb1e7d2374d0c0fa35131dda7e9228421a07668\",\"rating\":\"0\",\"vendor\":\"1 1 11AD3C 71\",\"vod\":\"false\",\"ppv\":\"false\",\"series\":\"true\",\"title\":\"Golden Krust Caribbean Bakery & Grill\",\"description\":\"Golden Krust Caribbean Bakery & Grill CEO Lowell "
                + "Hawthorne.\",\"recordDate\":\"2016-05-23T02:00:00Z\",\"channel\":\"11\",\"channel_name\":\"WTOL\",\"TMSID\":\"EP011584600113\",\"channel_minor\":\"65535\",\"hd\":\"false\",\"contentAction\":\"Downloading_Cancel\",\"clientMode\":\"UNKNOWN\",\"timestamp\":\"2016-05-26T02:53:30.373Z\"}}]}}}";
        // The input holds two concatenated JSON objects: {"objectKey":...} {"StatDataRequest":...}.
        // Splitting after the first '}' is safe here only because the first object has no nested braces.
        String json1 = jsonStr.substring(0, jsonStr.indexOf("}") + 1);
        String json2 = jsonStr.substring(jsonStr.indexOf("}") + 1);
        StringBuilder out = new StringBuilder();

        // Columns shared by every output row: objectKey plus the top-level StatDataRequest fields.
        List<String> header = new ArrayList<String>();
        JSONObject json = new JSONObject(json1);
        header.add(json.getString("objectKey"));
        json = new JSONObject(json2);
        JSONObject statDataRequest = json.getJSONObject("StatDataRequest");
        header.add(statDataRequest.getString("protocolVersion"));
        header.add(statDataRequest.getString("platform"));
        header.add(statDataRequest.getString("format"));
        header.add(statDataRequest.getString("deviceid"));

        JSONObject stats = statDataRequest.getJSONObject("stats");
        JSONArray clientStatArr = stats.getJSONArray("clientStat");

        // Output column order for each contentActionStat, matching the expected output.
        List<String> keyList = new ArrayList<String>();
        keyList.add("progid");
        keyList.add("gen1re");
        keyList.add("rating");
        keyList.add("vendor");
        keyList.add("vod");
        keyList.add("ppv");
        keyList.add("series");
        keyList.add("title");
        keyList.add("description");
        keyList.add("recordDate");
        keyList.add("channel");
        keyList.add("channel_name");
        keyList.add("TMSID");
        keyList.add("channel_minor");
        keyList.add("hd");
        keyList.add("contentAction");
        keyList.add("clientMode");
        keyList.add("durationSeconds");
        keyList.add("timestamp");
        keyList.add("errorReason");

        // One output row per clientStat element, with the header columns repeated on each row.
        for (int i = 0; i < clientStatArr.length(); i++) {
            JSONObject contentActionStat = clientStatArr.getJSONObject(i).getJSONObject("contentActionStat");
            List<String> row = new ArrayList<String>(header);
            for (String key : keyList) {
                // optString returns "" for a missing key (e.g. gen1re, durationSeconds, errorReason),
                // whereas getString would throw a JSONException.
                row.add(contentActionStat.optString(key, ""));
            }
            // String.join (Java 8+) avoids the trailing '|' that concat-in-a-loop produced.
            out.append(String.join("|", row)).append("\n");
        }
        System.out.print(out);
    }
}
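When missing keys are written as empty fields, every row built from the snippet's own jsonStr ends with an empty `errorReason` column (none of those records has one), i.e. with a trailing `|`. When checking such output in Java, note that `String.split` discards trailing empty strings unless it is given a negative limit. Below is a hedged sketch of a field-count check before loading into Hive, assuming the 25 columns of the expected output in the question (5 header columns plus 20 contentActionStat columns); the class name `PipeFieldCount` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check for pipe-delimited output before loading it into Hive.
final class PipeFieldCount {
    // 5 header columns + 20 contentActionStat columns, as in the expected output.
    static final int EXPECTED_FIELDS = 25;

    // A negative limit keeps trailing empty strings; split("\\|") alone would drop them.
    static int count(String row) {
        return row.split("\\|", -1).length;
    }

    // Returns the 0-based indices of lines whose field count is not EXPECTED_FIELDS.
    static List<Integer> badLines(String output) {
        List<Integer> bad = new ArrayList<Integer>();
        String[] lines = output.split("\n");
        for (int i = 0; i < lines.length; i++) {
            if (count(lines[i]) != EXPECTED_FIELDS) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("a|b|".split("\\|").length + " vs " + count("a|b|")); // 2 vs 3
    }
}
```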
Comments:
-
Your sample data doesn't match the schema: there are no header or body fields in the sample data.
-
I generated the schema from the Avro file itself with this command:
java -jar /opt/cloudera/parcels/CDH/lib/avro/avro-tools-1.7.6-cdh5.4.7.jar getschema part-m-00000 > geniego.avsc
-
Then your sample message is not correct.