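【Posted】: 2021-09-07 09:00:20
【Question】:
I want to parse data out of the nested JSON file below, but it contains so many "keys" objects at different levels that I am struggling to extract the data.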
{
"jobname": {
"keys": {
"jobid":"E000295",
"car":"BMW"
},
"property":{
"doctype":"File",
"areadesc":[
{
"areaid":"qaz",
"weather":"hot",
},
{
"areaid":"wsx",
"weather":"code",
},
{
"areaid":"edc",
"weather":"hot",
},
{
"areaid":"rfv",
"weather":"hot",
}
]
},
"toolJobs":[
{
"keys":{
"toolid":"123"
},
"reports":[
{
"keys":{
"oiltype":"a",
"oilcountry":"us"
},
"property":{"reportid":"001"},
"datas":[
{
"keys":{"areaid":"qaz"},
"data":[
{
"time": "2021-01-01",
"value": 1
},
{
"time": "2021-01-02",
"value": 3
},
]
},
{
"keys":{"areaid":"wsx"},
"data":[
{
"time": "2021-01-03",
"value": 5
},
{
"time": "2021-01-04",
"value": 7
},
]
},
]
},
{
"keys":{
"oiltype":"b",
"oilcountry":"china"
},
"property":{"reportid":"002"},
"datas":[
{
"keys":{"areaid":"edc"},
"data":[
{
"time": "2021-01-05",
"value": 2
},
{
"time": "2021-01-06",
"value": 4
},
]
},
{
"keys":{"areaid":"rfv"},
"data":[
{
"time": "2021-01-07",
"value": 6
},
{
"time": "2021-01-08",
"value": 8
},
]
},
]
}
]
}
]
}
}
So far I can get a basic result with the code below, but some columns are missing, for example oiltype, oilcountry, reportid and areaid.
cat tmp1.json | jq -cn --stream '
[fromstream(
1|truncate_stream(inputs)
| (.[0][:2] | index("keys")) as $ix
| if $ix then .[0] |= .[1+$ix:]
else (.[0] | index("toolJobs")) as $iy | (.[0][$iy:$iy+3] | index("keys")) as $iz
| if $iz then .[0] |= .[1+$iy+$iz:]
else (.[0] | index("data")) as $ik
| if $ik then .[0] |= .[$ik:]
else empty
end
end
end
)] | .[0] as $header | .[1] as $tool | [.[2:][] | ($header+ $tool+.)] | .'
The result is
[
{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]},
{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}
]
I also tried the code below
cat tmp1.json | jq -cn --stream '
[fromstream(
1|truncate_stream(inputs)
| (.[0][:2] | index("keys")) as $ix
| if $ix then .[0] |= .[1+$ix:]
else (.[0] | index("toolJobs")) as $iy | (.[0][$iy:$iy+3] | index("keys")) as $iz
| if $iz then .[0] |= .[1+$iy+$iz:]
else (.[0] | index("data")) as $ik
| if $ik then .[0] |= .[$ik:]
else (.[0] | index("reports")) as $iw | (.[0][$iw:$iw+3] | index("property")) as $ii
| if $ii then (.[0] |= .[$iw+$ii:])
else (.[0] | index("keys")) as $ij
| if $ij then (.[0] |= .[$ij:])
else empty
end
end
end
end
end
)] | .[0] as $header | .[1] as $prjob | [.[2:][] | ($header + $prjob + .)] | .'
but the result is strange
[
{"jobid":"E000295","car":"BMW","property":{"reportid":"001"},"toolid":"123","keys":{"oiltype":"a","oilcountry":"us","areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
{"jobid":"E000295","car":"BMW","property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},"toolid":"123","keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]},
{"jobid":"E000295","car":"BMW","property":{"reportid":"002"},"toolid":"123","keys":{"oiltype":"b","oilcountry":"china","areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
{"jobid":"E000295","car":"BMW","property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},"toolid":"123","keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}
]
Below is my expected result
[
{
"jobid":"E000295",
"car":"BMW",
"toolid":"123",
"oiltype":"a",
"oilcountry":"us",
"reportid":"001",
"areaid":"qaz",
"data":[
{
"time": "2021-01-01",
"value": 1
},
{
"time": "2021-01-02",
"value": 3
},
]
},
{
"jobid":"E000295",
"car":"BMW",
"toolid":"123",
"oiltype":"a",
"oilcountry":"us",
"reportid":"001",
"areaid":"wsx",
"data":[
{
"time": "2021-01-03",
"value": 5
},
{
"time": "2021-01-04",
"value": 7
},
]
},
{
"jobid":"E000295",
"car":"BMW",
"toolid":"123",
"oiltype":"b",
"oilcountry":"china",
"reportid":"002",
"areaid":"edc",
"data":[
{
"time": "2021-01-05",
"value": 2
},
{
"time": "2021-01-06",
"value": 4
},
]
},
{
"jobid":"E000295",
"car":"BMW",
"toolid":"123",
"oiltype":"b",
"oilcountry":"china",
"reportid":"002",
"areaid":"rfv",
"data":[
{
"time": "2021-01-07",
"value": 6
},
{
"time": "2021-01-08",
"value": 8
},
]
}
]
Does anyone know how to do this?
【Comments】:
-
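If you have to use the --stream option because the original file is large, I suggest you consider a two-stage pipeline: in the first stage, use --stream to winnow the data; in the second stage, call jq without the --stream option, so that you can restructure the data more easily.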
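A rough sketch of what such a two-stage pipeline might look like, not a definitive solution. It assumes an abridged, valid version of the sample input (trailing commas removed, "areadesc" shortened), and it assumes that "keys" appears before "toolJobs" inside "jobname", as it does in the question. The shell variable names (input, stage1, result) are made up for illustration. Stage 1 uses --stream to emit the job-level "keys" object and then each "toolJobs" element as its own small document, skipping "property". Stage 2 is an ordinary jq filter that merges the levels.

```shell
#!/bin/sh
# Abridged sample input, made valid JSON (no trailing commas).
input=$(cat <<'EOF'
{"jobname":{
  "keys":{"jobid":"E000295","car":"BMW"},
  "property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"}]},
  "toolJobs":[{
    "keys":{"toolid":"123"},
    "reports":[
      {"keys":{"oiltype":"a","oilcountry":"us"},"property":{"reportid":"001"},
       "datas":[
        {"keys":{"areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
        {"keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]}]},
      {"keys":{"oiltype":"b","oilcountry":"china"},"property":{"reportid":"002"},
       "datas":[
        {"keys":{"areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
        {"keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]}]}]}}
EOF
)

# Stage 1: winnow with --stream.
# Events under jobname.keys are re-rooted two levels down, so they rebuild
# the job-level keys object; events under jobname.toolJobs[i] are re-rooted
# three levels down, so each tool job becomes a separate document.
# Everything else (for example jobname.property) is dropped.
stage1=$(printf '%s\n' "$input" | jq -cn --stream '
  fromstream(
    inputs
    | (.[0] | length) as $n
    | if   $n > 2 and .[0][0:2] == ["jobname","keys"]     then .[0] |= .[2:]
      elif $n > 3 and .[0][0:2] == ["jobname","toolJobs"] then .[0] |= .[3:]
      else empty
      end
  )')

# Stage 2: plain jq. The first document is the job-level keys object,
# every following document is one tool job.
result=$(printf '%s\n' "$stage1" | jq -n '
  input as $job
  | [ inputs as $tool
      | $tool.reports[] as $report
      | $report.datas[] as $d
      | $job + $tool.keys + $report.keys + $report.property + $d.keys
        + {data: $d.data} ]')

printf '%s\n' "$result"
```

With a real file, the two jq calls would normally be connected directly with a pipe instead of going through shell variables, so that the large input is never held in memory as a whole.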
-
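One idea I have is to first fetch "jobid", "car" and "toolid", since they have the same values in every final dict, then parse the other columns together, and finally combine everything. The difficult part is pulling values from different levels down to the "data" level, for example pulling "toolid" down to the "data" level: {"toolid":XXX, "data":[]}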
-
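Neither the sample input nor the expected output is valid JSON. Please fix them, and clarify (a) whether you really need to use the --stream option; and if so, (b) whether you can see your way to the two-step solution mentioned above.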
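For the case where --stream turns out not to be needed, a minimal sketch of a plain jq filter follows. It assumes an abridged, valid version of the sample input (trailing commas removed, "areadesc" shortened); the shell variable names are made up for illustration. The filter walks tool jobs, reports and datas with nested bindings and merges the "keys" and "property" objects from each level into one flat object per "datas" entry.

```shell
#!/bin/sh
# Abridged sample input, made valid JSON (no trailing commas).
input=$(cat <<'EOF'
{"jobname":{
  "keys":{"jobid":"E000295","car":"BMW"},
  "property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"}]},
  "toolJobs":[{
    "keys":{"toolid":"123"},
    "reports":[
      {"keys":{"oiltype":"a","oilcountry":"us"},"property":{"reportid":"001"},
       "datas":[
        {"keys":{"areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
        {"keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]}]},
      {"keys":{"oiltype":"b","oilcountry":"china"},"property":{"reportid":"002"},
       "datas":[
        {"keys":{"areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
        {"keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]}]}]}}
EOF
)

# One output object per "datas" entry; object addition (+) merges the
# key/value pairs collected from the job, tool job, report and data levels.
flat=$(printf '%s\n' "$input" | jq '
  .jobname as $job
  | [ $job.toolJobs[] as $tool
      | $tool.reports[] as $report
      | $report.datas[] as $d
      | $job.keys + $tool.keys + $report.keys + $report.property + $d.keys
        + {data: $d.data} ]')

printf '%s\n' "$flat"
```

Note that the job-level "property" object (doctype, areadesc) is deliberately left out here, because the expected result in the question does not include it.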