Parse nested JSON that repeats the same attribute at several levels, using jq streaming mode: answers

[Question title]: Parse nested JSON that has the same attribute at several levels with jq streaming mode
[Posted]: 2021-09-07 09:00:20
[Question]:

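I want to extract data from the nested JSON file below, but the JSON contains a "keys" object at many different levels, which makes it hard to parse: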

{
  "jobname": {
    "keys": {
        "jobid":"E000295",
        "car":"BMW"
    },
    "property":{
        "doctype":"File",
        "areadesc":[
            {
                "areaid":"qaz",
                "weather":"hot",
            },
            {
                "areaid":"wsx",
                "weather":"code",
            },
            {
                "areaid":"edc",
                "weather":"hot",
            },
            {
                "areaid":"rfv",
                "weather":"hot",
            }
        ]
    },
    "toolJobs":[
        {
            "keys":{
                "toolid":"123"
            },
            "reports":[
                {
                    "keys":{
                        "oiltype":"a",
                        "oilcountry":"us"
                    },
                    "property":{"reportid":"001"},
                    "datas":[
                        {
                            "keys":{"areaid":"qaz"},
                            "data":[
                                {
                                    "time": "2021-01-01",
                                    "value": 1
                                },
                                {
                                    "time": "2021-01-02",
                                    "value": 3
                                },
                            ]
                        },
                        {
                            "keys":{"areaid":"wsx"},
                            "data":[
                                {
                                    "time": "2021-01-03",
                                    "value": 5
                                },
                                {
                                    "time": "2021-01-04",
                                    "value": 7
                                },
                            ]
                        },
                    ]
                },
                {
                    "keys":{
                        "oiltype":"b",
                        "oilcountry":"china"
                    },
                    "property":{"reportid":"002"},
                    "datas":[
                        {
                            "keys":{"areaid":"edc"},
                            "data":[
                                {
                                    "time": "2021-01-05",
                                    "value": 2
                                },
                                {
                                    "time": "2021-01-06",
                                    "value": 4
                                },
                            ]
                        },
                        {
                            "keys":{"areaid":"rfv"},
                            "data":[
                                {
                                    "time": "2021-01-07",
                                    "value": 6
                                },
                                {
                                    "time": "2021-01-08",
                                    "value": 8
                                },
                            ]
                        },
                    ]
                }
            ]
        }
    ]
  }
}

So far I can get a basic result with the code below, but some columns are missing, for example oiltype, oilcountry, reportid and areaid:

cat tmp1.json |  jq -cn --stream '
 [fromstream( 
   1|truncate_stream(inputs)
   | (.[0][:2] | index("keys")) as $ix 
   | if $ix then .[0] |= .[1+$ix:] 
     else (.[0] | index("toolJobs")) as $iy | (.[0][$iy:$iy+3] | index("keys")) as $iz
     | if $iz then .[0] |= .[1+$iy+$iz:]
       else (.[0] | index("data")) as $ik
       | if $ik then .[0] |= .[$ik:]
         else empty
         end
       end
     end 
  )] | .[0] as $header | .[1] as $tool | [.[2:][] | ($header+ $tool+.)] | .'

The result is:

[{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]},{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},{"jobid":"E000295","car":"BMW","toolid":"123","data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]
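Both attempts rewrite the path part of the `--stream` events before handing them to `fromstream`, so it can help to look at those events directly. A minimal sketch, assuming a small, hand-trimmed and valid-JSON fragment of the sample; the file names `fragment.json` and `events.txt` are hypothetical:

```shell
#!/bin/sh
set -e

# A trimmed, valid-JSON fragment of the sample input (hypothetical file name).
cat > fragment.json <<'EOF'
{"jobname":{"keys":{"jobid":"E000295","car":"BMW"},"toolJobs":[{"keys":{"toolid":"123"}}]}}
EOF

# Each output line is either a [path, leaf] event or a one-element [path]
# event marking the end of an object/array. 1|truncate_stream strips the
# leading "jobname" from every path and drops events whose path has length 1.
jq -cn --stream '1|truncate_stream(inputs)' fragment.json > events.txt
cat events.txt
```

Every leaf arrives as `[path, value]`, and every closed object or array produces a one-element `[path]` event; `fromstream` emits a value when it sees such a closing event whose path has length 1. Any rewrite of the paths therefore decides both which fields are merged into the same object and where each emitted object ends.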

I also tried the following code:

cat tmp1.json |  jq -cn --stream '
 [fromstream( 
   1|truncate_stream(inputs)
   | (.[0][:2] | index("keys")) as $ix 
   | if $ix then .[0] |= .[1+$ix:] 
     else (.[0] | index("toolJobs")) as $iy | (.[0][$iy:$iy+3] | index("keys")) as $iz
     | if $iz then .[0] |= .[1+$iy+$iz:]
       else (.[0] | index("data")) as $ik
       | if $ik then .[0] |= .[$ik:]
         else (.[0] | index("reports")) as $iw | (.[0][$iw:$iw+3] | index("property")) as $ii
         | if $ii then (.[0] |= .[$iw+$ii:])
           else (.[0] | index("keys")) as $ij
           | if $ij then (.[0] |= .[$ij:])
             else empty
             end
           end
         end
       end
     end 
  )] | .[0] as $header | .[1] as $prjob | [.[2:][] | ($header + $prjob + .)] | .'

but the result is strange:

[
{"jobid":"E000295","car":"BMW","property":{"reportid":"001"},"toolid":"123","keys":{"oiltype":"a","oilcountry":"us","areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
{"jobid":"E000295","car":"BMW","property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},"toolid":"123","keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]},
{"jobid":"E000295","car":"BMW","property":{"reportid":"002"},"toolid":"123","keys":{"oiltype":"b","oilcountry":"china","areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
{"jobid":"E000295","car":"BMW","property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},"toolid":"123","keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}
]

Here is my expected result:

[
    {
        "jobid":"E000295",
        "car":"BMW",
        "toolid":"123",
        "oiltype":"a",
        "oilcountry":"us",
        "reportid":"001",
        "areaid":"qaz",
        "data":[
            {
                "time": "2021-01-01",
                "value": 1
            },
            {
                "time": "2021-01-02",
                "value": 3
            },
        ]
    },
    {
        "jobid":"E000295",
        "car":"BMW",
        "toolid":"123",
        "oiltype":"a",
        "oilcountry":"us",
        "reportid":"001",
        "areaid":"wsx",
        "data":[
            {
                "time": "2021-01-03",
                "value": 5
            },
            {
                "time": "2021-01-04",
                "value": 7
            },
        ]
    },
    {
        "jobid":"E000295",
        "car":"BMW",
        "toolid":"123",
        "oiltype":"b",
        "oilcountry":"china",
        "reportid":"002",
        "areaid":"edc",
        "data":[
            {
                "time": "2021-01-05",
                "value": 2
            },
            {
                "time": "2021-01-06",
                "value": 4
            },
        ]
    },
    {
        "jobid":"E000295",
        "car":"BMW",
        "toolid":"123",
        "oiltype":"b",
        "oilcountry":"china",
        "reportid":"002",
        "areaid":"rfv",
        "data":[
            {
                "time": "2021-01-07",
                "value": 6
            },
            {
                "time": "2021-01-08",
                "value": 8
            },
        ]
    }
]

Does anyone know how to do this?

[Question discussion]:

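If you have to use the --stream option because the original file is very large, then I suggest you consider building a two-stage pipeline: in the first stage, use --stream to winnow the data, and in the second stage, invoke jq without the --stream option so that you can restructure the data more easily.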
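One idea I have is to get "jobid", "car" and "toolid" first, because they have the same values in every final object, and then parse the other columns together before finally combining everything. The difficulty is pulling fields from different levels down to the "data" level, for example pulling "toolid" down so that I get {"toolid":XXX, "data":[]}.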
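Neither the sample input nor the expected output is valid JSON (both contain trailing commas). Please fix them, and clarify (a) whether you really need to use the --stream option; and if so, (b) whether you can see your way to the two-step solution described above.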
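The two-stage pipeline suggested in the first comment could look roughly like the sketch below. It assumes the file has the same layout as the corrected sample, `{"jobname": {"keys": ..., "property": ..., "toolJobs": [...]}}`, and that a single element of `toolJobs` is small enough to fit in memory; `in.json` and `flat.jsonl` are hypothetical file names.

```shell
#!/bin/sh
set -e

# Corrected copy of the sample (trailing commas removed); stands in for the large file.
cat > in.json <<'EOF'
{"jobname":{
 "keys":{"jobid":"E000295","car":"BMW"},
 "property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},
 "toolJobs":[{"keys":{"toolid":"123"},"reports":[
  {"keys":{"oiltype":"a","oilcountry":"us"},"property":{"reportid":"001"},"datas":[
   {"keys":{"areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
   {"keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]}]},
  {"keys":{"oiltype":"b","oilcountry":"china"},"property":{"reportid":"002"},"datas":[
   {"keys":{"areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
   {"keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]}]}]}}
EOF

# Stage 1a (--stream): rebuild only the header object .jobname.keys;
# first(...) stops reading as soon as that object is complete.
hdr=$(jq -cn --stream '
  first(fromstream(2 | truncate_stream(inputs | select(.[0][1] == "keys"))))' in.json)

# Stage 1b (--stream): rebuild each element of .jobname.toolJobs on its own line,
# so only one tool job at a time has to be reconstructed in memory.
# Stage 2 (no --stream): flatten each tool job with ordinary jq paths.
jq -cn --stream '
  fromstream(3 | truncate_stream(inputs | select(.[0][1] == "toolJobs")))' in.json |
jq -c --argjson hdr "$hdr" '
  .keys as $tool
  | .reports[]
  | (.keys + .property) as $report
  | .datas[]
  | $hdr + $tool + $report + .keys + {data}' > flat.jsonl

cat flat.jsonl
```

The idea is to use `--stream` only to cut the document into manageable pieces (the header and each `toolJobs` element) and leave the flattening to regular jq, which avoids the index arithmetic on stream paths used in the question.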

Tags: json streaming jq

[Solution 1]:

Assuming the input has been corrected, the following "regular" (non-streaming) jq program produces the desired result:

[
 .jobname
 | .keys as $one   # jobid, car (toolid is added by $two, so multiple toolJobs are not multiplied)
 | .toolJobs[]
 | .keys as $two
 | .reports[]
 | (.keys + .property) as $three
 | .datas[]
 | (.keys + {data}) as $four
 | $one + $two + $three + $four
]
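A minimal way to run such a program, assuming the trailing commas have been removed from the sample (`in.json` and `result.json` are hypothetical file names). This sketch binds only `.jobname.keys` as `$one`, since `$two` already contributes `toolid`:

```shell
#!/bin/sh
set -e

# Corrected copy of the sample input (trailing commas removed).
cat > in.json <<'EOF'
{"jobname":{
 "keys":{"jobid":"E000295","car":"BMW"},
 "property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},
 "toolJobs":[{"keys":{"toolid":"123"},"reports":[
  {"keys":{"oiltype":"a","oilcountry":"us"},"property":{"reportid":"001"},"datas":[
   {"keys":{"areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
   {"keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]}]},
  {"keys":{"oiltype":"b","oilcountry":"china"},"property":{"reportid":"002"},"datas":[
   {"keys":{"areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
   {"keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]}]}]}}
EOF

# Walk down one level at a time, binding each level's fields to a variable,
# then merge them at the innermost (datas) level.
jq -c '
[
 .jobname
 | .keys as $one
 | .toolJobs[]
 | .keys as $two
 | .reports[]
 | (.keys + .property) as $three
 | .datas[]
 | (.keys + {data}) as $four
 | $one + $two + $three + $four
]' in.json > result.json

cat result.json
```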

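If your input is too large, you can reduce the memory requirements by building a jq-to-jq pipeline: the first invocation uses the program above (or a --stream version of it) with the outer brackets removed, so that it emits one record at a time.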
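A sketch of that jq-to-jq shape, using the non-streaming program without its outer brackets as the first stage. The second stage here only collects the records back into an array with `[inputs]`; it is a stand-in (an assumption, not part of the answer) for whatever per-record processing you actually need. File names are hypothetical.

```shell
#!/bin/sh
set -e

# Corrected copy of the sample input (trailing commas removed).
cat > in.json <<'EOF'
{"jobname":{
 "keys":{"jobid":"E000295","car":"BMW"},
 "property":{"doctype":"File","areadesc":[{"areaid":"qaz","weather":"hot"},{"areaid":"wsx","weather":"code"},{"areaid":"edc","weather":"hot"},{"areaid":"rfv","weather":"hot"}]},
 "toolJobs":[{"keys":{"toolid":"123"},"reports":[
  {"keys":{"oiltype":"a","oilcountry":"us"},"property":{"reportid":"001"},"datas":[
   {"keys":{"areaid":"qaz"},"data":[{"time":"2021-01-01","value":1},{"time":"2021-01-02","value":3}]},
   {"keys":{"areaid":"wsx"},"data":[{"time":"2021-01-03","value":5},{"time":"2021-01-04","value":7}]}]},
  {"keys":{"oiltype":"b","oilcountry":"china"},"property":{"reportid":"002"},"datas":[
   {"keys":{"areaid":"edc"},"data":[{"time":"2021-01-05","value":2},{"time":"2021-01-06","value":4}]},
   {"keys":{"areaid":"rfv"},"data":[{"time":"2021-01-07","value":6},{"time":"2021-01-08","value":8}]}]}]}]}}
EOF

# Stage 1: no outer [ ], so each record is printed as one compact line as soon
# as it is produced.
# Stage 2: a second jq reads those records one at a time via `inputs`;
# here it just reassembles them into the requested array.
jq -c '
 .jobname
 | .keys as $one
 | .toolJobs[]
 | .keys as $two
 | .reports[]
 | (.keys + .property) as $three
 | .datas[]
 | $one + $two + $three + .keys + {data}' in.json |
  jq -n '[inputs]' > result.json

cat result.json
```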

[Discussion]: