Calculate average values in a JSON array: answers

【Question title】: Calculate Average values in a JSON array
【Posted】: 2018-01-19 01:15:59
【Question】:

I am working with a JSON file in this format:

  {
  "Response" : {
    "TimeUnit" : [ 1516298400000, 1516302000000, 1516305600000, 1516309200000, 1516312800000, 1516316400000 ],
    "metaData" : {
      "errors" : [ ],
      "notices" : [ "Source:Postgres", "Limit applied: 14400", "PG Host:ruappg0ro.apigeeks.net", "Metric with Avg of total_response_time was requested. For this a global avg was also computed with name global-avg-total_response_time", "query served by:88bec25a-ef48-464e-b41d-e447e3beeb88", "Table used: edge.api.faxgroupusenondn012.agg_api" ]
    },
    "stats" : {
      "data" : [ {
        "identifier" : {
          "names" : [ "apiproxy" ],
          "values" : [ "test" ]
        },
        "metric" : [ {
          "env" : "test",
          "name" : "sum(message_count)",
          "values" : [ 28.0, 129.0, 24.0, 20.0, 71.0, 30.0 ]
        }, {
          "env" : "test",
          "name" : "avg(total_response_time)",
          "values" : [ 312.57142857142856, 344.2480620155039, 374.2083333333333, 381.1, 350.67605633802816, 363.8 ]
        }, {
          "env" : "test",
          "name" : "sum(is_error)",
          "values" : [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ]
        }, {
          "env" : "test",
          "name" : "global-avg-total_response_time",
          "values" : [ 349.5860927152318 ]
        } ]
      }, {
        "identifier" : {
          "names" : [ "apiproxy" ],
          "values" : [ "test2" ]
        },
        "metric" : [ {
          "env" : "test",
          "name" : "sum(message_count)",
          "values" : [ 0.0, 0.0, 0.0, 16.0, 137.0, 100.0 ]
        }, {
          "env" : "test",
          "name" : "avg(total_response_time)",
          "values" : [ 0.0, 0.0, 0.0, 237.4375, 198.02189781021897, 189.44 ]
        }, {
          "env" : "test",
          "name" : "sum(is_error)",
          "values" : [ 0.0, 0.0, 0.0, 16.0, 137.0, 100.0 ]
        }, {
          "env" : "test",
          "name" : "global-avg-total_response_time",
          "values" : [ 197.12252964426878 ]
        } ]
      }, {
        "identifier" : {
          "names" : [ "apiproxy" ],
          "values" : [ "appdyn" ]
        },
        "metric" : [ {
          "env" : "test",
          "name" : "sum(message_count)",
          "values" : [ 0.0, 0.0, 0.0, 11.0, 137.0, 98.0 ]
        }, {
          "env" : "test",
          "name" : "avg(total_response_time)",
          "values" : [ 0.0, 0.0, 0.0, 170.0, 161.57664233576642, 149.16326530612244 ]
        }, {
          "env" : "test",
          "name" : "sum(is_error)",
          "values" : [ 0.0, 0.0, 0.0, 11.0, 137.0, 98.0 ]
        }, {
          "env" : "test",
          "name" : "global-avg-total_response_time",
          "values" : [ 157.0081300813008 ]
        } ]
      }, {
        "identifier" : {
          "names" : [ "apiproxy" ],
          "values" : [ "AppDyn" ]
        },
        "metric" : [ {
          "env" : "test",
          "name" : "sum(message_count)",
          "values" : [ 0.0, 0.0, 0.0, 3.0, 0.0, 0.0 ]
        }, {
          "env" : "test",
          "name" : "avg(total_response_time)",
          "values" : [ 0.0, 0.0, 0.0, 39.333333333333336, 0.0, 0.0 ]
        }, {
          "env" : "test",
          "name" : "sum(is_error)",
          "values" : [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 ]
        }, {
          "env" : "test",
          "name" : "global-avg-total_response_time",
          "values" : [ 39.333333333333336 ]
        } ]
      } ]
    }
  }
}

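and I would like to calculate the average of all the values under "name" : "avg(total_response_time)", iterating over each identifier.

I have made a few attempts, but I really don't know how to proceed, because the number of identifiers and the number of avg(total_response_time) values both vary.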

for identifier in $(cat response4.json | jq -r  '.[].stats.data[].identifier.values' | sed 's/[][]//g' | sed 's/"//g'); do echo ${identifier}
avg_response_time=$(cat response4.json | jq -r  '.[].stats.data[].metric[]') #don't know how to iterate through the 
done

Any help or ideas would be greatly appreciated.

【Question comments】:

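In general, you should try to do more of the logic in jq and less of it in the shell. There is no reason a single jq invocation can't return one tab-separated line per identifier/average pair.
That said, this is far from a minimal reproducible example, since it doesn't currently contain enough information to test an answer's correctness. Ideally you would provide (1) a version of the input data with everything irrelevant to the question removed, and (2) the correct/desired output for that input.
By the way, as a general note: don't use cat if you don't need it. jq . <input.json or jq . input.json are both more efficient than cat input.json | jq .; the difference is larger for programs like sort, where an optimized implementation can have multiple threads process different subsets of the file at once (but only when they are given a seekable file, not when given a pipe).
As an example of why this isn't particularly testable as currently written: I don't see any values with an "env" other than "test", so if we are also supposed to generate averages for test2 or appdyn or AppDyn..., what exactly would go into each of those averages?
I did write an answer here, but at the same time I feel guilty about rewarding a question that falls so far short of our community's standards. Please, please try to write questions that meet the minimal reproducible example definition (and that otherwise don't run afoul of any of the categories listed under the "Answer well-asked questions" section of How to Answer) in the future.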

Tags: arrays json shell average jq

【Solution 1】:

First, for clarity, here is a stream-oriented helper function:

def average(s): 
  reduce s as $x (null; .sum += $x | .n += 1)
  | if . == null then null else .sum / .n end;
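
For readers less familiar with jq's reduce, a rough Python analogue of the helper may make its behaviour concrete. This is only a sketch: the name average mirrors the jq function, and returning None for an empty stream is copied from the `if . == null` branch above.

```python
from typing import Iterable, Optional


def average(stream: Iterable[float]) -> Optional[float]:
    """Single-pass mean of a stream; None when the stream is empty."""
    total = 0.0
    count = 0
    for x in stream:  # consume values one at a time, like `reduce s as $x`
        total += x
        count += 1
    return None if count == 0 else total / count


print(average(iter([28.0, 129.0, 24.0])))
print(average(iter([])))  # None: there is nothing to average
```

As in the jq version, the input is consumed once and never materialised as a list, so it works for any iterable.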

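Next, we have a choice. We can process each item in the .stats.data array separately, or we can group the items by the value of .identifier. For the example data the results are the same (except possibly for ordering), but let's consider the two cases separately here:

Average for each item in .stats.data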

.Response.stats.data[]
| {id: (.identifier.values),
   average: average(.metric[]
     | select(.name == "avg(total_response_time)")
     | .values[]) }

Grouping by .identifier

.Response.stats.data
| group_by(.identifier)[]
| {id: (.[0].identifier.values),
   average: average(.[].metric[]
     | select(.name == "avg(total_response_time)")
     | .values[] ) }

Output

{"id":["test"],"average":354.43398004304896}
{"id":["test2"],"average":104.14989963503649}
{"id":["appdyn"],"average":80.1233179403148}
{"id":["AppDyn"],"average":6.555555555555556}
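
These figures can be cross-checked outside jq. The following is a minimal Python sketch, assuming the same document shape as in the question. The inline DOC is a trimmed copy of the question's data (two of the four items, keeping only the relevant metric), and per_item_averages is a hypothetical helper name introduced for illustration.

```python
import json

# Trimmed copy of the question's input: only the metric of interest is kept.
DOC = json.loads("""
{"Response": {"stats": {"data": [
  {"identifier": {"names": ["apiproxy"], "values": ["test"]},
   "metric": [{"env": "test", "name": "avg(total_response_time)",
               "values": [312.57142857142856, 344.2480620155039,
                          374.2083333333333, 381.1,
                          350.67605633802816, 363.8]}]},
  {"identifier": {"names": ["apiproxy"], "values": ["AppDyn"]},
   "metric": [{"env": "test", "name": "avg(total_response_time)",
               "values": [0.0, 0.0, 0.0, 39.333333333333336, 0.0, 0.0]}]}
]}}}
""")

METRIC = "avg(total_response_time)"


def per_item_averages(doc):
    """Return [(identifier values, unweighted mean or None)], one per data item."""
    results = []
    for item in doc["Response"]["stats"]["data"]:
        # Flatten the values of every metric entry with the wanted name.
        values = [v for m in item["metric"] if m["name"] == METRIC
                  for v in m["values"]]
        mean = sum(values) / len(values) if values else None
        results.append((item["identifier"]["values"], mean))
    return results


for ident, mean in per_item_averages(DOC):
    print(ident, mean)
```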

【Comments】:

【Solution 2】:

jq -r '
    .[].stats.data[]
    | (.identifier.values[]) as $identifier
    | (.metric[]
       | select(.name == "avg(total_response_time)")
       | .values
      ) as $values
    | [$identifier, ($values | add) / ($values | length)]
    | @tsv
    ' <test.json

...yields:

test    354.43398004304896
test2   104.14989963503649
appdyn  80.1233179403148
AppDyn  6.555555555555556
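
One caveat to check against your intent: the filter above computes an unweighted mean of the per-interval averages, zeros included, which is why these figures differ from the global-avg-total_response_time values in the input (349.5860927152318 for test, against 354.43398004304896 here). If the request-weighted mean is what you need, weighting each interval's average by its sum(message_count) reproduces the global value for the test identifier. A Python sketch under that assumption, where weighted_average is a hypothetical helper name:

```python
def weighted_average(values, weights):
    """Mean of `values` weighted by `weights`; None if the weights sum to zero."""
    total_weight = sum(weights)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Values for the "test" identifier, copied from the question's input.
message_count = [28.0, 129.0, 24.0, 20.0, 71.0, 30.0]
avg_response_time = [312.57142857142856, 344.2480620155039, 374.2083333333333,
                     381.1, 350.67605633802816, 363.8]

print(weighted_average(avg_response_time, message_count))
```

Intervals with no traffic have weight zero, so they no longer pull the result down.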

【Comments】: