【问题标题】:Aggregation Multiple arrays聚合多个数组
【发布时间】:2016-06-08 13:43:04
【问题描述】:

嘿,我无法正确进行聚合。

我有这个数据集,并且在集合中还有几百万个其他类似的文档:

{
  "_id": ObjectId("5757c73344ce54ae1d8b456c"),
  "hostname": "Baklap4",
  "timestamp": NumberLong(1465370500),
  "networkList": [
    {
      "name": "46.243.152.13",
      "openConnections": NumberLong(3)
    },
    {
      "name": "46.243.152.50",
      "openConnections": NumberLong(4)
    }
  ],
  "webserver": "nginx",
  "deviceList": [
    {
      "deviceName": "eth0",
      "receive": NumberLong(183263),
      "transmit": NumberLong(781595)
    },
    {
      "deviceName": "wlan0",
      "receive": NumberLong(0),
      "transmit": NumberLong(0)
    }
  ]
}

我想要什么:

我想获得一个结果集,我在 300 秒的时间跨度内对每个文档进行平均(每个数值)。

[
                [
                    '$match' => [
                        'timestamp' => ['$gte' => $todayMidnight],
                        'hostname' => $serverName
                    ]
                ],
                [
                    '$unwind' => '$networkList'
                ],
                [
                    '$unwind' => '$deviceList'
                ],
                [
                    '$group' => [
                        '_id' => [
                            'interval' => [
                                '$subtract' => [
                                    '$timestamp',
                                    [
                                        '$mod' => ['$timestamp', 300]
                                    ]
                                ]
                            ],
                            'network' => '$networkList.name',
                            'device' => '$deviceList.name',
                        ],
                        'openConnections' => [
                            '$sum' => '$networkList.openConnections'
                        ],
                        'cpuLoad' => [
                            '$avg' => '$cpuLoad'
                        ],
                        'bytesPerSecond' => [
                            '$avg' => '$bytesPerSecond'
                        ],
                        'requestsPerSecond' => [
                            '$avg' => '$requestsPerSecond'
                        ],
                        'webserver' => [
                            '$last' => '$webserver'
                        ],
                        'timestamp' => [
                            '$max' => '$timestamp'
                        ]
                    ]
                ],
                [
                    '$project' => [
                        '_id' => 0,
                        'timestamp' => 1,
                        'cpuLoad' => 1,
                        'bytesPerSecond' => 1,
                        'requestsPerSecond' => 1,
                        'webserver' => 1,
                        'openConnections' => 1,
                        'networkList' => '$networkList',
                        'deviceList' => '$_id.device',
                    ]
                ],
                [
                    '$sort' => [
                        'timestamp' => -1
                    ]
                ]
            ];

但这并没有给我一个包含所有设备和每个设备平均接收和传输字节数的列表。

如何获得这些?

【问题讨论】:

  • 您是从数据库中提取数据吗?如果是,您可以提交数据的整个结构吗? - 最好使用查询在数据库中进行聚合
  • 10 分钟间隔对你有用吗?
  • @Miro 是的,我正在从 MongoDB 中提取数据。数据结构就像第一个数据集。
  • @profesor79 这也可以,我只需将 300 秒(5 分钟)更改为 600 秒。

标签: mongodb symfony aggregation-framework


【解决方案1】:

根据给定的示例,我能够使用这个 mongo Shel 查询获得结果:

var projectTime = {
    $project : {
        _id : 1,
        hostname : 1,
        timestamp : 1,
        networkList : 1,
        webserver : 1,
        deviceList : 1,
        isoDate : {
            $add : [new Date(0), {
                    $multiply : ["$timestamp", 1000]
                }
            ]
        }
    }
}
var group = {
    $group : {

        "_id" : {
            time : {
                "$add" : [{
                        "$subtract" : [{
                                "$subtract" : ["$isoDate", new Date(0)]
                            }, {
                                "$mod" : [{
                                        "$subtract" : ["$isoDate", new Date(0)]
                                    },
                                    1000 * 60 * 5 // 1000 milsseconds * 60 seconds * 5 minutes
                                ]
                            }
                        ]
                    },
                    new Date(0)
                ]
            },
            "hostname" : "$hostname",
            "deviceList_deviceName" : "$deviceList.deviceName",
            "networkList_name" : "$networkList.name",
        },

        xreceive : {
            $sum : "$deviceList.receive"
        },
        xtransmit : {
            $sum : "$deviceList.transmit"
        },
        xopenConnections : {
            $avg : "$networkList.openConnections"
        },

    }
}

var unwindNetworkList = {
    $unwind : "$networkList"
}
var unwindSeviceList = {
    $unwind : "$deviceList"
}

var match = {
    $match : {
        "_id.time" : ISODate("2016-06-09T08:05:00.000Z")
    }
}

var finalProject = {
    $project : {
        _id : 0,
        timestamp : "$_id.time",
        hostname : "$_id.hostname",
        deviceList_deviceName : "$_id.deviceList_deviceName",
        networkList_name : "$_id.networkList_name",
        xreceive : 1,
        xtransmit : 1,
        xopenConnections : 1
    }
}
db.baklap.aggregate([projectTime, unwindNetworkList,
        unwindSeviceList,

        group,
        match,
        finalProject
    ])
db.baklap.findOne()

然后输出:

{
    "xreceive" : NumberLong(0),
    "xtransmit" : NumberLong(0),
    "xopenConnections" : 4.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "wlan0",
    "networkList_name" : "46.243.152.50"
}
{
    "xreceive" : NumberLong(183263),
    "xtransmit" : NumberLong(781595),
    "xopenConnections" : 4.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "eth0",
    "networkList_name" : "46.243.152.50"
}
{
    "xreceive" : NumberLong(183263),
    "xtransmit" : NumberLong(781595),
    "xopenConnections" : 3.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "eth0",
    "networkList_name" : "46.243.152.13"
}
{
    "xreceive" : NumberLong(0),
    "xtransmit" : NumberLong(0),
    "xopenConnections" : 3.0,
    "timestamp" : ISODate("2016-06-09T08:05:00.000Z"),
    "hostname" : "Baklap4",
    "deviceList_deviceName" : "wlan0",
    "networkList_name" : "46.243.152.13"
}

要注意的是,每次处理$unwind 时,我们的数据都会受到一点污染。这可能会在汇总数据时产生副作用(平均值将与 (2+2+3+3)/4 与 (2+3)/2) 相同)

要检查 - 您可以在小组阶段添加 x:{$push:"$$ROOT"} 并在管道执行后检查值 - 因为您将拥有给定数据周期的所有源文档

【讨论】:

  • 在这两种情况下,openConnections 都应该相加,并且是接收和发送的平均值。因此,如果我是正确的,那会将输出中的 xOpenConnections 更改为 7。除了做得好!无论如何要将时间戳的格式更改为31 - 12 23:59 格式?
  • 你能告诉更多关于时间格式的信息吗?没听懂
  • 我认为很清楚的格式将是 Days - MonthNumber hours:minutes 与所有双数字,所以 1 将变为 0111 将保持 11
  • @Baklap4 使用this $dateToString :-)
猜你喜欢
  • 1970-01-01
  • 2018-08-17
  • 2016-10-23
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
  • 2014-12-01
  • 1970-01-01
  • 2023-01-04
相关资源
最近更新 更多