Multiple aggregate functions in one request: answers

【Question title】: Multiple Aggregate functions in one request
【Posted】: 2015-03-21 03:20:44
【Question】:

I have a dataset like the following:

{
  item: '123',
  array: [{
    array2: [{
      array3: [{
        property1: 1234
      }]
    }],
    anotherArray: [{
      property2: 1234
    }]
  }]
}

I am trying to aggregate the sums of property2 and property1 in the same request. This is my current aggregation:

Item.aggregate([
            {$match: {itemId: 1234}},
            {$unwind: "$array"},
            {$unwind: "$array.array2"},
            {$unwind: "$array.array2.array3"},
            {$unwind: "$array.anotherArray"},
            {$group: {
                _id: 0,
                property1: {$sum: '$array.array2.array3.property1'},
                property2: {$sum: '$array.anotherArray.property2'}

            }},
            {$project: {
                _id: 0,
                property1: "$property1",
                property2: "$property2",

            }},
        ], function (err, aggregate) {
            callback(null, aggregate);
        });

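The problem is that the aggregated results for property1 and property2 are always twice what they should be.

I suspect the problem is the $unwind on "anotherArray", because when I remove it I get the correct aggregate value.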
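
To illustrate the suspicion, here is a minimal plain JavaScript sketch of how unwinding two sibling arrays multiplies rows. The `unwind` helper and the sample values are hypothetical and only approximate what $unwind does on the server; no database is involved.

```javascript
// Hypothetical helper: mimics $unwind on one top-level array field of plain objects.
// Documents whose field is missing or empty produce no rows.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Simplified sample: two sibling arrays on one document (values are made up).
const docs = [{ a: [{ p1: 10 }, { p1: 20 }], b: [{ p2: 1 }, { p2: 2 }] }];

// Unwinding both arrays yields 2 x 2 = 4 rows, so every value is counted twice.
const rows = unwind(unwind(docs, "a"), "b");
const sumP1 = rows.reduce((total, row) => total + row.a.p1, 0);
const sumP2 = rows.reduce((total, row) => total + row.b.p2, 0);

console.log(rows.length, sumP1, sumP2); // 4 60 6 (the true totals are 30 and 3)
```

With two elements in each array, both sums come out doubled, which would match the symptom described here if each array holds two entries.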

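Is it possible to aggregate over multiple arrays with a single aggregation?

For now I simply make 2 separate requests to the database in parallel using async, but I would like to do more complex aggregations in the future without extra database calls.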

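【Comments】:

Nested arrays are a really bad idea.
I have a very complex dataset with a predefined structure that I cannot change much, so avoiding nested arrays is not an option here...
They really are a very bad idea if you intend to query or aggregate them, especially with multiple levels of nesting like this. Could you at least explain what the data means and why it needs to be in such a questionable structure? What is the significance of the sums? Could we create an auxiliary collection with a different structure and part of the data, and use that to calculate the sums?

Tags: node.js mongodb mongoose mongodb-query aggregation-framework

【Solution 1】:

As already noted, the structure is not a good one and its purpose should probably be reviewed. It is really unclear why it is structured this way, or whether anything else in the arrays could skew the results in either case.

But there is a general approach when you have multiple arrays in one document: process each array separately and get a "total" per document first. Then sum the totals across all documents: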

Item.aggregate([
    // Unwind only 1 inner array first
    { "$unwind": "$array" },
    { "$unwind": "$array.array2" },
    { "$unwind": "$array.array2.array3" },

    // Group back the sum of the element and the first of the other array
    // and only per document
    { "$group": {
        "_id": "$_id",
        "property1": { "$sum": "$array.array2.array3.property1" },
        "anotherArray": { "$first": "$array.anotherArray" }
    }},

    // Unwind the other array
    { "$unwind": "$anotherArray" },

    // Group back the total and the first summed per document
    { "$group": {
        "_id": "$_id",
        "property1": { "$first": "$property1" },
        "property2": { "$sum": "$anotherArray.property2" }
    }},

    // Total all documents and output
    { "$group": {
        "_id": null,
        "property1": { "$sum": "$property1" },
        "property2": { "$sum": "$property2" },
    }},
    { "$project": {
        "_id": 0,
        "property1": 1,
        "property2": 1
    }}
],callback);

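So by unwinding only one array at a time and first getting the totals within each original document, you avoid the duplication problem, where every unwound item of one array produces another copy of each item of the other array. With discrete per-document totals, it is then simple to get the overall total for whatever selection you need.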
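
The same idea can be sketched outside the database in plain JavaScript: total each array path per document first, then add the per-document totals. This is only an illustration of the reasoning, not the pipeline itself; the sample values and the `sumOf` helper are assumptions made for the example.

```javascript
// Sample documents shaped like the question's data (values are made up).
const items = [
  {
    _id: 1,
    array: [{
      array2: [{ array3: [{ property1: 5 }, { property1: 7 }] }],
      anotherArray: [{ property2: 1 }, { property2: 2 }]
    }]
  },
  {
    _id: 2,
    array: [{
      array2: [{ array3: [{ property1: 10 }] }],
      anotherArray: [{ property2: 4 }]
    }]
  }
];

// Hypothetical helper: adds up one numeric property over a list of objects.
const sumOf = (list, key) =>
  list.reduce((total, entry) => total + (entry[key] || 0), 0);

// Step 1: one total per document, walking each array path on its own.
const perDocument = items.map((doc) => ({
  _id: doc._id,
  property1: sumOf(
    doc.array.flatMap((a) => a.array2).flatMap((b) => b.array3),
    "property1"
  ),
  property2: sumOf(doc.array.flatMap((a) => a.anotherArray), "property2")
}));

// Step 2: add the per-document totals together.
const totals = {
  property1: sumOf(perDocument, "property1"),
  property2: sumOf(perDocument, "property2")
};

console.log(totals); // { property1: 22, property2: 7 }
```

Because the two array paths are never combined into one row, the size of one array cannot inflate the sum taken over the other.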

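【Comments】:

Thanks for the answer, but in the end I found a simple solution using $setUnion from MongoDB.
@scopsy I think the main point here is that you cannot really "unwind" two arrays at once. The other point is that $setUnion only works when your results are truly a "set" and all values are unique. Otherwise duplicates will be removed.

【Solution 2】:

I finally found a solution for my use case using MongoDB's $setUnion.

This is the code I used for my problem: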

Item.aggregate([
            {$match: { itemID: '1234'}},
            {$unwind: "$array1"},
            {$unwind: "$array1.array2"},
            {$project: {
                _id: 0,
                combined: {$setUnion: ['$array1.anotherArray', '$array1.array2.array3']},

            }},
            {$unwind: "$combined"},
            {$group: {
                _id: 0,
                property1: {$sum: '$combined.property1'},
                property2: {$sum: '$combined.property2'}
            }},
        ], function (err, aggregate) {
            cb(aggregate);
        });
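
The caveat raised in the comments on Solution 1 applies here: a set union drops duplicate elements, so equal entries are summed only once. The following plain JavaScript sketch shows the effect. The `setUnion` function is a hypothetical stand-in that compares elements through `JSON.stringify`, which is a simplification of how the server compares values, and the sample values are made up.

```javascript
// Hypothetical helper: a value-based union in the spirit of $setUnion.
// Elements are compared via JSON.stringify, which is only an approximation
// (for example, it is sensitive to key order).
function setUnion(...arrays) {
  const seen = new Map();
  for (const element of arrays.flat()) {
    seen.set(JSON.stringify(element), element);
  }
  return [...seen.values()];
}

// Made-up values: two entries of array3 happen to be identical.
const array3 = [{ property1: 5 }, { property1: 5 }];
const anotherArray = [{ property2: 1 }];

const combined = setUnion(anotherArray, array3);
const property1Sum = combined.reduce(
  (total, entry) => total + (entry.property1 || 0),
  0
);

console.log(combined.length, property1Sum); // 2 5 (a plain sum of array3 would give 10)
```

If the array elements are guaranteed to be distinct, for instance because each carries a unique identifier, the union keeps all of them and the sums are unaffected.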
