How to aggregate in MongoDB

【Question title】: How to aggregate in MongoDB
【Posted】: 2014-05-29 13:00:08
【Question】:

I have a collection called user.monthly in which each document stores 'day': number of clicks. Here are two sample documents for different dates.

January

{
    name : "devid",
    date : ISODate("2014-01-21T11:32:42.392Z"),
    daily: {'1':12,'9':13,'30':13}
}

February

{
    name : "devid",
    date : ISODate("2014-02-21T11:32:42.392Z"),
    daily: {'3':12,'12':13,'25':13}
}

How can I aggregate this data to get the total number of clicks for January and February? Please help me solve this.

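【Comments】:

Fetch the data into the application and compute it there.
What have you tried with the aggregation framework? Your schema is not very clear; how do you tell whether a document belongs to February or January?
What do you mean by "total clicks"? I assume you mean the values you present under "daily". That structure does not help you if you intend to use the aggregation framework rather than mapReduce. Also, apart from the one date you provide, nothing shows which individual dates actually fall in "February" or "January", which does not exactly help the question.

Tags: mongodb mapreduce aggregation-framework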

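【Solution 1】:

Your current schema is not helping you here, because the "daily" field (which we presume holds your click counts per type, or something similar) is represented as a sub-document. That means you have to explicitly name the path to each field in order to do anything with it.

A better approach is to put this information in an array: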

{
    "name" : "devid",
    "date" : ISODate("2014-02-21T11:32:42.392Z"),
    "daily": [
        { "type": "3",  "clicks": 12 },
        { "type": "12", "clicks": 13 },
        { "type": "25", "clicks": 13 }
    ]
}
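
As a rough sketch of how existing documents could be moved to this array form, the helper below converts the original "daily" sub-document into the list of { type, clicks } entries. The name dailyToArray is hypothetical and not part of the original answer, and the commented-out shell loop is an untested assumption about how one might apply it.

```javascript
// Hypothetical helper (not from the original answer): convert the original
// "daily" sub-document, e.g. { '3': 12, '12': 13 }, into the array form.
function dailyToArray(daily) {
    return Object.keys(daily).map(function (key) {
        return { type: key, clicks: daily[key] };
    });
}

// Sample "daily" value taken from the February document in the question
var february = { name: "devid", daily: { '3': 12, '12': 13, '25': 13 } };
var converted = dailyToArray(february.daily);

// In the mongo shell, a one-off migration might look like this (assumption;
// run it only once, on documents that are still in the sub-document form):
// db.collection.find().forEach(function (doc) {
//     db.collection.update(
//         { _id: doc._id },
//         { $set: { daily: dailyToArray(doc.daily) } }
//     );
// });
```

The keys stay strings, matching the "type" values shown in the array example.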

Then you have an aggregation statement that goes like this:

db.collection.aggregate([

    // Just match the dates in January and February
    { "$match": {
        "date": {
            "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
        }
    }},

    // Unwind the "daily" array
    { "$unwind": "$daily" },

    // Group the values together by "type" on "January" and "February"
    { "$group": {
        "_id": {
            "year": { "$year": "$date" },
            "month": { "$month": "$date" },
            "type": "$daily.type"
        },
        "clicks": { "$sum": "$daily.clicks" }
    }},

    // Sort the result nicely
    { "$sort": { 
        "_id.year": 1,
        "_id.month": 1,
        "_id.type": 1
    }}
])

That form is simple. And if you do not care about the type as a grouping key and only want the monthly totals:

db.collection.aggregate([
    { "$match": {
        "date": {
            "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
        }
    }},
    { "$unwind": "$daily" },
    { "$group": {
        "_id": {
            "year": { "$year": "$date" },
            "month": { "$month": "$date" },
        },
        "clicks": { "$sum": "$daily.clicks" }
    }},
    { "$sort": { "_id.year": 1, "_id.month": 1 }}
])
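
To see what the $unwind and $group stages compute, here is a plain JavaScript sketch that mimics them on the two sample documents, without needing a server. The array-form January document is an assumed conversion of the question's sample, and the variable names are illustrative only.

```javascript
// The question's two samples, rewritten in the array form (assumed conversion)
var docs = [
    { date: new Date("2014-01-21T11:32:42.392Z"),
      daily: [ { type: "1", clicks: 12 }, { type: "9", clicks: 13 }, { type: "30", clicks: 13 } ] },
    { date: new Date("2014-02-21T11:32:42.392Z"),
      daily: [ { type: "3", clicks: 12 }, { type: "12", clicks: 13 }, { type: "25", clicks: 13 } ] }
];

var totals = {};
docs.forEach(function (doc) {
    // Grouping key: year and month in UTC, as $year and $month use by default
    var key = doc.date.getUTCFullYear() + "-" + (doc.date.getUTCMonth() + 1);
    // Walking the array plays the role of $unwind
    doc.daily.forEach(function (entry) {
        // Accumulating per key plays the role of $group with $sum
        totals[key] = (totals[key] || 0) + entry.clicks;
    });
});
// totals is now { "2014-1": 38, "2014-2": 38 }
```

Each month sums to 12 + 13 + 13 = 38 clicks for the sample data.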

But with the sub-document form you currently have, this becomes ugly:

db.collection.aggregate([
    { "$match": {
        "date": {
            "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
        }
    }},
    { "$group": {
        "_id": {
            "year": { "$year": "$date" },
            "month": { "$month": "$date" },
        },
        "clicks": { 
            "$sum": {
                "$add": [
                    { "$ifNull": ["$daily.1", 0] },
                    { "$ifNull": ["$daily.3", 0] },
                    { "$ifNull": ["$daily.9", 0] },
                    { "$ifNull": ["$daily.12", 0] },
                    { "$ifNull": ["$daily.25", 0] },
                    { "$ifNull": ["$daily.30", 0] },
                ]
            }
        }
    }}      
])

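This shows that you have no choice but to specify essentially every possible field under "daily" (so the list is probably much larger in practice). We also have to allow for a key not existing in a given document and return a default value in that case.

For example, your first document has no key "daily.3". Without the $ifNull check, that value would be null, which makes the whole $add result null, so $sum has nothing numeric to add and the total comes out as "0".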
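
The null propagation can be illustrated with a small plain JavaScript sketch. The functions ifNull, add and sum below are hypothetical stand-ins that mimic the documented behaviour of $ifNull, $add and the $sum accumulator; they are not MongoDB APIs.

```javascript
// Stand-in for $ifNull: use the fallback when the value is null or missing
function ifNull(value, fallback) {
    return (value === null || value === undefined) ? fallback : value;
}

// Stand-in for $add: the result is null if any operand is null or missing
function add(values) {
    var total = 0;
    for (var i = 0; i < values.length; i++) {
        if (values[i] === null || values[i] === undefined) return null;
        total += values[i];
    }
    return total;
}

// Stand-in for the $sum accumulator: non-numeric values are ignored
function sum(values) {
    return values.reduce(function (acc, v) {
        return typeof v === "number" ? acc + v : acc;
    }, 0);
}

var january = { '1': 12, '9': 13, '30': 13 };   // "daily" from the first sample
var keys = ['1', '3', '9', '12', '25', '30'];   // every key named in the pipeline

// Without the check, the missing keys turn the whole $add into null
var withoutCheck = sum([ add(keys.map(function (k) { return january[k]; })) ]);
// With the check, missing keys count as 0
var withCheck = sum([ add(keys.map(function (k) { return ifNull(january[k], 0); })) ]);
// withoutCheck is 0, withCheck is 38
```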

Grouping on those keys, as in the first aggregation example, gets even worse:

db.collection.aggregate([

    // Just match the dates in January and February
    { "$match": {
        "date": {
            "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
        }
    }},

    // Project with an array to match all possible values
    { "$project": {
        "date": 1,
        "daily": 1,
        "type": { "$literal": ["1", "3", "9", "12", "25", "30" ] }
    }},

    // Unwind the "type" array
    { "$unwind": "$type" },

    // Project values onto the "type" while grouping
    { "$group" : {
         "_id": {
             "year": { "$year": "$date" },
             "month": { "$month": "$date" },
             "type": "$type"
         },
         "clicks": { "$sum": { "$cond": [
                     { "$eq": [ "$type", "1" ] },
                     "$daily.1",
                     { "$cond": [
                         { "$eq": [ "$type", "3" ] },
                         "$daily.3",
                         { "$cond": [
                             { "$eq": [ "$type", "9" ] },
                             "$daily.9",
                             { "$cond": [
                                 { "$eq": [ "$type", "12" ] },
                                 "$daily.12",
                                 { "$cond": [
                                     { "$eq": [ "$type", "25" ] },
                                     "$daily.25",
                                     "$daily.30"
                                 ]}
                             ]}
                         ]}
                     ]}
         ]}}
    }},
    { "$sort": { 
       "_id.year": 1,
       "_id.month": 1,
       "_id.type": 1
    }}
])

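This uses $cond to build one large conditional expression that matches each value against "type", where the $literal operator is used to project all possible values into an array.

If you do not have MongoDB 2.6 or later, you can always do this in place of the $literal operator statement: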

        "type": { "$cond": [1, ["1", "3", "9", "12", "25", "30" ], 0] }

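Basically, the true evaluation from $cond returns a "literally" declared value, which is how you get to specify the array. There was also a hidden, undocumented $const operator, which is now exposed as $literal.

As you can see, the structure here is doing you no favors, so the best option is to change it. But if you cannot, or find this aggregation approach too hard to work with, then mapReduce offers a way, though processing will be much slower: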

db.collection.mapReduce(
    function () {
        for ( var k in this.daily ) {
            emit(
                {
                    // UTC getters, to match $year and $month in the pipelines above
                    year: this.date.getUTCFullYear(),
                    month: this.date.getUTCMonth() + 1,
                    type: k
                },
                this.daily[k]
            );
        }
    },
    function(key,values) {
        return Array.sum( values );
    },
    { 
        "query": {
            "date": {
                "$gte": new Date("2014-01-01"), "$lt": new Date("2014-03-01")
            }
        },
        "out": { "inline": 1 } 
    }
)
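
For readers without a server at hand, the following plain JavaScript sketch mimics the map and reduce phases on the question's two sample documents. The emit function and the emitted and results variables are hypothetical stand-ins for what the server does internally. The sketch uses UTC date getters so that the grouping does not depend on the local timezone, and replaces the shell-only Array.sum with a standard reduce.

```javascript
// The two sample documents from the question, in their original form
var docs = [
    { date: new Date("2014-01-21T11:32:42.392Z"), daily: { '1': 12, '9': 13, '30': 13 } },
    { date: new Date("2014-02-21T11:32:42.392Z"), daily: { '3': 12, '12': 13, '25': 13 } }
];

// Stand-in for the server-side emit(): collect values under a serialized key
var emitted = {};
function emit(key, value) {
    var id = JSON.stringify(key);
    (emitted[id] = emitted[id] || []).push(value);
}

// Map phase: one emit per key found in the "daily" sub-document
function map() {
    for (var k in this.daily) {
        emit(
            { year: this.date.getUTCFullYear(), month: this.date.getUTCMonth() + 1, type: k },
            this.daily[k]
        );
    }
}

// Reduce phase: add up all values emitted for the same key
function reduce(key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
}

docs.forEach(function (doc) { map.call(doc); });

// As in MongoDB, reduce is only invoked for keys with more than one value
var results = Object.keys(emitted).map(function (id) {
    var values = emitted[id];
    var key = JSON.parse(id);
    return { _id: key, value: values.length > 1 ? reduce(key, values) : values[0] };
});
```

With the sample data every key occurs once, so results holds six entries, one per year, month and type combination.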

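The general lesson here is that you get the cleanest and fastest results by changing the document format and using the aggregation framework. But all the approaches are listed here.

【Discussion】:

Neil Lunn, your solution really helped me solve my problem.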