【Question title】: Display items grouped by a field
【Posted】: 2014-11-01 12:28:42
【Question】:

I have this sample collection of items:

{
  "_id": "1",
  "field1": "value1",
  "field2": "value2",
  "category": "phones",
  "user": "1",
  "tags": [
    "tag1",
    "tag3"
  ]
},
{
  "_id": "2",
  "field1": "value1",
  "field2": "value2",
  "category": "phones",
  "user": "1",
  "tags": [
    "tag2",
    "tag3"
  ]
},
{
  "_id": "3",
  "field1": "value1",
  "field2": "value2",
  "category": "bikes",
  "user": "1",
  "tags": [
    "tag3",
    "tag4"
  ]
},
{
  "_id": "4",
  "field1": "value1",
  "field2": "value2",
  "category": "cars",
  "user": "2",
  "tags": [
    "tag1",
    "tag2"
  ]
}

I would like to find the items created by a specific user (e.g. user: "1") and display them grouped by the category field. Result:

{
  "phones": [
      {
        "_id": "1",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag1",
          "tag3"
         ]
      },
      {
        "_id": "2",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag2",
          "tag3"
         ]
      }
  ],
  "bikes" : [
      {
        "_id": "3",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag3",
          "tag4"
         ]
      }
  ]

}

Is it possible to get this result with the aggregation $group function? Thank you

【Discussion】:

    Tags: javascript mongodb mapreduce mongodb-query aggregation-framework


    【Solution 1】:

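    You can group by category, but not in the way you present it. That is actually a good thing, because your "category" is really data, and you should not represent "data" as "keys", either in storage or in output.

    So I really recommend reshaping it like this: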

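    db.collection.aggregate([
        // Only the documents created by user "1" (stored as a string)
        { "$match": { "user": "1" } },
        // One document per category, collecting that category's items
        { "$group": {
            "_id": "$category",
            "items": { 
                "$push": {
                    "_id": "$_id",
                    "field1": "$field1",
                    "field2": "$field2",
                    "tags": "$tags"
                }
            }
        }},
        // Fold all categories into a single document with generic keys
        { "$group": {
            "_id": null,
            "categories": { 
                "$push": {
                    "_id": "$_id",
                    "items": "$items"
                }
            }
        }}
    ])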
    

    You get output like this:

    {
        "_id" : null,
        "categories" : [
            {
                "_id" : "bikes",
                "items" : [
                    {
                        "_id" : "3",
                        "field1" : "value1",
                        "field2" : "value2",
                        "tags" : [
                            "tag3",
                            "tag4"
                        ]
                    }
                ]
            },
            {
                "_id" : "phones",
                "items" : [
                    {
                        "_id" : "1",
                        "field1" : "value1",
                        "field2" : "value2",
                        "tags" : [
                            "tag1",
                            "tag3"
                        ]
                    },
                    {
                        "_id" : "2",
                        "field1" : "value1",
                        "field2" : "value2",
                        "tags" : [
                            "tag2",
                            "tag3"
                        ]
                    }
                ]
            }
        ]
    }
    

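    It really is better to have generic key names that do not change as the data changes. That is really the object-oriented pattern.

    If you really think you need "data as keys", then with the aggregation framework you either need to know in advance which "categories" to expect, or be prepared to generate the pipeline stages: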
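    To make the first pipeline concrete, here is a minimal plain JavaScript (Node.js) sketch that mimics, on the sample documents, what `$match` and the two `$group` stages compute. It is only an in-memory illustration of the result's shape, not how the server executes the pipeline, and the function name `groupByCategory` is hypothetical. It also shows the benefit of generic keys: code that reads `categories[i]._id` and `categories[i].items` never needs to know the category names.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag1", "tag3"] },
  { _id: "2", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag2", "tag3"] },
  { _id: "3", field1: "value1", field2: "value2", category: "bikes", user: "1", tags: ["tag3", "tag4"] },
  { _id: "4", field1: "value1", field2: "value2", category: "cars", user: "2", tags: ["tag1", "tag2"] },
];

// Hypothetical helper: an in-memory imitation of the first pipeline.
function groupByCategory(docs, user) {
  // $match: keep only this user's documents
  const matched = docs.filter((d) => d.user === user);

  // first $group: one entry per category, pushing the projected item
  const groups = new Map();
  for (const d of matched) {
    if (!groups.has(d.category)) groups.set(d.category, []);
    groups.get(d.category).push({ _id: d._id, field1: d.field1, field2: d.field2, tags: d.tags });
  }

  // second $group with _id: null: fold every group into one document
  // that uses the generic keys "_id" and "items"
  const categories = [];
  for (const [category, items] of groups) categories.push({ _id: category, items });
  return { _id: null, categories };
}

const grouped = groupByCategory(docs, "1");
console.log(JSON.stringify(grouped, null, 2));
```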

    db.collection.aggregate([
        { "$match": { "user": "1" } },
        { "$group": {
            "_id": null,
            "phones": {
                "$push": {
                    "$cond": [
                        { "$eq": ["$category","phones"] },
                        {
                            "_id": "$_id",
                            "field1": "$field1",
                            "field2": "$field2",
                            "tags": "$tags"
                        },
                        false
                    ]
                }
            },
            "bikes": {
                "$push": {
                    "$cond": [
                        { "$eq": ["$category","bikes"] },
                        {
                            "_id": "$_id",
                            "field1": "$field1",
                            "field2": "$field2",
                            "tags": "$tags"
                        },
                        false
                    ]
                }
            }           
        }},
        // Remove the false placeholders from "phones", carrying "bikes" along unchanged
        { "$unwind": "$phones" },
        { "$match": { "phones": { "$ne": false } }},
        { "$group": {
            "_id": "$_id",
            "phones": { "$push": "$phones" },
            "bikes": { "$first": "$bikes" }
        }},
        // Same again for "bikes", carrying the cleaned "phones" along unchanged
        { "$unwind": "$bikes" },
        { "$match": { "bikes": { "$ne": false } }},
        { "$group": {
            "_id": "$_id",
            "phones": { "$first": "$phones" },
            "bikes": { "$push": "$bikes" }
        }},
        { "$project": {
            "_id": 0,
            "phones": 1,
            "bikes": 1
        }}
    ])
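    The reason for the extra `$unwind`/`$match`/`$group` steps is that the `$cond` inside each `$push` still pushes *something* for every document: either the item or `false`. The following plain JavaScript sketch (an in-memory illustration only; `pushIfCategory` is a hypothetical name) imitates one such accumulator on the sample data and then drops the placeholders, which is what those steps achieve on the server.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag1", "tag3"] },
  { _id: "2", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag2", "tag3"] },
  { _id: "3", field1: "value1", field2: "value2", category: "bikes", user: "1", tags: ["tag3", "tag4"] },
  { _id: "4", field1: "value1", field2: "value2", category: "cars", user: "2", tags: ["tag1", "tag2"] },
];

// Imitates { "$push": { "$cond": [ { "$eq": ["$category", name] }, item, false ] } }:
// every document pushes exactly one element, the item or false.
function pushIfCategory(matched, name) {
  return matched.map((d) =>
    d.category === name ? { _id: d._id, field1: d.field1, field2: d.field2, tags: d.tags } : false
  );
}

const matched = docs.filter((d) => d.user === "1"); // $match
const rawPhones = pushIfCategory(matched, "phones");
const rawBikes = pushIfCategory(matched, "bikes");

// One false placeholder per document that belongs to another category.
console.log(rawPhones.map((x) => (x === false ? false : x._id))); // [ '1', '2', false ]
console.log(rawBikes.map((x) => (x === false ? false : x._id))); // [ false, false, '3' ]

// $unwind + $match { "$ne": false } + $group with $push amounts to dropping them.
const phones = rawPhones.filter((x) => x !== false);
const bikes = rawBikes.filter((x) => x !== false);
console.log(JSON.stringify({ phones, bikes }, null, 2));
```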
    

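    With MongoDB 2.6 you can shorten this a little, because the $setDifference operator can filter out the false values (it treats its arguments as sets, so duplicate entries are also dropped and the order of the resulting array is unspecified; duplicates cannot occur here because every item carries a distinct _id):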

    db.collection.aggregate([
        { "$match": { "user": "1" } },
        { "$group": {
            "_id": null,
            "phones": {
                "$push": {
                    "$cond": [
                        { "$eq": ["$category","phones"] },
                        {
                            "_id": "$_id",
                            "field1": "$field1",
                            "field2": "$field2",
                            "tags": "$tags"
                        },
                        false
                    ]
                }
            },
            "bikes": {
                "$push": {
                    "$cond": [
                        { "$eq": ["$category","bikes"] },
                        {
                            "_id": "$_id",
                            "field1": "$field1",
                            "field2": "$field2",
                            "tags": "$tags"
                        },
                        false
                    ]
                }
            }           
        }},
        // Subtract the set [false] to remove the placeholders
        { "$project": {
            "_id": 0,
            "phones": { "$setDifference": ["$phones",[false]] },
            "bikes": { "$setDifference": ["$bikes",[false]] }
        }}
    ])
    

    Both produce the output you asked for:

    {
        "phones" : [
            {
                "_id" : "1",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag1",
                    "tag3"
                ]
            },
            {
                "_id" : "2",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag2",
                    "tag3"
                ]
            }
        ],
        "bikes" : [
            {
                "_id" : "3",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag3",
                    "tag4"
                ]
            }
        ]
    }
    

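    The general point here is that the aggregation framework does not allow field data to be used as keys, so you either need to group the data or specify the key names yourself.

    The only way to get "dynamic" key names is to use mapReduce instead: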
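    "Specifying the key names yourself" does not have to mean typing them. A minimal sketch, assuming you already have the list of category names: the hypothetical function `buildCategoryPipeline` generates the MongoDB 2.6-style pipeline shown above, one `$push`/`$cond` accumulator and one `$setDifference` projection per category. It also assumes the category values are safe to use as field names (for example, no leading `$` and no `.`).

```javascript
// Hypothetical helper: build the $match/$group/$project pipeline for a known
// list of categories, so that every category becomes an output key.
function buildCategoryPipeline(user, categories) {
  const group = { _id: null };
  const project = { _id: 0 };
  for (const name of categories) {
    // One accumulator per category: push the item, or false for other categories.
    group[name] = {
      $push: {
        $cond: [
          { $eq: ["$category", name] },
          { _id: "$_id", field1: "$field1", field2: "$field2", tags: "$tags" },
          false,
        ],
      },
    };
    // Afterwards, strip the false placeholders.
    project[name] = { $setDifference: ["$" + name, [false]] };
  }
  return [{ $match: { user: user } }, { $group: group }, { $project: project }];
}

const pipeline = buildCategoryPipeline("1", ["phones", "bikes"]);
console.log(JSON.stringify(pipeline, null, 2));
```

    In the mongo shell the category list could, for instance, come from `db.collection.distinct("category", { "user": "1" })` before passing the generated stages to `db.collection.aggregate(pipeline)`; that costs an extra round trip to the server.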

    db.collection.mapReduce(
        // map: emit every document under the same key (null), wrapped in an
        // object whose only key is the document's category
        function () {
          var obj = { };
          var category = this.category;
          delete this.user;
          delete this.category;
    
          obj[category] = [this];
    
          emit(null,obj);
        },
        // reduce: merge the per-category arrays of all emitted values
        function (key,values) {
    
          var reduced = {};
    
          values.forEach(function(value) {
            Object.keys(value).forEach(function(field) {
              if ( !reduced.hasOwnProperty(field) )
                reduced[field] = [];
              value[field].forEach(function(item) {
                reduced[field].push(item);
              });
            });
          });
    
          return reduced;
    
        },
        {
            "query": { "user": "1" },
            "out": { "inline": 1 }
        }
    )
    

    So now the key generation is dynamic, but the output comes back in the very "mapReduce" style:

    {
        "_id" : null,
        "value" : {
            "phones" : [
                {
                    "_id" : "1",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag1",
                        "tag3"
                    ]
                },
                {
                    "_id" : "2",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag2",
                        "tag3"
                    ]
                }
            ],
            "bikes" : [
                {
                    "_id" : "3",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag3",
                        "tag4"
                    ]
                }
            ]
        }
    }
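    To see how the reduce function merges the emitted objects, here is a minimal local sketch that runs the same map and reduce logic in plain JavaScript. The `emit`, `map` and `reduce` defined here are local stand-ins for illustration, not MongoDB's runtime. Because MongoDB may call reduce again on partial results, reduce must return a value with the same shape as the emitted values; the last step checks that re-reducing gives the same result.

```javascript
// Sample documents from the question.
const docs = [
  { _id: "1", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag1", "tag3"] },
  { _id: "2", field1: "value1", field2: "value2", category: "phones", user: "1", tags: ["tag2", "tag3"] },
  { _id: "3", field1: "value1", field2: "value2", category: "bikes", user: "1", tags: ["tag3", "tag4"] },
  { _id: "4", field1: "value1", field2: "value2", category: "cars", user: "2", tags: ["tag1", "tag2"] },
];

// Local stand-in for emit(): just collect the emitted pairs.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Same logic as the answer's map function; "this" is the current document.
function map() {
  const obj = {};
  const category = this.category;
  delete this.user;
  delete this.category;
  obj[category] = [this];
  emit(null, obj);
}

// Same logic as the answer's reduce function.
function reduce(key, values) {
  const reduced = {};
  values.forEach(function (value) {
    Object.keys(value).forEach(function (field) {
      if (!reduced.hasOwnProperty(field)) reduced[field] = [];
      value[field].forEach(function (item) {
        reduced[field].push(item);
      });
    });
  });
  return reduced;
}

// "query": { "user": "1" }; map over shallow copies so the originals stay intact.
docs.filter((d) => d.user === "1").forEach((d) => map.call({ ...d }));

// Every value was emitted under the key null, so one reduce call merges them all.
const value = reduce(null, emitted.map((e) => e.value));

// Re-reduce: reduce a partial result together with the remaining value.
const partial = reduce(null, emitted.slice(0, 2).map((e) => e.value));
const rereduced = reduce(null, [partial, emitted[2].value]);

console.log(JSON.stringify({ _id: null, value }, null, 2));
```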
    

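    So the output is constrained by the way mapReduce shapes its results, and evaluating JavaScript here will be slower than the aggregation framework's native operations. You gain more power to manipulate the data, but it is a trade-off.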

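    To sum up: if you stick with the pattern, the first aggregation framework approach is the fastest and best one, and you can always reshape the results once they are returned from the server. If you insist on breaking the pattern and need dynamic keys from the server, then mapReduce will do it where the aggregation framework would otherwise be impractical.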
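    Reshaping on the client is only a few lines. A minimal sketch, assuming `result` holds the single document produced by the first pipeline (however your shell or driver hands it back); the function name `toCategoryKeys` is hypothetical. It turns the generic `{ _id, items }` entries into the category-keyed object from the question.

```javascript
// Hypothetical helper: convert [{ _id: category, items }] into { category: items }.
function toCategoryKeys(result) {
  const byCategory = {};
  for (const { _id: category, items } of result.categories) {
    byCategory[category] = items;
  }
  return byCategory;
}

// A document in the shape returned by the first pipeline (sample data from the question).
const result = {
  _id: null,
  categories: [
    { _id: "bikes", items: [{ _id: "3", field1: "value1", field2: "value2", tags: ["tag3", "tag4"] }] },
    {
      _id: "phones",
      items: [
        { _id: "1", field1: "value1", field2: "value2", tags: ["tag1", "tag3"] },
        { _id: "2", field1: "value1", field2: "value2", tags: ["tag2", "tag3"] },
      ],
    },
  ],
};

console.log(JSON.stringify(toCategoryKeys(result), null, 2));
```

    This keeps the server-side pipeline generic and fast, while the caller still gets the "data as keys" form when it really needs it.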

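    【Comments】:

    • I really appreciate your explanations. I think the first one is the best solution, because it respects the pattern, uses the native aggregation functions, and the code is simpler for a newbie like me. Can you recommend an easier guide for learning to use the aggregation functions?
    • @Hadokee That is something I might consider doing by writing some technical articles in the near future. You can always browse the questions tagged aggregation-framework here, or in general look at the guides in the documentation, such as the SQL to Aggregation mapping, which covers some common cases if you are already familiar with SQL.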