具有多个数组的 MongoDB 聚合答案

【问题标题】：MongoDB aggregation with multiple arrays具有多个数组的 MongoDB 聚合
【发布时间】：2014-11-06 13:08:17
【问题描述】：

我将以下两项插入到集合“框架”中：

frame1 = {
          "number": 1,

          "hobjects": [ { "htype": 1, "weight": 50 },
                        { "htype": 2, "weight": 220 },
                        { "htype": 2, "weight": 290 },
                        { "htype": 3, "weight": 450 } ],

          "sobjects": [ { "stype": 1, "size": 10.0 },
                        { "stype": 2, "size": 5.1 },
                        { "stype": 2, "size": 6.5 } ],
          }

frame2 = {
          "number": 2,

          "hobjects": [ { "htype": 1, "weight": 61 },
                        { "htype": 2, "weight": 210 },
                        { "htype": 2, "weight": 250 } ],

          "sobjects": [ { "stype": 1, "size": 12.1 },
                        { "stype": 2, "size": 4.9 },
                        { "stype": 2, "size": 6.2 },
                        { "stype": 2, "size": 5.7 } ],
          }

frames.insert(frame1)
frames.insert(frame2)

现在我想查询部分帧数据：

query = { "hobjects.htype": 3, "sobjects.stype": 2 }
db.frames.find(query)

导致：

{u'_id': ObjectId('545b6ea7b9ad9a03462d743b'), u'hobjects': [{u'htype': 1, u'weight': 50}, {u'htype': 2, u'weight': 220}, {u'htype': 2, u'weight': 290}, {u'htype': 3, u'weight': 450}], u'number': 1, u'sobjects': [{u'stype': 1, u'size': 10.0}, {u'stype': 2, u'size': 5.1}, {u'stype': 2, u'size': 6.5}]}

这不是我真正想要的。我想根据查询过滤集合，以便得到以下结果：

{u'_id': ObjectId('545b6ea7b9ad9a03462d743b'), u'hobjects': [{u'htype': 3, u'weight': 450}], u'number': 1, u'sobjects': [{u'stype': 2, u'size': 5.1}, {u'stype': 2, u'size': 6.5}]}

我发现的唯一解决方案是对每个集合进行展开和分组的聚合：

query = { "hobjects.htype": 3, "sobjects.stype": 2 }
db.frames.aggregate([
    { "$match": query },
    { "$unwind": "$hobjects" },
    { "$match": dict((key, value) for key, value in query.iteritems() if "hobjects." in key) },
    { "$group": { "_id": "$_id", "number": { "$first": "$number" } , "hobjects": { "$push": "$hobjects" }, "sobjects": { "$first": "$sobjects" } } },
    { "$unwind": "$sobjects" },
    { "$match": dict((key, value) for key, value in query.iteritems() if "sobjects." in key) },
    { "$group": { "_id": "$_id", "number": { "$first": "$number" } , "hobjects": { "$first": "$hobjects" }, "sobjects": { "$push": "$sobjects" } } },
    ])

我想这不是一种非常有效和灵活的查询方式。不知道有没有其他选择？

【问题讨论】：

标签： mongodb mongodb-query aggregation-framework

【解决方案1】：

如果您的服务器是 MongoDB 2.6 或更高版本，那么您始终可以这样做：

db.frames.aggregate([

    // Still helps to match the documents by conditions to filter
    { "$match": {
        "hobjects.htype": 3, "sobjects.stype": 2 
    }},

    // Now filter inline using $map and $setDifference
    { "$project": {
        "number": 1,
        "hobjects": {
            "$setDifference": [
                { "$map": {
                    "input": "$hobjects",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.htype", 3 ] },
                            "$$el",
                            false
                        ]
                    }
                }},
                [false]
            ]
        },
        "sobjects": {
            "$setDifference": [
                { "$map": {
                    "input": "$sobjects",
                    "as": "el",
                    "in": {
                        "$cond": [
                            { "$eq": [ "$$el.stype", 2 ] },
                            "$$el",
                            false
                        ]
                    }
                }},
                [false]
            ]
        } 
    }}
])

这里要注意的是，基本投影和$elemMatch 之类的东西目前只能匹配数组中匹配条件的 first 元素。因此，要执行更多操作，您需要某种形式的高级操作，该操作仅适用于聚合框架等事物。

$setDiffence 和$map 运算符为您提供了一种处理数组的“内联”方式，实际上是单个文档中的“集合”。这比使用$unwind 更有效，尤其是在涉及大型数组的情况下。

我知道这里的 JavaScript 表示法（主要在 cmets 中），但它与 python 基本相同。

【讨论】：

谢谢@NeilLunn！看起来它是一个最佳选择。会试试看！
测试了解决方案，它适用于我的示例案例。虽然我更喜欢对整个对象数据进行任意查询的更通用的解决方案。
@Neil Lunn - 请您帮忙在 java 驱动程序中编写此查询。谢谢
@PVH 如果您有自己的问题，请Ask A Question。 StackOverflow 不是线程讨论或邮件列表。你提出问题，人们提供答案。

【解决方案2】：

以下聚合可能会对您有所帮助

db.frames.aggregate({"$unwind":"$hobjects"},{"$unwind":"$sobjects"},{"$match":{"hobjects.htype": 3, "sobjects.stype": 2}},{"$group":{"_id":"$_id","u'hobjects":{"$first":"$hobjects"},"u'number":{"$first":"$number"},"u'sobjects":{"$push":"$sobjects"}}})

【讨论】：

展开多个数组的笛卡尔问题。意味着您只是复制了其中之一的元素
@NeilLunn 感谢您的建议，我会努力解决这个问题
继续吧。这个问题之前已经解决过很多次了，我相信我自己也回答过很多次了。但这不是OP所要求的。他们的代码很好，他们只是要求一些更优化的东西，并解释为什么他们正在做的事情是必要的。