Mongo Sort by Count of Matches in Array (answer)

【Question title】: Mongo Sort by Count of Matches in Array
【Posted】: 2017-05-25 14:04:08
【Question】:

Suppose my test data is:

db.multiArr.insert({"ID" : "fruit1","Keys" : ["apple", "orange", "banana"]})
db.multiArr.insert({"ID" : "fruit2","Keys" : ["apple", "carrot", "banana"]})

To get individual fruits like carrot, I use:

db.multiArr.find({'Keys':{$in:['carrot']}})

When I run an or query for carrot and banana, I see both records, fruit1 and then fruit2:

db.multiArr.find({ $or: [{'Keys':{$in:['carrot']}}, {'Keys':{$in:['banana']}}]})

The output should be fruit2 and then fruit1, because fruit2 has both carrot and banana.

【Question discussion】:

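"i see both the records fruit1 and then fruit2" - then you say "should be fruit2 and then fruit1" - so you are getting exactly what you want?!
Do you want to do an $and query?
@Alex The question also says "...the output should be fruit2 and then fruit1, because fruit2 has both carrot and banana". That is the key point: it asks for a "weighted sort" of the results, not just for both documents to be returned because they both match.
Ah, I see.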

Tags: mongodb mongodb-query aggregation-framework

【Solution 1】:

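To actually answer the question, you first need to "count" how many elements match the given conditions, so that you can "sort" the results and return the best matches first.

For this you need the aggregation framework, which is the tool MongoDB provides for "calculating" and "manipulating" data: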

db.multiArr.aggregate([
  { "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
  { "$project": {
    "ID": 1,
    "Keys": 1,
    "order": {
      "$size": {
        "$setIntersection": [ ["carrot", "banana"], "$Keys" ]
      }
    }
  }},
  { "$sort": { "order": -1 } }
])

On MongoDB versions older than 2.6, which lack the $setIntersection and $size operators, you can use the longer form:

db.multiArr.aggregate([
  { "$match": { "Keys": { "$in": [ "carrot", "banana" ] } } },
  // One document per array element
  { "$unwind": "$Keys" },
  { "$group": {
    "_id": "$_id",
    "ID": { "$first": "$ID" },
    "Keys": { "$push": "$Keys" },
    "order": {
      // $cond yields 1 when the unwound key is one of the wanted values, else 0
      "$sum": {
        "$cond": [
          { "$or": [
            { "$eq": [ "$Keys", "carrot" ] },
            { "$eq": [ "$Keys", "banana" ] }
          ]},
          1,
          0
        ]
      }
    }
  }},
  { "$sort": { "order": -1 } }
])

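In either case, the idea is to first match the candidate documents by supplying a "list" of arguments with $in. Once you have those documents, you "count" how many elements of the array match the supplied "list" of possible values.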

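In the modern form, the $setIntersection operator compares the two "lists" and returns a new array containing only the "unique" members they share. Since we want to know how many matches there are, we simply return the $size of that array.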
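To make the counting step concrete, here is a small in-memory sketch in plain JavaScript (runnable in Node.js, no MongoDB needed) that imitates what `$size` of `$setIntersection` computes for each document. The helper name `matchCount` is hypothetical and not a MongoDB API; it only mimics the operator's semantics of counting unique shared values.

```javascript
// Hypothetical helper imitating { "$size": { "$setIntersection": [ wanted, "$Keys" ] } }.
// $setIntersection keeps only unique members, so a repeated value is counted once.
function matchCount(wanted, keys) {
  const keySet = new Set(keys);
  // Deduplicate the wanted list, then keep only values also present in keys.
  const shared = [...new Set(wanted)].filter((value) => keySet.has(value));
  return shared.length;
}

const wanted = ["carrot", "banana"];
const docs = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] },
];

// Rough equivalent of the $project stage: add an "order" field to each document.
const projected = docs.map((doc) => ({ ...doc, order: matchCount(wanted, doc.Keys) }));
console.log(projected.map((d) => `${d.ID}: ${d.order}`)); // [ 'fruit1: 1', 'fruit2: 2' ]
```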

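In older versions, you use $unwind to split the document's array apart so you can operate on each element, because those versions lack the newer operators that can work on an array without taking it apart. The pipeline then examines each value individually: if either expression in the $or matches, the $cond ternary returns 1 to the $sum accumulator, otherwise 0. The end result is the same "count of matches" the modern version produces, provided the array holds no duplicate values (the $unwind form counts every occurrence, while $setIntersection counts each unique value once).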
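The same $unwind / $group / $cond / $sum flow can be imitated in memory as well. This is a plain JavaScript sketch of the stage semantics, not MongoDB code; the numeric `_id` values are assumed for illustration.

```javascript
// In-memory imitation of $unwind -> $group with $sum over a $cond (not a MongoDB API).
const docs = [
  { _id: 1, ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { _id: 2, ID: "fruit2", Keys: ["apple", "carrot", "banana"] },
];

// $unwind: emit one copy of the document per array element, with Keys set to that element.
const unwound = docs.flatMap((doc) => doc.Keys.map((key) => ({ ...doc, Keys: key })));

// $group by _id: $first for ID, $push to rebuild Keys, $sum of 1/0 for order.
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc._id)) {
    groups.set(doc._id, { _id: doc._id, ID: doc.ID, Keys: [], order: 0 });
  }
  const group = groups.get(doc._id);
  group.Keys.push(doc.Keys);
  const isMatch = doc.Keys === "carrot" || doc.Keys === "banana"; // the $or of two $eq tests
  group.order += isMatch ? 1 : 0; // $cond hands 1 or 0 to $sum
}

const grouped = [...groups.values()];
console.log(grouped.map((g) => `${g.ID}: ${g.order}`)); // [ 'fruit1: 1', 'fruit2: 2' ]
```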

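Finally, $sort orders the results by the returned "match count", so the documents with the most matches end up at the "top". That is "descending" order, which you indicate by supplying -1.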
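In JavaScript terms, `-1` corresponds to a comparator that puts larger `order` values first. MongoDB's documentation also notes that documents with equal sort values may come back in any order, so if ties matter you can add a secondary key with unique values, for example `{ "$sort": { "order": -1, "_id": 1 } }`. The sketch below imitates that two-key sort in memory; the `order` values and the third document `fruit3` are made up for illustration.

```javascript
// Documents as they might look after the $project stage (sample values, for illustration).
const projected = [
  { _id: 1, ID: "fruit1", order: 1 },
  { _id: 2, ID: "fruit2", order: 2 },
  { _id: 3, ID: "fruit3", order: 1 },
];

// Imitates { "$sort": { "order": -1, "_id": 1 } }:
// descending on order, then ascending on _id so ties come back in a fixed order.
const sorted = [...projected].sort((a, b) => b.order - a.order || a._id - b._id);

console.log(sorted.map((doc) => doc.ID)); // [ 'fruit2', 'fruit1', 'fruit3' ]
```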

Addendum on $in and arrays

For starters, you have some misconceptions about MongoDB queries. The $in operator is really meant for a "list" of arguments, like this:

{ "Keys": { "$in": [ "carrot", "banana" ] } }

This is essentially shorthand for "match either 'carrot' or 'banana' in the property 'Keys'". It can even be written in long form like this:

{ "$or": [{ "Keys": "carrot" }, { "Keys": "banana" }] }
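To illustrate the equivalence, here is a simplified in-memory model of how the two filters test an array field. It is only an approximation for scalar (string) query values; the helper names `hasElement`, `matchesIn` and `matchesOr` are hypothetical, and the `veg1` document is invented sample data.

```javascript
// Simplified model for an array field and scalar (string) query values:
// { "Keys": v } matches when some element of the Keys array equals v.
const hasElement = (doc, value) => doc.Keys.includes(value);

// { "Keys": { "$in": values } }: any listed value is an element of Keys.
const matchesIn = (doc, values) => values.some((value) => hasElement(doc, value));

// { "$or": [ { "Keys": v1 }, { "Keys": v2 } ] }: any single-value condition holds.
const matchesOr = (doc, conditions) => conditions.some((cond) => hasElement(doc, cond.Keys));

const docs = [
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] },
  { ID: "veg1", Keys: ["potato"] },
];

const viaIn = docs.filter((d) => matchesIn(d, ["carrot", "banana"])).map((d) => d.ID);
const viaOr = docs
  .filter((d) => matchesOr(d, [{ Keys: "carrot" }, { Keys: "banana" }]))
  .map((d) => d.ID);
console.log(viaIn, viaOr); // both contain fruit1 and fruit2; veg1 matches neither
```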

If it is a "singular" match condition, you simply supply the value to match against the property:

{ "Keys": "carrot" }

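So that should clear up the misconception about using $in to match against an array property in the document. Rather, the "reverse" case is the intended usage: you supply a "list of arguments" to match against a given property, whether that property is an array or just a single value.

The MongoDB query engine does not distinguish between a single value and an array of values in equality or similar operations.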
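A rough sketch of that rule, again a simplified in-memory model rather than MongoDB's actual matcher: the hypothetical `equalityMatches` helper treats a scalar field and an array field the same way for a scalar query value. The `veg1` document with a plain-string `Keys` is a made-up example.

```javascript
// Simplified model of the equality rule for a scalar query value:
// a scalar field must equal it, an array field must contain it.
// (Real MongoDB also matches an array field against a whole-array query value.)
function equalityMatches(fieldValue, queryValue) {
  if (Array.isArray(fieldValue)) {
    return fieldValue.includes(queryValue);
  }
  return fieldValue === queryValue;
}

const docs = [
  { ID: "fruit2", Keys: ["apple", "carrot", "banana"] }, // array value
  { ID: "veg1", Keys: "carrot" }, // single value
  { ID: "fruit1", Keys: ["apple", "orange", "banana"] },
];

// The same condition { "Keys": "carrot" } applied to every document.
const matched = docs.filter((doc) => equalityMatches(doc.Keys, "carrot")).map((doc) => doc.ID);
console.log(matched); // [ 'fruit2', 'veg1' ]
```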

【Discussion】: