How to create an index on array elements that match a specific filter in MongoDB? (Answers)

【Question title】: How to create an index on array elements that match a specific filter in MongoDB?
【Posted】: 2021-05-30 19:16:44
【Question description】:

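Suppose there is a collection of objects, where each object contains an array of elements and each element contains the fields attributeName and attributeValue. How can I create an index on attributeValue, but only for the values whose corresponding attributeName equals a specific value?

Example collection: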

  { "_id": 0, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "John", ...},
      {"attributeName": "age", "attributeValue": "30", ...}
    ]
  },
  { "_id": 1, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Brian", ...},
      {"attributeName": "gender", "attributeValue": "male", ...}
    ]
  },
  { "_id": 2, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Kevin", ...},
      {"attributeName": "age", "attributeValue": "35", ...}
    ]
  }

For the given example, how can we create an index on the values where "attributeName" == "age" (in this case, the values 30 and 35)?

【Comments】:

See partial-index.

Tags: mongodb indexing mongodb-indexes

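【Solution 1】:

MongoDB does not support indexing in this fashion.

You can use a partial index to determine which documents are indexed.

For each indexed document, all elements of the array are included in the index.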
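
To make that last point concrete, here is a toy model in plain JavaScript of how a partial index over an array field might collect its keys. It is only an illustration under stated assumptions, not MongoDB's internal implementation, and the names `docFilter` and `buildIndexKeys` are hypothetical. The filter decides whether a document is indexed at all; once it is, every array element contributes a key.

```javascript
// Toy model of a partial multikey index; an illustration only,
// not how MongoDB implements indexes internally.
const docs = [
  { _id: 0, attributes: [
    { attributeName: "name", attributeValue: "John" },
    { attributeName: "age", attributeValue: "30" } ] },
  { _id: 1, attributes: [
    { attributeName: "name", attributeValue: "Brian" },
    { attributeName: "gender", attributeValue: "male" } ] },
];

// Hypothetical document-level filter: does any element have attributeName "age"?
const docFilter = (doc) =>
  doc.attributes.some((a) => a.attributeName === "age");

// Collect one key per array element for each document that passes the filter.
function buildIndexKeys(allDocs, filter) {
  const keys = [];
  for (const doc of allDocs.filter(filter)) {
    for (const a of doc.attributes) {
      keys.push({ key: a.attributeValue, _id: doc._id });
    }
  }
  return keys;
}

const keys = buildIndexKeys(docs, docFilter);
console.log(keys);
```

In this model, document 0 passes the filter, so both "John" and "30" end up as keys, while document 1 contributes nothing. The filter narrows the set of documents, not the set of array elements.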

【Discussion】:

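【Solution 2】:

Two users suggested a partial index, but even a partial index has problems with this query. If I understand your requirement, you want to index only documents that have an attribute element of {"attributeName": "age", "attributeValue": 30} or {"attributeName": "age", "attributeValue": 35}. Your original documents show the age as a string rather than an integer, but I believe integers are sufficient for this discussion.

A partial filter expression does not allow IN or OR conditions, but it does allow AND conditions. In addition, we cannot create two nearly identical indexes on the same fields; Mongo restricts this. For these reasons we cannot create an index on 30 or 35, but we can create one on values BETWEEN 30 and 35.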

db.test.createIndex(
   { "attributes.attributeValue": 1, "attributes.attributeName": 1 },
   {
       partialFilterExpression: 
       {
           $and:
           [
               {"attributes.attributeName": "age"} , {"attributes.attributeValue": {$gte: 30} }, {"attributes.attributeValue": { $lte: 35} }
           ]
       }
   }
)

Querying this data in a way that actually uses the index is another matter entirely.

We can query the documents in the obvious way...

db.test.find({"attributes.attributeValue": 30, "attributes.attributeName": "age"}).pretty()

...but this may not produce the results we want. For example, consider this document...

{ "_id": 3, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Alisa"},
      {"attributeName": "age", "attributeValue": 17},
      {"attributeName": "favoriteNumber", "attributeValue": 30}
    ]
  }

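This document is returned by the query above because, taken as a whole, it has both an "attributes.attributeName" containing "age" and an "attributes.attributeValue" of 30. The two values sit in different array elements, but the document still matches the query as written. I believe we only want documents whose "attributes" array has a single subdocument containing both "age" and 30. For that we need $elemMatch...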
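
The difference between the two matching rules can be sketched in plain JavaScript. This is only a model of the semantics described above, not MongoDB code, and the function names `matchesDotted` and `matchesElemMatch` are hypothetical.

```javascript
// Plain-JavaScript model of the two matching rules; not MongoDB code.
const alisa = { _id: 3, attributes: [
  { attributeName: "name", attributeValue: "Alisa" },
  { attributeName: "age", attributeValue: 17 },
  { attributeName: "favoriteNumber", attributeValue: 30 } ] };

// Dotted-path conditions: each condition may be satisfied by a different element.
function matchesDotted(doc, name, value) {
  return doc.attributes.some((a) => a.attributeName === name) &&
         doc.attributes.some((a) => a.attributeValue === value);
}

// $elemMatch-style condition: a single element must satisfy both conditions.
function matchesElemMatch(doc, name, value) {
  return doc.attributes.some(
    (a) => a.attributeName === name && a.attributeValue === value);
}

console.log(matchesDotted(alisa, "age", 30));    // true
console.log(matchesElemMatch(alisa, "age", 30)); // false
```

The dotted-path version accepts the document because "age" and 30 each appear somewhere in the array, while the element-wise version rejects it because no single element carries both.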

db.test.find( { "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).pretty()

With this query I received the expected results, but the explain plan showed that it did not use my index. It performs a collection scan...

db.test.find( { "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).explain("allPlansExecution")

...so what is going on? It turns out that to use this index we need both query styles: we must include each field separately, and also use $elemMatch...

db.test.find( { "attributes.attributeName": "age", "attributes.attributeValue": 30, "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).pretty()

...now this query gives the correct results and it uses the index...

db.test.find( { "attributes.attributeName": "age", "attributes.attributeValue": 30, "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).explain("allPlansExecution")
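
One way to read an explain plan programmatically is to walk the winning plan and look for an IXSCAN or COLLSCAN stage. The helper below is a minimal sketch; `collectStages` is a hypothetical name, and the two plan objects are hand-written mocks rather than real explain() output. It assumes the plan nests its children under `inputStage` or `inputStages`, and the exact shape of explain output can vary between server versions.

```javascript
// Walk an explain() winning plan and collect the stage names it contains.
function collectStages(plan, out = []) {
  if (!plan) return out;
  out.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, out);
  for (const child of plan.inputStages || []) collectStages(child, out);
  return out;
}

// Hand-written mock plans for illustration; not real explain() output.
const mockCollScan = { stage: "COLLSCAN" };
const mockIndexed = { stage: "FETCH", inputStage: { stage: "IXSCAN" } };

console.log(collectStages(mockCollScan)); // [ 'COLLSCAN' ]
console.log(collectStages(mockIndexed));  // [ 'FETCH', 'IXSCAN' ]
```

In the shell, the object to pass in would typically be the `queryPlanner.winningPlan` field of the explain result.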

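Conclusions:

We cannot have a targeted partial filter expression; the best we can do is a range. When using a partial index on array elements, we must include the array fields individually as well as in $elemMatch for the index to be used. Data types must match: if I query with "30" (as a string), it will not find the data and will not use the index.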
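
The data-type point can be modeled with strict equality in plain JavaScript. This is a simplified sketch, not MongoDB's comparison logic, and `matchesValue` is a hypothetical helper. Note that the sample collection in the question stores ages as strings ("30", "35"), so a numeric filter or query would not match those documents as they stand.

```javascript
// Model of type-sensitive equality: the string "30" and the number 30 differ.
const stored = { attributeName: "age", attributeValue: 30 };

// Hypothetical helper: strict comparison, so no string-to-number coercion.
const matchesValue = (element, value) => element.attributeValue === value;

console.log(matchesValue(stored, 30));   // true
console.log(matchesValue(stored, "30")); // false
```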

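Side note:

Indexing key-value pairs in an array is known as the Attribute Pattern. For details, see https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern. The compound index is deliberately built with the value field first and the key field second, because the value field is likely to be more selective, which makes the index scan more efficient.

【Discussion】: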