Mongodb中匹配子文档的批量更新数组答案

【问题标题】：Bulk update array of matching sub document in MongodbMongodb中匹配子文档的批量更新数组
【发布时间】：2018-04-17 12:45:35
【问题描述】：

我在 Mongodb 3.6 上运行。以下是我的文档的结构，其中存储了产品列表的月费率信息：

{
  "_id": 12345,
  "_class": "com.example.ProductRates",
  "rates": [
    {
      "productId": NumberInt(1234),
      "rate": 100.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201801)
    },
    {
      "productId": NumberInt(1234),
      "rate": 200.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201802)
    },
    {
      "productId": NumberInt(1234),
      "rate": 400.0,
      "rateCardId": NumberInt(2),
      "month": NumberInt(201803)
    },
    {
      "productId": NumberInt(1235),
      "rate": 500.0,
      "rateCardId": NumberInt(1),
      "month": NumberInt(201801)
    },
    {
      "productId": NumberInt(1235),
      "rate": 234,
      "rateCardId": NumberInt(2),
      "month": NumberInt(201803)
    }
  ]
}

对关联的价目表的任何更改都会将更新传播到“rates”数组中的多个子文档。

以下是需要对上述文档应用的更改

{
    "productId" : NumberInt(1234), 
    "rate" : 400.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201801)
}, 
{
    "productId" : NumberInt(1234), 
    "rate" : 500.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201802)
}, 
{
    "productId" : NumberInt(1235), 
    "rate" : 700.0, 
    "rateCardId": NumberInt(1),
    "month" : NumberInt(201802)
}

有没有办法在不将整个文档加载到内存的情况下增量更新数组“rates”下的子文档，以便合并更改？假设我的子文档标识符是rates.[].productId、rates.[].month 和rates.[].rateCardId 的组合。

我可以在 3.6 中使用 $[<identifier>] 一次更新多个文档，但值相同。

db.avail.rates_copy.update(
  { "_id" : 12345 },
  { $set: { "rates.$[item].rate": 0  } },
  { multi: true, 
   arrayFilters: [ { "item.rateCardId": {$in: [ 1, 2]} } ]
  }
)

而在我的例子中，值会根据上述标识符组合在文档之间发生变化，这些标识符组合来自不同的系统。

有没有办法说，用新值更新所有与变更集中（productId、month 和 rateCardId）匹配的子文档。

【问题讨论】：

好问题，很明显你已经做了一些研究，虽然你没有找到答案。所以真的太棒了！这里的第一个问题。一个小小的建议是确保将来您不会“混淆”ObjectId 之类的值。如果您的问题范围需要它，请向我们展示真正的价值或用其他适当“有效”的东西替换它。 ObjectId("12345") 无效，而普通的 12345 完全有效。无论如何，共享ObjectId 值并不是说存在任何安全风险。
作为另一件小事。不要使用不适合问题的语言进行标记。如果您使用 Java 或 Spring 进行标记，那么 "whole" 问题确实需要在 Java 代码中构造。实际上，这个问题归结为用于构造语句的“BSON 语法”，因此坚持您实际包含的 shell 的 JavaScript 语法基本上就足够了。但除此之外，第一个问题很好。
谢谢。将在以后的帖子中确保这些事情。

标签： mongodb mongodb-query

【解决方案1】：

简而言之，“是”和“否”。

确实有一种方法可以匹配单个数组元素并在单个语句中使用单独的值更新它们，因为实际上您可以提供“多个”arrayFilters 条件并在更新语句中使用这些标识符。

您的特定示例的问题在于您的“更改集”中的一个条目（最后一个）实际上与当前存在的任何数组成员都不匹配。这里的“假定”操作是将$push 那个新的不匹配成员放入未找到它的数组中。但是，该特定操作不能在“单个操作”中完成，但您可以使用bulkWrite() 发出“多个”语句来涵盖这种情况。

匹配不同的数组条件

解释一下，考虑一下“变更集”中的前两项。您可以像这样应用具有多个 arrayFilters 的 "single" 更新语句：

db.avail_rates_copy.updateOne(
  { "_id": 12345 },
  { 
    "$set": {
      "rates.$[one]": {
        "productId" : NumberInt(1234), 
        "rate" : 400.0, 
        "rateCardId": NumberInt(1),
        "month" : NumberInt(201801)
      },
      "rates.$[two]": {
        "productId" : NumberInt(1234), 
        "rate" : 500.0, 
        "rateCardId": NumberInt(1),
        "month" : NumberInt(201802)
      } 
    }
  },
  { 
    "arrayFilters": [
      {
        "one.productId": NumberInt(1234),
        "one.rateCardId": NumberInt(1),
        "one.month": NumberInt(201801)
      },
      {
        "two.productId": NumberInt(1234),
        "two.rateCardId": NumberInt(1),
        "two.month": NumberInt(201802)
      }
    ]
  }
)

如果你运行，你会看到修改后的文档变成：

{
        "_id" : 12345,
        "_class" : "com.example.ProductRates",
        "rates" : [
                {                             // Matched and changed this by one
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {                            // And this as two
                        "productId" : 1234,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201802
                },
                {
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {
                        "productId" : 1235,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {
                        "productId" : 1235,
                        "rate" : 234,
                        "rateCardId" : 2,
                        "month" : 201803
                }
        ]
}

请注意，您在arrayFilters 列表中指定每个“标识符”，并使用多个条件来匹配元素，如下所示：

  {
    "one.productId": NumberInt(1234),
    "one.rateCardId": NumberInt(1),
    "one.month": NumberInt(201801)
  },

所以每个“条件”有效地映射为：

  <identifier>.<property>

所以它知道通过$[<indentifier>] 的更新块中的语句查看"rates" 数组：

 "rates.$[one]"

并查看"rates" 的每个元素以匹配条件。因此，"one" 标识符将匹配以 "one" 为前缀的条件，同样适用于以"two" 为前缀的另一组条件，因此实际的更新语句仅适用于与分配给标识符的条件匹配的条件。

如果您只想要 "rates" 属性而不是整个对象，那么您只需将其标记为：

{ "$set": { "rates.$[one].rate": 400, "rates.$[two].rate": 500 } }

添加不匹配的对象

所以第一部分比较容易理解，但是如前所述，为“不存在的元素”执行$push 是另一回事，因为我们基本上需要“文档”级别的查询条件才能按顺序确定数组元素“丢失”。

这实质上意味着您需要使用$push 发出更新，以查找每个数组元素以查看它是否存在。当它不存在时，则文档是匹配的，并执行$push。

这就是bulkWrite() 发挥作用的地方，您可以通过在上面的第一个操作中为“更改集”中的每个元素添加额外的更新来使用它：

db.avail_rates_copy.bulkWrite(
  [
    { "updateOne": {
      "filter": { "_id": 12345 },
      "update": {
        "$set": {
          "rates.$[one]": {
            "productId" : NumberInt(1234), 
            "rate" : 400.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201801)
          },
          "rates.$[two]": {
            "productId" : NumberInt(1234), 
            "rate" : 500.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          },
          "rates.$[three]": {
            "productId" : NumberInt(1235), 
            "rate" : 700.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      },
      "arrayFilters": [
        {
          "one.productId": NumberInt(1234),
          "one.rateCardId": NumberInt(1),
          "one.month": NumberInt(201801)
        },
        {
          "two.productId": NumberInt(1234),
          "two.rateCardId": NumberInt(1),
          "two.month": NumberInt(201802)
        },
        {
          "three.productId": NumberInt(1235),
          "three.rateCardId": NumberInt(1),
          "three.month": NumberInt(201802)
        }
      ]    
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1234), 
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201801)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1234), 
            "rate" : 400.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201801)
          }
        }
      }
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1234), 
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201802)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1234), 
            "rate" : 500.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      }
    }},
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId" : NumberInt(1235),
              "rateCardId": NumberInt(1),
              "month" : NumberInt(201802)
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId" : NumberInt(1235),
            "rate" : 700.0, 
            "rateCardId": NumberInt(1),
            "month" : NumberInt(201802)
          }
        }
      }
    }}
  ],
  { "ordered": true }
)

注意$elemMatch 带有查询过滤器，因为这是通过“多个条件”匹配数组元素的要求。我们不需要在 arrayFilters 条目上使用它，因为它们仅查看它们已应用于的每个数组项，但作为“查询”，条件要求 $elemMatch 为简单的“点符号” " 将返回不正确的匹配项。

另请参阅此处使用$not 运算符来“否定”$elemMatch，因为我们的真实条件是仅将 “未匹配数组元素” 的文档与提供的文档匹配条件，这就是选择添加新元素的理由。

发给服务器的那条语句实际上尝试了四个更新操作，其中一个尝试更新匹配的数组元素，另一个用于每个三个“更改集" 试图$push 发现文档与“更改集”中的数组元素的条件不匹配。

因此结果符合预期：

{
        "_id" : 12345,
        "_class" : "com.example.ProductRates",
        "rates" : [
                {                               // matched and updated
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {                               // matched and updated
                        "productId" : 1234,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201802
                },
                {
                        "productId" : 1234,
                        "rate" : 400,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {
                        "productId" : 1235,
                        "rate" : 500,
                        "rateCardId" : 1,
                        "month" : 201801
                },
                {
                        "productId" : 1235,
                        "rate" : 234,
                        "rateCardId" : 2,
                        "month" : 201803
                },
                {                              // This was appended
                        "productId" : 1235,
                        "rate" : 700,
                        "rateCardId" : 1,
                        "month" : 201802
                }
        ]
}

根据实际不匹配的元素数量，bulkWrite() 响应将报告这些语句中有多少实际匹配并影响了文档。在这种情况下，它是 2 匹配和修改的，因为“第一次”更新操作匹配现有数组条目，而“最后一次”更改更新匹配文档不包含数组条目并执行 $push 进行修改。

结论

所以你有组合方法，其中：

您问题中“更新”的第一部分非常简单，可以在单个语句中完成，如第一部分所示。
第二部分在当前文档数组中有一个“目前不存在”的数组元素，这实际上需要你使用bulkWrite()来发出“多个”单个请求中的操作。

因此update，对单个操作来说是“YES”。但是加差意味着多次操作。但是您可以将这两种方法结合起来，就像这里演示的那样。

有许多“奇特”的方式可以根据“更改集”数组内容和代码来构造这些语句，因此您不需要“硬编码”每个成员。

作为 JavaScript 的基本案例并与当前版本的 mongo shell 兼容（有点烦人的是它不支持对象扩展运算符）：

db.getCollection('avail_rates_copy').drop();
db.getCollection('avail_rates_copy').insert(
  {
    "_id" : 12345,
    "_class" : "com.example.ProductRates",
    "rates" : [
      {
        "productId" : 1234,
        "rate" : 100,
        "rateCardId" : 1,
        "month" : 201801
      },
      {
        "productId" : 1234,
        "rate" : 200,
        "rateCardId" : 1,
        "month" : 201802
      },
      {
        "productId" : 1234,
        "rate" : 400,
        "rateCardId" : 2,
        "month" : 201803
      },
      {
        "productId" : 1235,
        "rate" : 500,
        "rateCardId" : 1,
        "month" : 201801
      },
      {
        "productId" : 1235,
        "rate" : 234,
        "rateCardId" : 2,
        "month" : 201803
      }
    ]
  }
);

var changeSet = [
  {
      "productId" : 1234, 
      "rate" : 400.0, 
      "rateCardId": 1,
      "month" : 201801
  }, 
  {
      "productId" : 1234, 
      "rate" : 500.0, 
      "rateCardId": 1,
      "month" : 201802
  }, 
  {

      "productId" : 1235, 
      "rate" : 700.0, 
      "rateCardId": 1,
      "month" : 201802
  }
];

var arrayFilters = changeSet.map((obj,i) => 
  Object.keys(obj).filter(k => k != 'rate' )
    .reduce((o,k) => Object.assign(o, { [`u${i}.${k}`]: obj[k] }) ,{})
);

var $set = changeSet.reduce((o,r,i) =>
  Object.assign(o, { [`rates.$[u${i}].rate`]: r.rate }), {});

var updates = [
  { "updateOne": {
    "filter": { "_id": 12345 },
    "update": { $set },
    arrayFilters
  }},
  ...changeSet.map(obj => (
    { "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": Object.keys(obj).filter(k => k != 'rate')
              .reduce((o,k) => Object.assign(o, { [k]: obj[k] }),{})
          }
        }
      },
      "update": {
        "$push": {
          "rates": obj
        }
      }
    }}
  ))
];

db.getCollection('avail_rates_copy').bulkWrite(updates,{ ordered: true });

这将动态构建一个“批量”更新操作列表，如下所示：

[
  {
    "updateOne": {
      "filter": {
        "_id": 12345
      },
      "update": {
        "$set": {
          "rates.$[u0].rate": 400,
          "rates.$[u1].rate": 500,
          "rates.$[u2].rate": 700
        }
      },
      "arrayFilters": [
        {
          "u0.productId": 1234,
          "u0.rateCardId": 1,
          "u0.month": 201801
        },
        {
          "u1.productId": 1234,
          "u1.rateCardId": 1,
          "u1.month": 201802
        },
        {
          "u2.productId": 1235,
          "u2.rateCardId": 1,
          "u2.month": 201802
        }
      ]
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1234,
              "rateCardId": 1,
              "month": 201801
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1234,
            "rate": 400,
            "rateCardId": 1,
            "month": 201801
          }
        }
      }
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1234,
              "rateCardId": 1,
              "month": 201802
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1234,
            "rate": 500,
            "rateCardId": 1,
            "month": 201802
          }
        }
      }
    }
  },
  {
    "updateOne": {
      "filter": {
        "_id": 12345,
        "rates": {
          "$not": {
            "$elemMatch": {
              "productId": 1235,
              "rateCardId": 1,
              "month": 201802
            }
          }
        }
      },
      "update": {
        "$push": {
          "rates": {
            "productId": 1235,
            "rate": 700,
            "rateCardId": 1,
            "month": 201802
          }
        }
      }
    }
  }
]

就像在一般答案的“长形式”中描述的那样，但当然只是使用输入的“数组”内容来构造所有这些语句。

您可以使用任何语言进行此类动态对象构造，并且所有 MongoDB 驱动程序都接受您可以“操作”的某种结构类型的输入，然后在实际发送到服务器执行之前将其转换为 BSON。

注意：arrayFilters 的<identifier>必须由字母数字字符组成，并且必须以字母字符开头。因此，在构造动态语句时，我们以 "a" 为前缀，然后是正在处理的每个项目的当前数组索引。

【讨论】：

谢谢，这是非常详细的解释，解决了我的用例。