【Question title】: Matching ObjectId to String for $graphLookup
【Posted】: 2020-02-09 13:53:10
【Question description】:

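I'm trying to run a $graphLookup as shown in the screenshot below:

The objective is, given a specific record (the commented-out $match there), to retrieve its full "path" through the immediateAncestors property. As you can see, that is not happening.

I introduced $convert here to treat the collection's _id as a string, believing that it would then "match" the _id values in the immediateAncestors list (which are strings).

So I ran another test with different data (no ObjectIds involved):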

db.nodos.insert({"id":5,"name":"cinco","children":[{"id":4}]})
db.nodos.insert({"id":4,"name":"quatro","ancestors":[{"id":5}],"children":[{"id":3}]})
db.nodos.insert({"id":6,"name":"seis","children":[{"id":3}]})
db.nodos.insert({"id":1,"name":"um","children":[{"id":2}]})
db.nodos.insert({"id":2,"name":"dois","ancestors":[{"id":1}],"children":[{"id":3}]})
db.nodos.insert({"id":3,"name":"três","ancestors":[{"id":2},{"id":4},{"id":6}]})
db.nodos.insert({"id":7,"name":"sete","children":[{"id":5}]})

And the query:

db.nodos.aggregate( [
  { $match: { "id": 3 } },
  { $graphLookup: {
      from: "nodos",
      startWith: "$ancestors.id",
      connectFromField: "ancestors.id",
      connectToField: "id",
      as: "ANCESTORS_FROM_BEGINNING"
    }
  },
  { $project: {
      "name": 1,
      "id": 1,
      "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING.id"
    }
  }
] )

...which outputs what I was expecting (the five records directly and indirectly connected to the one with id 3):

{
    "_id" : ObjectId("5afe270fb4719112b613f1b4"),
    "id" : 3.0,
    "name" : "três",
    "ANCESTORS_FROM_BEGINNING" : [ 
        1.0, 
        4.0, 
        6.0, 
        5.0, 
        2.0
    ]
}

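The question is: is there a way to achieve the objective I mentioned at the beginning?

I'm running Mongo 3.7.9 (from the official Docker image).

Thanks in advance!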

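【Discussion】:

Not in the way you think. The problem here seems to be that "from" in $graphLookup pulls its data from the "collection", not from the $project, on each successive recursion. You could try turning the projection into a "view" and using that as the "from" source. Also, you probably only need the aliases $toString or $toObjectId in this context, since onError has no real use in your case.
Bear in mind, though, that using $convert or its aliases for this specific purpose is really a "band-aid" and not an actual solution. The "real solution" is to make sure that data which "looks like an ObjectId" is actually of the ObjectId type everywhere you record it. "Brute force" like this was never the intent of introducing the casting conversions.
Thanks for your considerations, Neil. Do you think storing a real ObjectId would make things better? If I manage to do that, would the query work without the $project and $convert?
Sorry, I was away. Your "screenshot" doesn't really help here. It's better to grab a JSON view of the data and paste some of it into the question. My general opinion is that the "brute force" you are attempting is not the right approach, and that you should instead convert the types in the data so they actually match without coercion. But if you add some clear data to the post that we can copy/paste and work with, it becomes easier to explain.

Tags: mongodb mongodb-query aggregation-framework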

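【Solution 1】:

You are currently using a development version of MongoDB, which has some features enabled that are expected to be released with MongoDB 4.0 as an official release. Note that some features may change before the final release, so be aware of this before you commit to them in production code.

Why $convert fails here

Probably the best way to explain this is to look at your altered sample, but with the _id values replaced by ObjectId values and the values under the arrays replaced by "strings":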

{
  "_id" : ObjectId("5afe5763419503c46544e272"),
   "name" : "cinco",
   "children" : [ { "_id" : "5afe5763419503c46544e273" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e273"),
  "name" : "quatro",
  "ancestors" : [ { "_id" : "5afe5763419503c46544e272" } ],
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{ 
  "_id" : ObjectId("5afe5763419503c46544e274"),
  "name" : "seis",
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{ 
  "_id" : ObjectId("5afe5763419503c46544e275"),
  "name" : "um",
  "children" : [ { "_id" : "5afe5763419503c46544e276" } ]
},
{
  "_id" : ObjectId("5afe5763419503c46544e276"),
  "name" : "dois",
  "ancestors" : [ { "_id" : "5afe5763419503c46544e275" } ],
  "children" : [ { "_id" : "5afe5763419503c46544e277" } ]
},
{ 
  "_id" : ObjectId("5afe5763419503c46544e277"),
  "name" : "três",
  "ancestors" : [
    { "_id" : "5afe5763419503c46544e273" },
    { "_id" : "5afe5763419503c46544e274" },
    { "_id" : "5afe5763419503c46544e276" }
  ]
},
{ 
  "_id" : ObjectId("5afe5764419503c46544e278"),
  "name" : "sete",
  "children" : [ { "_id" : "5afe5763419503c46544e272" } ]
}

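This should roughly simulate what you are trying to work with.

What you attempted was to convert the _id value into a "string" via $project before entering the $graphLookup stage. This fails because, although the initial $project runs "within" this pipeline, the source that $graphLookup reads through its "from" option is still the unaltered collection, so the subsequent "lookup" iterations never see the converted values.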

db.strcoll.aggregate([
  { "$match": { "name": "três" } },
  { "$addFields": {
    "_id": { "$toString": "$_id" }
  }},
  { "$graphLookup": {
    "from": "strcoll",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])

Therefore nothing matches on the "lookup":

{
        "_id" : "5afe5763419503c46544e277",
        "name" : "três",
        "ANCESTORS_FROM_BEGINNING" : [ ]
}
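
To make the mismatch concrete, here is a minimal in-memory sketch in plain JavaScript, not MongoDB itself. `FakeObjectId`, `sameValue` and `graphLookup` are hypothetical stand-ins invented for illustration. The only assumptions are that the recursive search always reads the stored collection, and that a string never compares equal to an ObjectId even when the hex digits are identical.

```javascript
// Hypothetical stand-in for an ObjectId (an assumption, not the driver class).
class FakeObjectId {
  constructor(hex) { this.hex = hex; }
}

// Equality requires the same type AND the same value.
function sameValue(a, b) {
  if (a instanceof FakeObjectId && b instanceof FakeObjectId) return a.hex === b.hex;
  if (a instanceof FakeObjectId || b instanceof FakeObjectId) return false;
  return a === b;
}

// Simplified recursive lookup: it always searches `from` (the stored data),
// never the reshaped document that flows through the pipeline.
function graphLookup(from, startValues, connectFrom, connectTo) {
  const found = [];
  let frontier = startValues;
  while (frontier.length > 0) {
    const next = [];
    for (const doc of from) {
      const hit = frontier.some(v => sameValue(v, connectTo(doc)));
      if (hit && !found.includes(doc)) {
        found.push(doc);
        next.push(...connectFrom(doc));
      }
    }
    frontier = next;
  }
  return found;
}

// Stored data: _id is an "ObjectId", the references are plain strings.
const stored = [
  { _id: new FakeObjectId("aa"), name: "root" },
  { _id: new FakeObjectId("bb"), name: "middle", ancestors: [{ _id: "aa" }] },
  { _id: new FakeObjectId("cc"), name: "leaf", ancestors: [{ _id: "bb" }] },
];
const refs = doc => (doc.ancestors || []).map(a => a._id);

// String references never equal the stored _id values, so nothing is found.
const mismatched = graphLookup(stored, refs(stored[2]), refs, doc => doc._id);

// A converted copy of the collection puts both sides on the same type.
const fixed = stored.map(doc => ({
  ...doc,
  ...(doc.ancestors && {
    ancestors: doc.ancestors.map(a => ({ _id: new FakeObjectId(a._id) })),
  }),
}));
const matched = graphLookup(fixed, refs(fixed[2]), refs, doc => doc._id);

console.log(mismatched.map(d => d.name)); // []
console.log(matched.map(d => d.name));    // [ 'middle', 'root' ]
```

Changing only the document that enters the lookup would not help in this sketch either, because `connectTo` is always evaluated against the stored documents.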

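"Patching" the problem

However, that is the core problem, and not a failure of $convert or its aliases themselves. To make this actually work, we can instead create a "view" which presents itself as a collection for the sake of input.

I'll do this the other way around and convert the "strings" to ObjectId via $toObjectId: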

db.createView("idview","strcoll",[
  { "$addFields": {
    "ancestors": {
      "$ifNull": [ 
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }}
])

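Using the "view", however, means that the data is consistently seen with the converted values. So the following aggregation using the view: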

db.idview.aggregate([
  { "$match": { "name": "três" } },
  { "$graphLookup": {
    "from": "idview",
    "startWith": "$ancestors._id",
    "connectFromField": "ancestors._id",
    "connectToField": "_id",
    "as": "ANCESTORS_FROM_BEGINNING"
  }},
  { "$project": {
    "name": 1,
    "ANCESTORS_FROM_BEGINNING": "$ANCESTORS_FROM_BEGINNING._id"
  }}
])

Returns the expected output:

{
    "_id" : ObjectId("5afe5763419503c46544e277"),
    "name" : "três",
    "ANCESTORS_FROM_BEGINNING" : [
        ObjectId("5afe5763419503c46544e275"),
        ObjectId("5afe5763419503c46544e273"),
        ObjectId("5afe5763419503c46544e274"),
        ObjectId("5afe5763419503c46544e276"),
        ObjectId("5afe5763419503c46544e272")
    ]
}
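
As a rough illustration of what the $ifNull / $map / $$REMOVE wrapping in the view definition does to each document, here is a plain JavaScript sketch. `convertRefs` and `toId` are hypothetical names, and `toId` merely tags the value instead of building a real ObjectId. The point it tries to show is that an array field which is present gets mapped element by element, while a field which is absent stays absent rather than being written as null.

```javascript
// Hypothetical converter standing in for $toObjectId (it only tags the value).
const toId = hex => ({ $oid: hex });

// Rough per-document equivalent of the $addFields stage used by the view.
function convertRefs(doc, fields = ["ancestors", "children"]) {
  const out = { ...doc };
  for (const f of fields) {
    if (Array.isArray(doc[f])) {
      // Like $map: rebuild each element with a converted _id.
      out[f] = doc[f].map(e => ({ _id: toId(e._id) }));
    }
    // Otherwise leave the field out entirely, mirroring "$$REMOVE".
  }
  return out;
}

const converted = convertRefs({
  name: "um",
  children: [{ _id: "5afe5763419503c46544e276" }],
});

console.log(Object.keys(converted)); // [ 'name', 'children' ]
console.log(converted.children[0]._id.$oid);
```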

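Fixing the problem

With all of that said, the real problem here is that you have some data which "looks like" an ObjectId value, and is in fact valid as an ObjectId, but has been recorded as a "string". The basic issue preventing everything from working as it should is that the two "types" are not the same, which results in an equality mismatch when the "joins" are attempted.

So the real fix is the same as it has always been: go through the data and fix it so that the "strings" are actually ObjectId values as well. They will then match the _id keys they are meant to refer to, and you save a considerable amount of storage space, since an ObjectId takes far less space to store than its string representation in hexadecimal characters.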
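
The storage claim can be checked with a little arithmetic in Node.js. This sketch counts value bytes only (no field name or type byte) and relies on two facts about BSON: an ObjectId is a fixed 12-byte value, and a string value is stored as a 4-byte length prefix, the UTF-8 bytes, and a trailing null byte.

```javascript
const hex = "5afe5763419503c46544e277";

// 24 hexadecimal characters encode the 12 raw bytes of an ObjectId.
const objectIdBytes = Buffer.from(hex, "hex").length;

// BSON string value: int32 length + UTF-8 bytes + terminating 0x00.
const stringBytes = 4 + Buffer.byteLength(hex, "utf8") + 1;

console.log({ objectIdBytes, stringBytes }); // { objectIdBytes: 12, stringBytes: 29 }
```

Per reference, the string form therefore takes more than twice the space of the ObjectId form, before any index entries are considered.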

Using MongoDB 4.0 methods, you "could" actually use "$toObjectId" to write a new collection, in much the same way that we created the "view" earlier:

db.strcoll.aggregate([
  { "$addFields": {
    "ancestors": {
      "$ifNull": [ 
        { "$map": {
          "input": "$ancestors",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    },
    "children": {
      "$ifNull": [
        { "$map": {
          "input": "$children",
          "in": { "_id": { "$toObjectId": "$$this._id" } }
        }},
        "$$REMOVE"
      ]
    }
  }},
  { "$out": "fixedcol" }
])

Or of course, where you "need" to keep the same collection, the traditional "loop and update" remains the same as it has always been:

var updates = [];

db.strcoll.find().forEach(doc => {
  var update = { '$set': {} };

  if ( doc.hasOwnProperty('children') )
    update.$set.children = doc.children.map(e => ({ _id: new ObjectId(e._id) }));
  if ( doc.hasOwnProperty('ancestors') )
    update.$set.ancestors = doc.ancestors.map(e => ({ _id: new ObjectId(e._id) }));

  updates.push({
    "updateOne": {
      "filter": { "_id": doc._id },
      update
    }
  });

  if ( updates.length > 1000 ) {
    db.strcoll.bulkWrite(updates);
    updates = [];
  }

})

if ( updates.length > 0 ) {
  db.strcoll.bulkWrite(updates);
  updates = [];
}

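This is actually a bit of a "sledgehammer", because it overwrites the entire array in a single go. That is not a great idea for a production environment, but it is enough as a demonstration for the purposes of this exercise.

Conclusion

So whilst MongoDB 4.0 will add these "casting" features, which can indeed be very useful, their actual intent is not really for cases such as this. They are in fact far more useful in the "conversion" to a new collection with an aggregation pipeline, as demonstrated above, than in most other possible uses.

Whilst we "can" create a "view" which transforms the data types so that things like $lookup and $graphLookup work where the actual collection data differs, this really is only a "band-aid" on the real problem, since the data types should not differ in the first place and should in fact be permanently converted.

Using a "view" means that the aggregation pipeline used to construct it effectively has to run every time the "collection" (actually a "view") is accessed, which creates real overhead.

Avoiding overhead is usually a design goal, so correcting such data storage mistakes is imperative to getting real performance out of your application, rather than working with "brute force" that will only slow things down.

A much safer "conversion" script applies "matched" updates to each array element. The code here requires NodeJS v10.x and a recent release of the MongoDB Node driver, 3.1.x: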

const { MongoClient, ObjectID: ObjectId } = require('mongodb');
const EJSON = require('mongodb-extended-json');

const uri = 'mongodb://localhost/';

const log = data => console.log(EJSON.stringify(data, undefined, 2));

(async function() {

  try {

    const client = await MongoClient.connect(uri);
    let db = client.db('test');
    let coll = db.collection('strcoll');

    let fields = ["ancestors", "children"];

    let cursor = coll.find({
      $or: fields.map(f => ({ [`${f}._id`]: { "$type": "string" } }))
    }).project(fields.reduce((o,f) => ({ ...o, [f]: 1 }),{}));

    let batch = [];

    for await ( let { _id, ...doc } of cursor ) {

      let $set = {};
      let arrayFilters = [];

      for ( const f of fields ) {
        if ( doc.hasOwnProperty(f) ) {
          $set = { ...$set,
            ...doc[f].reduce((o,{ _id },i) =>
              ({ ...o, [`${f}.$[${f.substr(0,1)}${i}]._id`]: ObjectId(_id) }),
              {})
          };

          arrayFilters = [ ...arrayFilters,
            ...doc[f].map(({ _id },i) =>
              ({ [`${f.substr(0,1)}${i}._id`]: _id }))
          ];
        }
      }

      if (arrayFilters.length > 0)
        batch = [ ...batch,
          { updateOne: { filter: { _id }, update: { $set }, arrayFilters } }
        ];

      if ( batch.length > 1000 ) {
        let result = await coll.bulkWrite(batch);
        batch = [];
      }

    }

    if ( batch.length > 0 ) {
      log({ batch });
      let result = await coll.bulkWrite(batch);
      log({ result });
    }

    await client.close();

  } catch(e) {
    console.error(e)
  } finally {
    process.exit()
  }

})()

This produces and executes bulk operations like these for the seven documents:

{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e272"
      }
    },
    "update": {
      "$set": {
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e273"
        }
      }
    },
    "arrayFilters": [
      {
        "c0._id": "5afe5763419503c46544e273"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e273"
      }
    },
    "update": {
      "$set": {
        "ancestors.$[a0]._id": {
          "$oid": "5afe5763419503c46544e272"
        },
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e277"
        }
      }
    },
    "arrayFilters": [
      {
        "a0._id": "5afe5763419503c46544e272"
      },
      {
        "c0._id": "5afe5763419503c46544e277"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e274"
      }
    },
    "update": {
      "$set": {
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e277"
        }
      }
    },
    "arrayFilters": [
      {
        "c0._id": "5afe5763419503c46544e277"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e275"
      }
    },
    "update": {
      "$set": {
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e276"
        }
      }
    },
    "arrayFilters": [
      {
        "c0._id": "5afe5763419503c46544e276"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e276"
      }
    },
    "update": {
      "$set": {
        "ancestors.$[a0]._id": {
          "$oid": "5afe5763419503c46544e275"
        },
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e277"
        }
      }
    },
    "arrayFilters": [
      {
        "a0._id": "5afe5763419503c46544e275"
      },
      {
        "c0._id": "5afe5763419503c46544e277"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5763419503c46544e277"
      }
    },
    "update": {
      "$set": {
        "ancestors.$[a0]._id": {
          "$oid": "5afe5763419503c46544e273"
        },
        "ancestors.$[a1]._id": {
          "$oid": "5afe5763419503c46544e274"
        },
        "ancestors.$[a2]._id": {
          "$oid": "5afe5763419503c46544e276"
        }
      }
    },
    "arrayFilters": [
      {
        "a0._id": "5afe5763419503c46544e273"
      },
      {
        "a1._id": "5afe5763419503c46544e274"
      },
      {
        "a2._id": "5afe5763419503c46544e276"
      }
    ]
  }
},
{
  "updateOne": {
    "filter": {
      "_id": {
        "$oid": "5afe5764419503c46544e278"
      }
    },
    "update": {
      "$set": {
        "children.$[c0]._id": {
          "$oid": "5afe5763419503c46544e272"
        }
      }
    },
    "arrayFilters": [
      {
        "c0._id": "5afe5763419503c46544e272"
      }
    ]
  }
}
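
The shape of these operations can also be reproduced without a database connection. The following is a small sketch of a builder for one such statement; `buildUpdate` and its `toId` parameter are hypothetical names, and the converter used in the usage example only tags the value the way the extended JSON output above displays it. Unlike the script above, this variant skips elements whose `_id` is not a string, which is an added assumption so that a partially converted document would not be touched twice.

```javascript
// Build one bulkWrite "updateOne" statement for a document, or null when
// there is nothing to convert. `toId` is a hypothetical converter parameter;
// with the Node driver it would wrap the driver's ObjectId.
function buildUpdate(doc, fields, toId) {
  const $set = {};
  const arrayFilters = [];

  for (const f of fields) {
    (doc[f] || []).forEach(({ _id }, i) => {
      if (typeof _id !== "string") return; // already converted, leave it alone
      const id = `${f[0]}${i}`;            // identifiers such as a0, a1, c0
      $set[`${f}.$[${id}]._id`] = toId(_id);
      // Each filter matches only the element still holding this string.
      arrayFilters.push({ [`${id}._id`]: _id });
    });
  }

  return arrayFilters.length > 0
    ? { updateOne: { filter: { _id: doc._id }, update: { $set }, arrayFilters } }
    : null;
}

const op = buildUpdate(
  { _id: "doc-1", ancestors: [{ _id: "5afe5763419503c46544e272" }] },
  ["ancestors", "children"],
  hex => ({ $oid: hex })
);

console.log(JSON.stringify(op, null, 2));
```

Because each array filter matches on the original string value, an element that was changed in the meantime is simply not matched, instead of being overwritten along with the rest of the array.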

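【Discussion】:

Awesome explanation! I'll go for performance and use "clean" fields from the beginning.
@Cesar It was worth writing up, since you won't be the last person to ask. People have asked before now, but without knowing that these "casting" features were coming. Once this hits production, I expect the number of people trying it will only increase. So "thanks for asking", I guess.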