【Question title】: MongoDB: count related documents (3 levels)
【Posted】: 2022-01-23 15:10:51
【Question】:

I need to count related documents quickly.

I have four collections.

groups

{ "_id" : "g1", "name" : "group1" }
{ "_id" : "g2", "name" : "group2" }

courses

{ "_id" : "c1", "name" : "course1", "group_id" : "g1" }
{ "_id" : "c2", "name" : "course2", "group_id" : "g2" }

topics

{ "_id" : "t1", "name" : "top1c11", "course_id" : "c1" }
{ "_id" : "t2", "name" : "top1c12", "course_id" : "c1" }
{ "_id" : "t3", "name" : "top1c21", "course_id" : "c2" }

lessons

{ "_id" : "l1", "name" : "lesson111", "topic_id" : "t1" }
{ "_id" : "l2", "name" : "lesson112", "topic_id" : "t1" }
{ "_id" : "l3", "name" : "lesson121", "topic_id" : "t2" }
{ "_id" : "l4", "name" : "lesson211", "topic_id" : "t3" }

I need to count all lessons that belong to a specific group.
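
To make the expected result concrete, here is a minimal in-memory sketch in plain JavaScript (not a MongoDB query) that walks the sample documents above from group to courses to topics to lessons. The helper name `countLessonsForGroup` is hypothetical and only illustrates the count being asked for.

```javascript
// Sample documents copied from the collections above.
const courses = [
  { _id: "c1", name: "course1", group_id: "g1" },
  { _id: "c2", name: "course2", group_id: "g2" },
];
const topics = [
  { _id: "t1", name: "top1c11", course_id: "c1" },
  { _id: "t2", name: "top1c12", course_id: "c1" },
  { _id: "t3", name: "top1c21", course_id: "c2" },
];
const lessons = [
  { _id: "l1", name: "lesson111", topic_id: "t1" },
  { _id: "l2", name: "lesson112", topic_id: "t1" },
  { _id: "l3", name: "lesson121", topic_id: "t2" },
  { _id: "l4", name: "lesson211", topic_id: "t3" },
];

// Hypothetical helper: group -> courses -> topics -> lessons, then count.
function countLessonsForGroup(groupId) {
  const courseIds = new Set(
    courses.filter((c) => c.group_id === groupId).map((c) => c._id)
  );
  const topicIds = new Set(
    topics.filter((t) => courseIds.has(t.course_id)).map((t) => t._id)
  );
  return lessons.filter((l) => topicIds.has(l.topic_id)).length;
}

console.log(countLessonsForGroup("g1")); // 3 (l1, l2, l3)
console.log(countLessonsForGroup("g2")); // 1 (l4)
```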

I tried running the following aggregation, but it never returned a response. (It does work on a small amount of data.)

db.getCollection('lessons').aggregate([
{
    "$lookup": {
        "from": "topics",
        "let": { "topicId": "$topic_id" },
        "pipeline": [
            { 
                "$match": { "$expr": { "$eq": [ "$_id", "$$topicId" ] } } 
            },
            {
                "$lookup": {
                    "from": "courses",
                    "let": { "courseId": "$topic_id" },
                    "pipeline": [
                        { "$match": { "$expr": { "$eq": [ "$course_id", "$$courseId" ] } } },
                    ],
                    "as": "course"
                },
            },
            {
                "$unwind": "$course"
            }

        ],
        "as": "topic"
    },
},
{
    "$unwind" : "$topic"
},
{
    "$match": {
        "topic.course.group_id" : "g1"
    }
},
{
    $group: {
        _id: "$course",
        "amount": {$sum:1},
    }
}
])

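I believe this aggregation can be optimized, but I am not sure whether the aggregation framework is a good fit for this purpose. If it is, how can I optimize the aggregation?

Collection sizes (test data):

  • courses: 30000
  • topics: 200000
  • lessons: 30000000

Right now I count the lessons with simple nested loops in my code. That takes 10 seconds (for a group with 3000 topics).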

【Comments】:

    Tags: mongodb aggregation-framework


    【Solution 1】:

    Solution from Takis's comment: Query 1, adapted for MongoDB 4.2.

    groups.aggregate(
    [{"$match":{"_id":"g1"}},
     {"$lookup":
      {"from":"courses",
       "localField":"_id",
       "foreignField":"group_id",
       "as":"courses"}},
     {"$unwind":"$courses"},
     {"$lookup":
      {"from":"topics",
       "localField":"courses._id",
       "foreignField":"course_id",
       "as":"topics"}},
     {"$unwind":"$topics"},
     {"$lookup":
      {"from":"lessons",
       "pipeline":
       [{"$match":{"$expr":{"$eq":["$$ptopic", "$topic_id"]}}},
        {"$group":{"_id":null, "lessons":{"$sum":1}}},
        {"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
       "as":"lessons",
       "let":{"ptopic":"$topics._id"}}},
     {"$set":
      {"lessons":
       {"$cond":
        [{"$eq":["$lessons", []]}, 0,
         {"$arrayElemAt":["$lessons.lessons", 0]}]}}},
     {"$group":{"_id":"$_id", "totalLessons":{"$sum":"$lessons"}}}])
    

    【Discussion】:

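      【Solution 2】:

      Query 1

      • Non-nested lookups (lookup and unwind)
      • Match the group
      • Lookup and unwind 3 times; the last lookup only counts the lessons and uses a pipeline lookup
      • Group by _id to get the total number of lessons

      Indexes you need (every foreignField)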

      • courses.group_id
      • topics.course_id
      • lessons.topic_id
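
The three indexes listed above could be created roughly as follows. This is a sketch, assuming `db` is the mongosh database handle and that plain single-field ascending indexes are enough; the helper name `createLookupIndexes` is hypothetical.

```javascript
// Hypothetical helper: create one single-field index per foreignField
// used by the $lookup stages. `db` is assumed to be the mongosh handle.
function createLookupIndexes(db) {
  const specs = [
    ["courses", { group_id: 1 }],
    ["topics", { course_id: 1 }],
    ["lessons", { topic_id: 1 }],
  ];
  // createIndex does nothing if an identical index already exists.
  return specs.map(([name, keys]) => db.getCollection(name).createIndex(keys));
}
```

In mongosh this would be called as `createLookupIndexes(db)`.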

      Test code here

      groups.aggregate(
      [{"$match":{"_id":"g1"}},
       {"$lookup":
        {"from":"courses",
         "localField":"_id",
         "foreignField":"group_id",
         "as":"courses"}},
       {"$unwind":"$courses"},
       {"$lookup":
        {"from":"topics",
         "localField":"courses._id",
         "foreignField":"course_id",
         "as":"topics"}},
       {"$unwind":"$topics"},
       {"$lookup":
        {"from":"lessons",
         "localField":"topics._id",
         "foreignField":"topic_id",
         "pipeline":
         [{"$group":{"_id":null, "lessons":{"$sum":1}}},
          {"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
         "as":"lessons"}},
       {"$set":
        {"lessons":
         {"$cond":
          [{"$eq":["$lessons", []]}, 0,
           {"$arrayElemAt":["$lessons.lessons", 0]}]}}},
       {"$group":{"_id":"$_id", "totalLessons":{"$sum":"$lessons"}}}])
      

      Query 2

      • Nested lookups (no unwind)
      • Same code, just nested

      Test code here

      groups.aggregate(
      [{"$match":{"_id":"g1"}},
       {"$lookup":
        {"from":"courses",
         "localField":"_id",
         "foreignField":"group_id",
         "pipeline":
         [{"$lookup": 
           {"from":"topics",
            "localField":"_id",
            "foreignField":"course_id",
            "pipeline":
            [{"$lookup":
              {"from":"lessons",
               "localField":"_id",
               "foreignField":"topic_id",
               "pipeline":
               [{"$group":{"_id":null, "lessons":{"$sum":1}}},
                {"$set":{"id":"$_id", "_id":"$$REMOVE"}}],
               "as":"lessons"}},
             {"$set":
              {"lessons":
               {"$cond":
                [{"$eq":["$lessons", []]}, 0,
                 {"$arrayElemAt":["$lessons.lessons", 0]}]}}}],
            "as":"topics"}},
          {"$project":
           {"_id":0, "totalLessons":{"$sum":"$topics.lessons"}}}],
         "as":"courses"}},
       {"$set":
        {"courses":"$$REMOVE",
         // Sum over every course of the group; an empty array sums to 0.
         "totalLessons":{"$sum":"$courses.totalLessons"}}}])
      

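      If you can, please send some feedback on which one is faster.
      If it is very fast for 1 group, you could remove the $match to run it for all groups, or let the $match pass more groups.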
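
A possible way to make the first stage configurable is sketched below: no ids means no `$match` (all groups), otherwise an `$in` filter on `_id`. The helper name `groupMatchStages` and the `restOfPipeline` placeholder are hypothetical; the remaining stages are the ones from the queries above.

```javascript
// Hypothetical helper: build the leading $match stage for zero, one,
// or several group ids.
function groupMatchStages(groupIds) {
  if (!groupIds || groupIds.length === 0) {
    return []; // no filter: the pipeline runs over all groups
  }
  return [{ $match: { _id: { $in: groupIds } } }];
}

// Placeholder for the $lookup / $unwind / $group stages shown above.
const restOfPipeline = [];
const pipeline = [...groupMatchStages(["g1", "g2"]), ...restOfPipeline];
```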

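      【Discussion】:

      • I get an error: "$lookup with 'pipeline' may not specify 'localField' or 'foreignField'"
      • The queries above need MongoDB 5. Which version do you have?
      • Oh, I am using MongoDB 4.2.
      • Thanks for the answer. I see, I will try to adapt it to my MongoDB version.
      • This will work on MongoDB 4.2. It is the first query, just using let in the pipeline of the last lookup. The second one needs all of its lookups changed to work. The indexes are the most important part; without them it will be slow.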
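
As a rough illustration of what "changing all the lookups" in the second query could look like on MongoDB 4.2, the sketch below replaces every `localField`/`foreignField` pair with `let` plus a `$match` on `$expr`. It is untested against a real 4.2 server; the function name `buildNestedCountPipeline` and the variable names `gid`, `cid`, `tid` are assumptions, and the final stage sums over all courses of the group.

```javascript
// Hypothetical builder for a 4.2-style nested pipeline (let + $expr only).
function buildNestedCountPipeline(groupId) {
  // Innermost lookup: count the lessons of one topic.
  const lessonsLookup = {
    $lookup: {
      from: "lessons",
      let: { tid: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$topic_id", "$$tid"] } } },
        { $group: { _id: null, lessons: { $sum: 1 } } },
      ],
      as: "lessons",
    },
  };
  // Turn the one-element lookup result into a number (0 if no lessons).
  const lessonsToNumber = {
    $set: {
      lessons: {
        $cond: [
          { $eq: ["$lessons", []] },
          0,
          { $arrayElemAt: ["$lessons.lessons", 0] },
        ],
      },
    },
  };
  const topicsLookup = {
    $lookup: {
      from: "topics",
      let: { cid: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$course_id", "$$cid"] } } },
        lessonsLookup,
        lessonsToNumber,
      ],
      as: "topics",
    },
  };
  const coursesLookup = {
    $lookup: {
      from: "courses",
      let: { gid: "$_id" },
      pipeline: [
        { $match: { $expr: { $eq: ["$group_id", "$$gid"] } } },
        topicsLookup,
        { $project: { _id: 0, totalLessons: { $sum: "$topics.lessons" } } },
      ],
      as: "courses",
    },
  };
  return [
    { $match: { _id: groupId } },
    coursesLookup,
    // Add up the per-course totals and drop the intermediate array.
    { $set: { courses: "$$REMOVE", totalLessons: { $sum: "$courses.totalLessons" } } },
  ];
}
```

In mongosh it would be run as `db.groups.aggregate(buildNestedCountPipeline("g1"))`.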