【Posted】: 2022-01-23 15:10:51
【Question】:
I need to count related documents quickly.
I have four collections:
groups
{ "_id" : "g1", "name" : "group1" }
{ "_id" : "g2", "name" : "group2" }
courses
{ "_id" : "c1", "name" : "course1", "group_id" : "g1" }
{ "_id" : "c2", "name" : "course2", "group_id" : "g2" }
topics
{ "_id" : "t1", "name" : "top1c11", "course_id" : "c1" }
{ "_id" : "t2", "name" : "top1c12", "course_id" : "c1" }
{ "_id" : "t3", "name" : "top1c21", "course_id" : "c2" }
lessons
{ "_id" : "l1", "name" : "lesson111", "topic_id" : "t1" }
{ "_id" : "l2", "name" : "lesson112", "topic_id" : "t1" }
{ "_id" : "l3", "name" : "lesson121", "topic_id" : "t2" }
{ "_id" : "l4", "name" : "lesson211", "topic_id" : "t3" }
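I need to count all lessons that belong to a specific group.
I tried running the following aggregation, but it never returned a response (it does work on a small amount of data):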
db.getCollection('lessons').aggregate([
  // Join each lesson to its topic (lessons.topic_id -> topics._id).
  {
    "$lookup": {
      "from": "topics",
      "let": { "topicId": "$topic_id" },
      "pipeline": [
        { "$match": { "$expr": { "$eq": [ "$_id", "$$topicId" ] } } },
        // Join the topic to its course (topics.course_id -> courses._id).
        {
          "$lookup": {
            "from": "courses",
            "let": { "courseId": "$course_id" },
            "pipeline": [
              { "$match": { "$expr": { "$eq": [ "$_id", "$$courseId" ] } } }
            ],
            "as": "course"
          }
        },
        { "$unwind": "$course" }
      ],
      "as": "topic"
    }
  },
  { "$unwind": "$topic" },
  // Keep only lessons whose course belongs to the target group.
  { "$match": { "topic.course.group_id": "g1" } },
  // Count the remaining lessons as a single total.
  { "$group": { "_id": null, "amount": { "$sum": 1 } } }
])
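I believe this aggregation can be optimized, but I am not sure whether the aggregation framework is a good fit for this purpose at all. If it is, how should I optimize the aggregation?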
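One direction that may be worth trying is to start from the small, filterable side (courses) rather than from the 30-million-document lessons collection, so that the group filter runs first and each join can use an index. The sketch below only builds the pipeline. The helper name buildGroupLessonCountPipeline is hypothetical, and the approach assumes indexes exist on topics.course_id and lessons.topic_id. It has not been measured against the data sizes in this question.

```javascript
// Hypothetical helper: builds a pipeline meant to run on the `courses` collection.
// Assumes indexes on topics.course_id and lessons.topic_id (not stated in the question).
function buildGroupLessonCountPipeline(groupId) {
  return [
    // 1. Filter first: only the courses of the requested group.
    { $match: { group_id: groupId } },
    // 2. Attach the topics of each course (courses._id -> topics.course_id).
    {
      $lookup: {
        from: "topics",
        localField: "_id",
        foreignField: "course_id",
        as: "topics"
      }
    },
    // 3. One document per (course, topic) pair.
    { $unwind: "$topics" },
    // 4. Attach the lessons of each topic (topics._id -> lessons.topic_id).
    {
      $lookup: {
        from: "lessons",
        localField: "topics._id",
        foreignField: "topic_id",
        as: "lessons"
      }
    },
    // 5. Sum the sizes of the joined arrays into one total for the group.
    { $group: { _id: null, amount: { $sum: { $size: "$lessons" } } } }
  ];
}

// Usage in the mongo shell (not executed here):
//   db.getCollection('courses').aggregate(buildGroupLessonCountPipeline('g1'))
```

Because the second $lookup materialises the lesson documents of each topic into an array, a topic with a very large number of lessons could run into the 16 MB document limit; in that case counting with a separate query per batch of topic IDs may be the safer route.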
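Collection sizes (test data):
- courses: 30,000
- topics: 200,000
- lessons: 30,000,000
Right now I count the lessons in application code with simple nested loops. That takes 10 seconds (for a group with 3,000 topics).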
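As a possible alternative to nested loops, the same count can be expressed as three flat queries that pass ID lists along with $in, so the database performs one count instead of one query per topic. This is a minimal sketch written for the mongo shell, where distinct and countDocuments return their results directly. The function name countLessonsForGroup is hypothetical, and an index on lessons.topic_id (for example via db.lessons.createIndex({ topic_id: 1 })) is assumed rather than taken from the question.

```javascript
// Hypothetical helper: counts the lessons of one group with three flat queries.
// `db` is the shell's database handle; an index on lessons.topic_id is assumed.
function countLessonsForGroup(db, groupId) {
  // IDs of the courses that belong to the group.
  const courseIds = db.getCollection("courses").distinct("_id", { group_id: groupId });

  // IDs of the topics that belong to those courses.
  // Note: a distinct result must fit in 16 MB, which a few thousand IDs easily do.
  const topicIds = db.getCollection("topics").distinct("_id", { course_id: { $in: courseIds } });

  // A single count over lessons, restricted to those topics.
  return db.getCollection("lessons").countDocuments({ topic_id: { $in: topicIds } });
}

// Usage in the mongo shell (not executed here):
//   countLessonsForGroup(db, "g1")
```

With the sample documents shown at the top of this question, group g1 owns course c1, topics t1 and t2, and therefore lessons l1, l2 and l3.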
【Discussion】:
Tags: mongodb aggregation-framework