MongoDB search and pagination aggregation performance issue: answers

【Question title】: MongoDB search and pagination Aggregation Performance issue
【Posted】: 2019-11-25 21:23:52
【Question description】:

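I am new to Node.js and MongoDB. I am working on MongoDB search and pagination, which works correctly, but I have a performance problem: counting and searching records takes too much time.

If I search with a short word it is faster. If I search with a long string, or for something that has no matching records in the database, it takes far too long: 50 to 186.30 seconds. (That is too long; I expect 1 to 2 seconds.)

I have more than 15,00,000 records.

If I do not count the matches for the search word, the query takes 0.20 to 1.5 seconds, but when I count the records while searching for a word it takes 25.0 to 35.0 seconds.

I do not know how to reduce the time needed to count records for a search word (query optimization).

I have tried to optimize the query as far as I can.

I also tried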

{
  $count: "passing_scores"
}

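but the time did not change. I am stuck. I have to reduce the time it takes to count matches for a search word.

For example, the SQL query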

  SELECT * FROM `post`
    Left JOIN catagory ON post.catid=catagory.id
    WHERE post_name LIKE '%a%' OR post_data LIKE '%a%' OR tags LIKE '%a%' OR post_url LIKE '%a%'

Node and MongoDB

PostObj.count({},function(err,totalCount) {
        if(err) {
            response = {"error" : true,"message" : "Error fetching data"}
        }
        PostObj.aggregate([
        { $lookup:
                {
                    from: 'catagories',
                    localField: 'catagory.catagory_id',
                    foreignField: '_id',
                    as: 'catagories_data'
                }
        },
        {

            $match:
                {
                    $or: [
                        {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                        {"postname": { $regex: new RegExp(search_data) }},
                        {"posturl": { $regex: new RegExp(search_data) }},
                        {"postdata": { $regex: new RegExp(search_data) }},
                        {"tags": { $regex: new RegExp(search_data) }}
                    ]
                }
        },            
        { $limit : search_limit },
        { $skip : search_skip },
        { $group : { _id : "$_id", postname: { $push: "$postname" } , posturl: { $push: "$posturl" }  } } 
    ]).exec(function (err, data){  

        //end insert log data        
        if(err) {
            response = {"error" : true,"message" :err};
        } 

        if(search_data != "")
        {
            // count record using search word

            PostObj.aggregate([
                    { $lookup:
                        {
                            from: 'catagories',
                            localField: 'catagory.catagory_id',
                            foreignField: '_id',
                            as: 'catagories_data'
                        }
                },
                {

                    $match:
                        {
                            $or: [
                                {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                                {"postname": { $regex: new RegExp(search_data) }},
                                {"posturl": { $regex: new RegExp(search_data) }},
                                {"postdata": { $regex: new RegExp(search_data) }},
                                {"tags": { $regex: new RegExp(search_data) }}
                            ]
                        }
                },    
                { $group: { _id: null, myCount: { $sum: 1 } } },
                { $project: { _id: 0 } }   
            ]).exec(function (err, Countdata){  
                res.json({
                sEcho : req.body.draw,
                iTotalRecords: Countdata.myCount,
                iTotalDisplayRecords: Countdata.myCount,
                aaData: data
            });
        }

        res.json({
            sEcho : req.body.draw,
            iTotalRecords: totalPages,
            iTotalDisplayRecords: totalPages,
            aaData: data
        });
    });
});

I also tried the following approach, but it takes even longer than the first code: 35.0 to 49.0 seconds.

PostObj.aggregate([
    { $lookup:
               {
                            from: 'catagories',
                            localField: 'catagory.catagory_id',
                            foreignField: '_id',
                            as: 'catagories_data'
                        }
                },
                {

                    $match:
                        {
                            $or: [
                                {"catagories_data.catagory_name": { $regex: new RegExp(search_data)}},
                                {"postname": { $regex: new RegExp(search_data) }},
                                {"posturl": { $regex: new RegExp(search_data) }},
                                {"postdata": { $regex: new RegExp(search_data) }},
                                {"tags": { $regex: new RegExp(search_data) }}
                            ]
                        }
                }, 
    { '$facet'    : {
        metadata: [ { $count: "total" }, { $addFields: { page: NumberInt(3) } } ],
        data: [ { $skip: 20 }, { $limit: 10 } ] // add projection here wish you re-shape the docs
    } }
] )

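If I do not use a search word, it works well. The problem appears when I search for any word (counting the records that match that word, without skip and limit).

Collection data

Posts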

 {
   "_id": ObjectId("5d29bd7609f28633f38ccc13"),
   "postname": "this is some data ",
   "tags " : "
   Damita,
   Caro,
   Leontyne,
   Theodosia,
   Vyky ",
   "postdata ": "Berry Samara Kellia Rebekah Linette Hyacinthie Joelly Micky Tomasina Christian Fae Doralynn Chelsea Aurie Gwendolyn Tate
   Cairistiona Ardys Aubrie Damita Olga Kelli Leone Marthena Kelcy
   Cherlyn Molli Pris Ginelle Sula Johannah Hedwig Adelle Editha Lindsey
   Loleta Lenette Ann Heidie Drona Charlena Emilia Manya Ketti Dorthea
   Jeni Lorene Eolanda Karoly Loretta Marylou Tommie Leontyne Winny Cyb
   Violet Pavia Karen Idelle Betty Doloritas Judye Aretha Quinta Billie
   Vallie Fiona Letty Gates Shandra Rosemary Dorice Doro Coral Tove Crin
   Bobbe Kristan Tierney Gianina Val Daniela Kellyann Marybeth Konstance
   Nixie Andeee Jolene Patrizia Carla Arabella Berna Roseline Lira Cristy
   Hedi Clem Nerissa ",
   "catagory " : [
     { "catagory_id " : [ ObjectId("5d29bd7509f28633f38ccbfd")]},
     { "catagory_id": [ ObjectId("5d29bd7509f28633f38ccbfd") ]}],
   "createby": "5d22f712fe481b2a9afda4aa"
 }

Categories

{
  "_id": ObjectId("5d29bc271a68fb333531f6a1"),
  "catagory_name": "Katharine",
  "catagory_description": "Katharine"
}

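Is there any solution?

【Comments】:

It would be better if you could share the schema or a sample document of your collection.
@Shivam Mishra: I have updated it.
Can you share the indexes on your collection? Also, can you share the regex you are using (any one of them)?
The problem is in your data model. Basically, you are doing 15M $lookups.
@MarkusWMahlberg 15M lookups (using an index) + (15 * 4) million regex matches (not using any index)

Tags: node.js mongodb mongoose pagination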

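【Solution 1】:

I can suggest a few tips for you to try.

1: POST collection

It seems you store only the category_id, wrapped in an array of objects under the category property. You should avoid that. Instead, do one of the following.

Create a new property post_id inside the category collection, instead of keeping the array of category objects in the post collection [high-performance approach].

or

Convert the category property from an array of objects into a simple array [average performance]. Ex: category: [ ObjectId("5d29bd7509f28633f38ccbfd"), ObjectId("5d29bd7509f28633f38ccbfd"), ObjectId("5d29bd7509f28633f38ccbfd") ];

In both cases, the post_id or category property must be indexed.

2: Lookup

Instead of a simple lookup, you should use the pipeline approach.

For example:

Not good.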
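
The second option (a simple array of ids) can be sketched as a small migration helper. This is only an illustration: `flattenCategoryIds` is a hypothetical name, plain strings stand in for ObjectId values, and the `catagory` / `catagory_id` spelling follows the sample document in the question.

```javascript
// Hypothetical helper: turn [{ catagory_id: [id, ...] }, ...] into a flat,
// de-duplicated array of ids that a single-field index can cover.
function flattenCategoryIds(catagory) {
  const seen = new Set();
  const ids = [];
  for (const entry of catagory || []) {
    for (const id of entry.catagory_id || []) {
      const key = String(id); // compare ids by their string form
      if (!seen.has(key)) {
        seen.add(key);
        ids.push(id);
      }
    }
  }
  return ids;
}

// Shape taken from the question's sample post (strings used instead of ObjectId).
const samplePost = {
  catagory: [
    { catagory_id: ["5d29bd7509f28633f38ccbfd"] },
    { catagory_id: ["5d29bd7509f28633f38ccbfd"] },
  ],
};
console.log(flattenCategoryIds(samplePost.catagory));

// After migrating, the flat array would still need an index, for example
// db.posts.createIndex({ catagory: 1 })  (the collection name is an assumption).
```

The duplicate id in the sample collapses to a single entry, which also shrinks the number of values the lookup has to resolve.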

$lookup:{
    from: 'catagories',
    localField: 'catagory.catagory_id', // BAD IDEA //
    foreignField: '_id',
    as: 'catagories_data'
},

Good.

$lookup:{
    from: 'catagories',
    localField: '_id',
    foreignField: 'post_id',  // GOOD IDEA
    as: 'catagories_data'
},

Better


$lookup:{
    let : { post_id: "$_id" },
    from: 'catagories',
    pipeline:[
              {
                    $match: {
                        $expr: {
                            $and: [
                                { $eq: ["$post_id", "$$post_id"], },
                            ]
                        }
                    },
                },
                {
                    $match: {
                        $or: [

                            // AVOID `new` keyword if you can do such;
                            // and create indexes for the same;

                            { "catagory_name": { $regex: `^${search_data}` } },
                            { "postname": { $regex: `^${search_data}` } },
                            { "posturl": { $regex: `^${search_data}` } },
                            { "postdata": { $regex: `^${search_data}` } },
                            { "tags": { $regex: `^${search_data}` } }
                        ]
                    }

                }
    ],
    as: 'catagories_data'
},
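
The `^`-anchored patterns above matter because a case-sensitive prefix regex can use an index on the field, while an unanchored one has to examine every value. User input also needs escaping so that characters such as `.` or `+` are matched literally. A minimal sketch, where `escapeRegex` and `prefixPattern` are hypothetical helper names:

```javascript
// Hypothetical helper: escape regex metacharacters so the input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-sensitive prefix pattern string for $regex.
function prefixPattern(searchData) {
  return "^" + escapeRegex(searchData);
}

// Example filter on one field; `postname` comes from the question's schema.
const filter = { postname: { $regex: prefixPattern("this is") } };
console.log(filter);
```

A prefix pattern changes the meaning of the search compared with the question's `LIKE '%a%'`: it only finds values that start with the search string.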

After all that, the facet pipeline seems fine to me.

'$facet' : {
    metadata: [ { $count: "total" }, { $addFields: { page: NumberInt(3) } } ],
    data: [ { $skip: 20 }, { $limit: 10 } ] // add projection here wish you re-shape the docs
}
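
When reading the result of this stage, note that `$facet` always returns a single document, and that `$count` emits nothing when no documents match, so `metadata` can be an empty array. A minimal sketch of unpacking that result in Node (`unpackFacet` is a hypothetical helper name):

```javascript
// Hypothetical helper: unpack the single document produced by a $facet stage
// with `metadata` and `data` sub-pipelines.
function unpackFacet(result) {
  const first = result[0] || { metadata: [], data: [] };
  // $count yields no document for an empty input, so fall back to 0.
  const total = first.metadata.length > 0 ? first.metadata[0].total : 0;
  return { total, data: first.data };
}

// Simulated aggregation outputs (no database needed for the illustration).
const withMatches = [{ metadata: [{ total: 42, page: 3 }], data: [{ _id: 1 }] }];
const withoutMatches = [{ metadata: [], data: [] }];

console.log(unpackFacet(withMatches).total);
console.log(unpackFacet(withoutMatches).total);
```

The same care applies to the question's count query: an aggregation returns an array, so the count lives on the first element rather than on the array itself.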

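Other factors that can slow the query down:

The configuration of the backend server and the database server.
The distance between front end -> backend -> database server.
The number of incoming and outgoing requests per second.
And, of course, the internet connection.

The complete query looks like this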

PostObj.aggregate([
    {
        $lookup: {
            let: { post_id: "$_id" },
            from: 'categories',
            pipeline: [
                {
                    $match: {
                        $expr: {
                            $and: [
                                { $eq: ["$post_id", "$$post_id"], },
                            ]
                        }
                    },
                },
                {
                    $match: {
                        $or: [

                            // AVOID `new` keyword if you can do such;
                            // and create indexes for the same;

                            { "catagory_name": { $regex: `^${search_data}` } },
                            { "postname": { $regex: `^${search_data}` } },
                            { "posturl": { $regex: `^${search_data}` } },
                            { "postdata": { $regex: `^${search_data}` } },
                            { "tags": { $regex: `^${search_data}` } }
                        ]
                    }

                }
            ],
            as: "catagories_data"
        }
    },
    {
        '$facet': {
            metadata: [{ $count: "total" }, { $addFields: { page: NumberInt(3) } }],
            catagories_data: [{ $skip: 0 }, { $limit: 10 }]
        }
    }
])
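
One possible variation, which is not the query above but a different arrangement of the same stages: filter on the post's own fields first, and run the `$lookup` inside the `data` facet after `$skip` and `$limit`, so only the documents on the current page are joined. This sketch assumes the category name does not have to be searched in the same stage and that categories carry the `post_id` field suggested earlier; `buildSearchPipeline` is a hypothetical name.

```javascript
// Hypothetical builder: returns an aggregation pipeline as a plain array.
function buildSearchPipeline(searchData, page, pageSize) {
  // Escape user input and anchor it so an index on the field can be used.
  const pattern = "^" + searchData.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const skip = (page - 1) * pageSize; // $skip must come before $limit

  return [
    {
      // Filter posts before any join (assumption: category name is not searched here).
      $match: {
        $or: [
          { postname: { $regex: pattern } },
          { posturl: { $regex: pattern } },
          { postdata: { $regex: pattern } },
          { tags: { $regex: pattern } },
        ],
      },
    },
    {
      $facet: {
        metadata: [{ $count: "total" }],
        data: [
          { $skip: skip },
          { $limit: pageSize },
          {
            // Join only the documents on the current page.
            $lookup: {
              from: "catagories",
              localField: "_id",
              foreignField: "post_id", // assumes the post_id field proposed in this answer
              as: "catagories_data",
            },
          },
        ],
      },
    },
  ];
}

// Usage with a Mongoose model would look like: PostObj.aggregate(buildSearchPipeline("a", 3, 10))
console.log(JSON.stringify(buildSearchPipeline("a", 3, 10), null, 2));
```

Whether this is faster depends on the indexes available for the four fields; the count still has to visit every matching post.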

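【Comments】:

It gives an error at from: 'categories', SyntaxError: Unexpected identifier
A comma was missing after let : { post_id: "$_id" }. I have just fixed it. The query is only meant as a reference for the flow.
Error: Arguments must be aggregate pipeline operators
It seems you are copying the query as-is. You should focus on the flow and build your own query based on it. But let me create and run the scenario in my local environment.
I think you forgot to use $lookup in the query.

【Solution 2】:

If, in your case, the regex is only looking for one (or a few) words, then it is better to use $text instead of $regex. $text can use a text index and is therefore much faster. In MySQL terms, $text is LIKE and $regex is REGEXP. Since your example MySQL query uses LIKE, I am fairly confident that you can use $text instead of $regex in your Mongo query as well.

You need (if you do not have one already) a compound "text" index on your fields (postname, tags, postdata and posturl).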

db.POST.createIndex(
   {
     postname: "text",
     tags: "text",
     posturl: "text",
     postdata: "text"
   }
 )
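
With that index in place, the search stage could look roughly like the sketch below. `buildTextSearchPipeline` is a hypothetical name and the projected fields are assumptions taken from the question. Two caveats: a `$match` that uses `$text` has to be the first stage of the pipeline, and `$text` matches whole (stemmed) words rather than arbitrary substrings, so it is not an exact replacement for `LIKE '%a%'`.

```javascript
// Hypothetical builder: text search plus count and one page of results.
function buildTextSearchPipeline(searchData, page, pageSize) {
  const skip = (page - 1) * pageSize;
  return [
    // $text relies on the text index and must be in the first stage.
    { $match: { $text: { $search: searchData } } },
    {
      $facet: {
        metadata: [{ $count: "total" }],
        data: [
          { $skip: skip },
          { $limit: pageSize },
          { $project: { postname: 1, posturl: 1 } }, // assumed output fields
        ],
      },
    },
  ];
}

// Usage with a Mongoose model would look like: PostObj.aggregate(buildTextSearchPipeline("Damita", 1, 10))
console.log(JSON.stringify(buildTextSearchPipeline("Damita", 1, 10), null, 2));
```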

【Comments】: