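【Question Title】: MongoDb Full text search with partial search
【Posted】: 2018-11-23 06:02:52
【Question】:

I am using MongoDB 3.6, and my collection holds roughly 5-6 lakh (500,000-600,000) documents. I want a single query that combines full-text search with partial (substring) matching: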

    db.temp.find(
        { $and: [
            { "status": { "$in": [1, 2] } },
            { $or: [
                { $text: { $search: "school" } },
                { cname: /school/i },
                { name: /school/i }
            ] }
        ] },
        { cname: 1, name: 1, followers: 1, status: 1, score: { $meta: "textScore" } }
    ).sort( { score: { $meta: "textScore" }, status: -1, followers: -1 } )

Indexes on the temp collection:

    db.temp.createIndex(
        { name: "text", cname: "text" },
        { weights: { name: 4, cname: 2 } }
    )

    db.getCollection("temp").createIndex({ "cname": 1 }, { background: true })

    db.getCollection("temp").createIndex({ "status": -1.0, "followers": -1.0 }, { background: true });

    db.getCollection("temp").createIndex({ "name": 1 }, { background: true })

A sample document:

{ 
       "_id" : 5011.0, 
       "cname" : "samyselvik", 
       "name" : "Samy Sam", 
       "imgname" : "nrwi4769731443194380996.jpg", 
       "followers" : 1.0, 
       "status" : 1.0, 
        "createdat" : 1443194421532.0
    }

When I run explain('executionStats'), it shows:

"executionStats" :{
                    "executionSuccess" : true, 
                    "nReturned" : 363.0, 
                    "executionTimeMillis" : 894.0, 
                    "totalKeysExamined" : 921424.0, 
                    "totalDocsExamined" : 372.0, 
                    "executionStages" : {
                        "stage" : "PROJECTION", 
                        "nReturned" : 363.0, 
                        "executionTimeMillisEstimate" : 808.0, 
                        "works" : 921803.0, 
                        "advanced" : 363.0, 
                        "needTime" : 921439.0, 
                        "needYield" : 0.0, 
                        "saveState" : 7234.0, 
                        "restoreState" : 7234.0, 
                        "isEOF" : 1.0, 
                        "invalidates" : 0.0, 
                        "transformBy" : {
                            "cname" : 1.0, 
                            "name" : 1.0, 
                            "followers" : 1.0, 
                            "score" : {
                                "$meta" : "textScore"
                            }
                        }, 
                        "inputStage" : {
                            "stage" : "SORT", 
                            "nReturned" : 363.0, 
                            "executionTimeMillisEstimate" : 774.0, 
                            "works" : 921803.0, 
                            "advanced" : 363.0, 
                            "needTime" : 921439.0, 
                            "needYield" : 0.0, 
                            "saveState" : 7234.0, 
                            "restoreState" : 7234.0, 
                            "isEOF" : 1.0, 
                            "invalidates" : 0.0, 
                            "sortPattern" : {
                                "score" : {
                                    "$meta" : "textScore"
                                }, 
                                "status" : -1.0, 
                                "followers" : -1.0
                            }, 
                            "memUsage" : 131542.0, 
                            "memLimit" : 33554432.0, 
                            "limitAmount" : 500.0, 
                            "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR", 
                                "nReturned" : 363.0, 
                                "executionTimeMillisEstimate" : 730.0, 
                                "works" : 921439.0, 
                                "advanced" : 363.0, 
                                "needTime" : 921075.0, 
                                "needYield" : 0.0, 
                                "saveState" : 7234.0, 
                                "restoreState" : 7234.0, 
                                "isEOF" : 1.0, 
                                "invalidates" : 0.0, 
                                "inputStage" : {
                                    "stage" : "FETCH", 
                                    "filter" : {
                                        "status" : {
                                            "$in" : [
                                                1.0, 
                                                2.0
                                            ]
                                        }
                                    }, 
                                    "nReturned" : 363.0, 
                                    "executionTimeMillisEstimate" : 719.0, 
                                    "works" : 921438.0, 
                                    "advanced" : 363.0, 
                                    "needTime" : 921074.0, 
                                    "needYield" : 0.0, 
                                    "saveState" : 7234.0, 
                                    "restoreState" : 7234.0, 
                                    "isEOF" : 1.0, 
                                    "invalidates" : 0.0, 
                                    "docsExamined" : 363.0, 
                                    "alreadyHasObj" : 9.0, 
                                    "inputStage" : {
                                        "stage" : "OR", 
                                        "nReturned" : 363.0, 
                                        "executionTimeMillisEstimate" : 697.0, 
                                        "works" : 921438.0, 
                                        "advanced" : 363.0, 
                                        "needTime" : 921074.0, 
                                        "needYield" : 0.0, 
                                        "saveState" : 7234.0, 
                                        "restoreState" : 7234.0, 
                                        "isEOF" : 1.0, 
                                        "invalidates" : 0.0, 
                                        "dupsTested" : 399.0, 
                                        "dupsDropped" : 36.0, 
                                        "recordIdsForgotten" : 0.0, 
                                        "inputStages" : [
                                            {
                                                "stage" : "TEXT", 
                                                "nReturned" : 9.0, 
                                                "executionTimeMillisEstimate" : 0.0, 
                                                "works" : 21.0, 
                                                "advanced" : 9.0, 
                                                "needTime" : 11.0, 
                                                "needYield" : 0.0, 
                                                "saveState" : 7234.0, 
                                                "restoreState" : 7234.0, 
                                                "isEOF" : 1.0, 
                                                "invalidates" : 0.0, 
                                                "indexPrefix" : {

                                                }, 
                                                "indexName" : "name_text_cname_text", 
                                                "parsedTextQuery" : {
                                                    "terms" : [
                                                        "sam"
                                                    ], 
                                                    "negatedTerms" : [

                                                    ], 
                                                    "phrases" : [

                                                    ], 
                                                    "negatedPhrases" : [

                                                    ]
                                                }, 
                                                "textIndexVersion" : 3.0, 
                                                "inputStage" : {
                                                    "stage" : "TEXT_MATCH", 
                                                    "nReturned" : 9.0, 
                                                    "executionTimeMillisEstimate" : 0.0, 
                                                    "works" : 21.0, 
                                                    "advanced" : 9.0, 
                                                    "needTime" : 11.0, 
                                                    "needYield" : 0.0, 
                                                    "saveState" : 7234.0, 
                                                    "restoreState" : 7234.0, 
                                                    "isEOF" : 1.0, 
                                                    "invalidates" : 0.0, 
                                                    "docsRejected" : 0.0, 
                                                    "inputStage" : {
                                                        "stage" : "TEXT_OR", 
                                                        "nReturned" : 9.0, 
                                                        "executionTimeMillisEstimate" : 0.0, 
                                                        "works" : 21.0, 
                                                        "advanced" : 9.0, 
                                                        "needTime" : 11.0, 
                                                        "needYield" : 0.0, 
                                                        "saveState" : 7234.0, 
                                                        "restoreState" : 7234.0, 
                                                        "isEOF" : 1.0, 
                                                        "invalidates" : 0.0, 
                                                        "docsExamined" : 9.0, 
                                                        "inputStage" : {
                                                            "stage" : "IXSCAN", 
                                                            "nReturned" : 9.0, 
                                                            "executionTimeMillisEstimate" : 0.0, 
                                                            "works" : 10.0, 
                                                            "advanced" : 9.0, 
                                                            "needTime" : 0.0, 
                                                            "needYield" : 0.0, 
                                                            "saveState" : 7234.0, 
                                                            "restoreState" : 7234.0, 
                                                            "isEOF" : 1.0, 
                                                            "invalidates" : 0.0, 
                                                            "keyPattern" : {
                                                                "_fts" : "text", 
                                                                "_ftsx" : 1.0
                                                            }, 
                                                            "indexName" : "name_text_cname_text", 
                                                            "isMultiKey" : true, 
                                                            "isUnique" : false, 
                                                            "isSparse" : false, 
                                                            "isPartial" : false, 
                                                            "indexVersion" : 2.0, 
                                                            "direction" : "backward", 
                                                            "indexBounds" : {

                                                            }, 
                                                            "keysExamined" : 9.0, 
                                                            "seeks" : 1.0, 
                                                            "dupsTested" : 9.0, 
                                                            "dupsDropped" : 0.0, 
                                                            "seenInvalidated" : 0.0
                                                        }
                                                    }
                                                }
                                            }, 
                                            {
                                                "stage" : "IXSCAN", 
                                                "filter" : {
                                                    "$or" : [
                                                        {
                                                            "cname" : {
                                                                "$regex" : "Sam", 
                                                                "$options" : "i"
                                                            }
                                                        }
                                                    ]
                                                }, 
                                                "nReturned" : 193.0, 
                                                "executionTimeMillisEstimate" : 357.0, 
                                                "works" : 460693.0, 
                                                "advanced" : 193.0, 
                                                "needTime" : 460499.0, 
                                                "needYield" : 0.0, 
                                                "saveState" : 7234.0, 
                                                "restoreState" : 7234.0, 
                                                "isEOF" : 1.0, 
                                                "invalidates" : 0.0, 
                                                "keyPattern" : {
                                                    "cname" : 1.0
                                                }, 
                                                "indexName" : "cname_1", 
                                                "isMultiKey" : false, 
                                                "multiKeyPaths" : {
                                                    "cname" : [

                                                    ]
                                                }, 
                                                "isUnique" : false, 
                                                "isSparse" : false, 
                                                "isPartial" : false, 
                                                "indexVersion" : 2.0, 
                                                "direction" : "forward", 
                                                "indexBounds" : {
                                                    "cname" : [
                                                        "[\"\", {})", 
                                                        "[/Sam/i, /Sam/i]"
                                                    ]
                                                }, 
                                                "keysExamined" : 460692.0, 
                                                "seeks" : 1.0, 
                                                "dupsTested" : 0.0, 
                                                "dupsDropped" : 0.0, 
                                                "seenInvalidated" : 0.0
                                            }, 
                                            {
                                                "stage" : "IXSCAN", 
                                                "filter" : {
                                                    "$or" : [
                                                        {
                                                            "name" : {
                                                                "$regex" : "Sam", 
                                                                "$options" : "i"
                                                            }
                                                        }
                                                    ]
                                                }, 
                                                "nReturned" : 197.0, 
                                                "executionTimeMillisEstimate" : 318.0, 
                                                "works" : 460724.0, 
                                                "advanced" : 197.0, 
                                                "needTime" : 460526.0, 
                                                "needYield" : 0.0, 
                                                "saveState" : 7234.0, 
                                                "restoreState" : 7234.0, 
                                                "isEOF" : 1.0, 
                                                "invalidates" : 0.0, 
                                                "keyPattern" : {
                                                    "name" : 1.0
                                                }, 
                                                "indexName" : "name_1", 
                                                "isMultiKey" : false, 
                                                "multiKeyPaths" : {
                                                    "name" : [

                                                    ]
                                                }, 
                                                "isUnique" : false, 
                                                "isSparse" : false, 
                                                "isPartial" : false, 
                                                "indexVersion" : 2.0, 
                                                "direction" : "forward", 
                                                "indexBounds" : {
                                                    "name" : [
                                                        "[\"\", {})", 
                                                        "[/Sam/i, /Sam/i]"
                                                    ]
                                                }, 
                                                "keysExamined" : 460723.0, 
                                                "seeks" : 1.0, 
                                                "dupsTested" : 0.0, 
                                                "dupsDropped" : 0.0, 
                                                "seenInvalidated" : 0.0
                                            }
                                        ]
                                    }
                                }
                            }
                        }
                    }, 
                    "allPlansExecution" : [

                    ]
                }

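The temp collection holds 4.60 lakh (460,000) documents in total, yet the query examines more keys than there are documents. How can I optimize this query so that I can use both full-text and partial search?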

【Comments】:

  • @AlexBlex Can you explain why totalKeysExamined is higher than the total number of documents?
  • @AlexBlex, is there any way to optimize the query above? I need the search to be faster.
  • @AlexBlex, the documents look like this: { "_id" : 5011.0, "cname" : "samyselvik", "name" : "Samy Sam", "imgname" : "nrwi4769731443194380996.jpg", "followers" : 1.0, "status" : 1.0, "createdat" : 1443194421532.0 }. "executionStats" : { "executionSuccess" : true, "nReturned" : 363.0, "executionTimeMillis" : 894.0, "totalKeysExamined" : 921424.0, "totalDocsExamined" : 372.0 }
  • @AlexBlex, the question has been edited to include a sample document and the executionStats.

Tags: mongodb indexing full-text-search query-optimization partial


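【Answer 1】:

A brief walkthrough of the query stages:

  • The regex on name takes about 0.3 seconds, examines 460k keys, and returns 197 documents
  • The regex on cname takes about 0.4 seconds, examines 460k keys, and returns 193 documents
  • The full-text search examines 9 keys and returns 9 documents almost instantly

The union of the three $or branches returns 363 documents in about 0.7 seconds. It examines 460k + 460k = 920k keys, because each regex walks its entire index; that is why totalKeysExamined is higher than the number of documents in the collection.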
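
The totals reported by explain can be reproduced from the per-branch figures. The snippet below is only an arithmetic check using numbers copied from the executionStats output in the question; the variable names are made up for illustration.

```javascript
// Per-branch figures copied from the executionStats output in the question.
const orBranches = [
  { stage: "TEXT (name_text_cname_text)", keysExamined: 9, nReturned: 9 },
  { stage: "IXSCAN (cname_1)", keysExamined: 460692, nReturned: 193 },
  { stage: "IXSCAN (name_1)", keysExamined: 460723, nReturned: 197 },
];

// Each regex branch scans its whole index, so the key counts add up.
const totalKeysExamined = orBranches.reduce((sum, b) => sum + b.keysExamined, 0);

// The OR stage receives every branch result, then drops duplicate record ids.
const dupsTested = orBranches.reduce((sum, b) => sum + b.nReturned, 0);
const dupsDropped = 36; // reported by the OR stage
const uniqueDocs = dupsTested - dupsDropped;

console.log(totalKeysExamined); // 921424, matches totalKeysExamined
console.log(dupsTested);        // 399, matches dupsTested
console.log(uniqueDocs);        // 363, matches nReturned
```

So the 921,424 keys come almost entirely from the two full index scans, while the text branch contributes only 9.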

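The subsequent stages fetch the documents, apply the status filter, sort the results in memory, and project the response. Each of them takes less than 50 ms, for a total of 0.9 seconds, so there is almost no room for optimization there.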

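Let's assume you cannot live without the regex. First, check whether it is any faster with .hint("name_text_cname_text"). Sometimes a collection scan is more efficient for regexes, see https://docs.mongodb.com/manual/reference/operator/query/regex/#index-use. Be aware that the MongoDB documentation lists hint() as unsupported for queries that contain a $text expression, so the server may reject the hint on this exact query.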

Next, you can restructure the data by combining cname and name into a single field, which reduces the number of regex searches:

{ 
   "_id" : 5011.0, 
   "cname" : "samyselvik", 
   "name" : "Samy Sam", 
   "search_name" : "samyselvik Samy Sam",
   "imgname" : "nrwi4769731443194380996.jpg", 
   "followers" : 1.0, 
   "status" : 1.0, 
   "createdat" : 1443194421532.0
}

You will need to update all existing documents once, and change your application so that future documents always have a valid search_name field.
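
One possible way to do the one-off backfill is sketched below. It assumes MongoDB 3.6, where an update cannot reference other fields of the same document, so the value is computed on the client and written back in batches. The helpers buildSearchName and buildBackfillOps are hypothetical names introduced here, not part of any MongoDB API.

```javascript
// Hypothetical helper: joins cname and name into the combined search field.
function buildSearchName(doc) {
  return [doc.cname, doc.name]
    .filter((value) => typeof value === "string" && value.length > 0)
    .join(" ");
}

// Hypothetical helper: turns a batch of documents into bulkWrite operations.
function buildBackfillOps(docs) {
  return docs.map((doc) => ({
    updateOne: {
      filter: { _id: doc._id },
      update: { $set: { search_name: buildSearchName(doc) } },
    },
  }));
}

// Example with the sample document from the question.
const sampleDocs = [{ _id: 5011, cname: "samyselvik", name: "Samy Sam" }];
const ops = buildBackfillOps(sampleDocs);
console.log(JSON.stringify(ops[0]));

// In the mongo shell, a batch could then be applied with something like:
//   db.temp.bulkWrite(ops, { ordered: false });
```

MongoDB requires every clause of an $or that contains $text to be backed by an index, so search_name would presumably also need its own index, for example db.temp.createIndex({ search_name: 1 }, { background: true }).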

The query would then be:

db.temp.find( 
    { $and : [
        { "status" : { "$in" : [1, 2]} },
        { $or: [ 
            { $text: { $search: "school" } },
            { search_name : /school/i}
        ] }  
    ] },
    { cname:1, name:1, followers:1, status :1, score: { $meta: "textScore" } } 
).sort( { score: { $meta: "textScore" }, status :-1, followers :-1 } )

This should save about 0.2 seconds, so there is no magic here. Regexes are expensive, but in some cases they are unavoidable.
