【Question Title】: MongoDB [4.2] $text search not returning expected results
【Posted】: 2021-01-22 16:17:38
【Question】:

We have a contributors collection that holds the author information for all of our authors. We created a text index on it as follows:

db.getCollection('contributors').createIndex(
  {
    display_name:"text",
    first_name: "text",
    last_name: "text"      
  },
  {
     weights: {
       display_name: 10,
       first_name: 5,
       last_name:5
     },      
    name: "Contributor_FTS_Index"
  }
)

Here is our sample data:

{
    "_id" : ObjectId("5eac8232eb5aca201f104bfb"),
    "firebrand_id" : 54529588,
    "agents" : null,
    "created" : ISODate("2020-05-01T20:10:26.762Z"),
    "display_name" : "Grace Octavia",
    "email" : null,
    "estates" : null,
    "first_name" : "Grace",
    "item_type" : "Contributor",
    "last_name" : "Octavia",
    "phone" : null,
    "role" : 1,
    "short_bio" : "GRACE OCTAVIA is the author of unforgettable novels that deal with the trials and tribulations of love, friendship, and what it means to be true to yourself. Her second novel, His First Wife, graced the Essence® bestseller list and also won the Best African-American Fiction Award from RT Book Reviews. A native of Westbury, NY, she now resides in Atlanta, GA, where there is never any shortage of material on heartache and scandal. Grace earned a doctorate in English, Creative Writing at Georgia State University in Atlanta and currently teaches at Spelman College. Visit her online at GraceOctavia.net or follow her on Twitter @GraceOctavia2.",
    "slug" : "grace-octavia",
    "updated" : ISODate("2020-08-05T10:10:27.691Z"),
    "deleted" : false
}

{
    "_id" : ObjectId("5ada44aa2ad4b3e3d0ae3daf"),
    "item_type" : "Contributor",
    "role" : 1,
    "short_bio" : "",
    "firebrand_id" : 41529135,
    "display_name" : "Grace  Octavia",
    "first_name" : "Grace",
    "last_name" : "Octavia",
    "slug" : "grace-octavia",
    "updated" : ISODate("2020-09-22T16:19:57.319Z"),
    "agents" : null,
    "estates" : null,
    "deleted" : false,
    "email" : null,
    "phone" : null
}


{
    "_id" : ObjectId("58e6ee27afbe421347a11834"),
    "item_type" : "Contributor",
    "role" : 1,
    "short_bio" : "Octavia E. Butler (1947–2006) was a bestselling and award-winning author, considered one of the best science fiction writers of her generation. She received both the Hugo and Nebula awards, and in 1995 became the first author of science fiction to receive a MacArthur Fellowship. She was also awarded the prestigious PEN Lifetime Achievement Award in 2000. Her first novel, <i>Patternmaster</i> (1976), was praised both for its imaginative vision and for Butler’s powerful prose, and spawned four prequels, beginning with <i>Mind of My Mind</i> (1977) and finishing with <i>Clay’s Ark</i> (1984).<br /><br /> Although the Patternist series established Butler among the science fiction elite, it was <i>Kindred</i> (1979), a story of a black woman who travels back in time to the antebellum South, that brought her mainstream success. In 1985, Butler won Nebula and Hugo awards for the novella “Bloodchild,” and in 1987 she published <i>Dawn</i>, the first novel of the Xenogenesis trilogy, about a race of aliens who visit earth to save humanity from itself. <i>Fledgling</i> (2005) was Butler’s final novel. She died at her home in 2006.",
    "firebrand_id" : 11532005,
    "display_name" : "Octavia E. Butler",
    "first_name" : "Octavia",
    "last_name" : "Butler",
    "slug" : "octavia-e-butler",
    "updated" : ISODate("2020-09-23T04:06:18.857Z"),
    "image" : "https://s3.amazonaws.com/orim-book-contributors/11532005-book-contributor.jpg",
    "agents" : [ 
        {
            "name" : "Heifetz, Merrilee",
            "primaryemail" : "mheifetz@writershouse.com",
            "primaryphone" : "212-685-2605"
        }
    ],
    "estates" : [ 
        {
            "name" : "Estate of Octavia E. Butler",
            "primaryemail" : "",
            "primaryphone" : ""
        }
    ],
    "deleted" : false,
    "email" : null,
    "phone" : null
}

When we run the following query:

db.getCollection('contributors').find({ $text: { $search: "oct" }})

it returns no documents. But if we search for

db.getCollection('contributors').find({ $text: { $search: "octavia" }})

it returns all of the documents.

Our requirement is to return results based on whatever search term the user types, so the input could be oc, oct, octav, and so on.

【Comments】:

    Tags: mongodb


    【Solution 1】:

    The common way to do this kind of partial-match search is with $regex rather than $text, so try something like this:

    db.contributors.find({
      "$or": [
        {
          display_name: {
            $regex: "oct",
            $options: "i"
          }
        }
     // add more fields objects same as above 
    ]
    
    });
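
One caveat with passing user input straight into $regex is that characters such as `.` or `(` are interpreted as regex syntax. Below is a minimal sketch of a helper that escapes the input and builds the same `$or` filter for several fields. The names `escapeRegex` and `buildPartialMatchFilter` are hypothetical and are not part of MongoDB or its drivers.

```javascript
// Hypothetical helper: escape regex metacharacters in user input
// so the search term is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-insensitive "contains" filter
// over several fields, in the same shape as the query above.
function buildPartialMatchFilter(term, fields) {
  const pattern = escapeRegex(term);
  return {
    $or: fields.map((field) => ({
      [field]: { $regex: pattern, $options: "i" },
    })),
  };
}

const filter = buildPartialMatchFilter("oct", [
  "display_name",
  "first_name",
  "last_name",
]);
// In the mongo shell this could be passed as: db.contributors.find(filter)
console.log(JSON.stringify(filter, null, 2));
```

The field names are taken from the index definition in the question; only the helper functions are assumptions.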
    

    【Comments】:

      【Solution 2】:

      You picked the wrong tool. Text search in MongoDB matches whole words, not fragments of words. Read more about the MongoDB tokenizer at https://docs.mongodb.com/manual/core/index-text/#tokenization-delimiters
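
To illustrate why "oct" finds nothing while "octavia" does, here is a deliberately simplified model of whole-word matching. It is only a sketch: `tokenize` and `wholeWordMatch` are hypothetical functions, and MongoDB's real tokenizer also applies stemming, stop words, and language-specific rules that are not modeled here.

```javascript
// Simplified model of whole-word matching; NOT MongoDB's actual tokenizer.
// Split on anything that is not a letter or digit and lowercase the result.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// A search term matches only if it equals one of the indexed tokens.
function wholeWordMatch(fieldValue, searchTerm) {
  const tokens = new Set(tokenize(fieldValue));
  return tokenize(searchTerm).some((term) => tokens.has(term));
}

console.log(wholeWordMatch("Grace Octavia", "octavia")); // true
console.log(wholeWordMatch("Grace Octavia", "oct"));     // false
```

In this model the indexed tokens for "Grace Octavia" are "grace" and "octavia", so a partial term such as "oct" never equals any of them.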

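      Indexing partial words requires an ngram tokenizer. That is available in full-featured text engines, for example the ones based on Apache Lucene: ElasticSearch, Solr, MongoDB Atlas Search, etc.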
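
As a rough sketch of what an edge ngram tokenizer produces, the function below emits every prefix of a word between a minimum and maximum length. `edgeNgrams` is a hypothetical name, and the default lengths of 2 and 10 are assumptions chosen for illustration; real engines expose their own configuration for this.

```javascript
// Hypothetical illustration of edge n-gram generation (prefixes only).
// minGram and maxGram defaults are assumptions, not values from any engine.
function edgeNgrams(word, minGram = 2, maxGram = 10) {
  const lower = word.toLowerCase();
  const grams = [];
  const limit = Math.min(maxGram, lower.length);
  for (let size = minGram; size <= limit; size++) {
    grams.push(lower.slice(0, size));
  }
  return grams;
}

const grams = edgeNgrams("Octavia");
console.log(grams); // [ 'oc', 'oct', 'octa', 'octav', 'octavi', 'octavia' ]
console.log(grams.includes("oct")); // true
```

Because each prefix is stored as its own token, a search for "oc", "oct", or "octav" can be answered with an exact token lookup instead of a scan.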

      If your database is fairly small and weights are not essential, you can use regular expressions:

      db.contributors.find({
        "$or": [
          {
            display_name: {
              $regex: "oct",
              $options: "i"
            }
          },
          {
            first_name: {
              $regex: "oct",
              $options: "i"
            }
          },
          {
            last_name: {
              $regex: "oct",
              $options: "i"
            }
          }
        ]
      })
      

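      【Comments】:

      • We have about 50k records, and that number will keep growing. The options above will certainly work, but we are not sure about performance. Also, what is wrong with the text index we created? We use text indexes on other collections and they work very well, but somehow not on this one.
      • There is nothing wrong with the index; it just doesn't work the way you expect. Please read the documentation I linked in the answer to understand what MongoDB's full-text search implementation can and cannot do. If you have a setup where it works for an autocomplete scenario, I would love to learn the details. 50k documents is small enough to use regular expressions without killing the server.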