【Question title】: Why does Elasticsearch match case-insensitively?
【Posted】: 2017-05-06 15:52:27
【Question】:

I have this index:

"analysis" : { "filter" : { "meeteor_ngram" : { "type" : "nGram", "min_gram" : "2", "max_gram" : "15" } }, "analyzer" : { "meeteor" : { "filter" : [ "meeteor_ngram" ], "tokenizer" : "standard" } } },

And this document:

{ "_index" : "test_global_search", "_type" : "meeting", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "name" : "LightBulb Innovation", "purpose" : "The others should listen the Innovators and also improve the current process.", "location" : "Projector should be set up.", "meeting_notes" : [ { "meeting_note_text" : "The immovator proposed to change the Bulb to Led." } ], "agenda_items" : [ { "text" : "Discuss The Lightning" } ] } }

Even though I have not configured any lowercase filter or tokenizer, both of these queries return the document:

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "lightbulb"
        }
    }
}
'

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "Lightbulb"
        }
    }
}
'

Here is the mapping:

→ curl -XGET 'localhost:9200/global_search/_mapping/meeting?pretty'
{
  "global_search" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
          "agenda_items" : {
            "properties" : {
              "text" : {
                "type" : "text",
                "analyzer" : "meeteor"
              }
            }
          },
          "location" : {
            "type" : "text",
            "analyzer" : "meeteor"
          },
          "meeting_notes" : {
            "properties" : {
              "meeting_note_text" : {
                "type" : "text",
                "analyzer" : "meeteor"
              }
            }
          },
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor"
          },
          "purpose" : {
            "type" : "text",
            "analyzer" : "meeteor"
          }
        }
      }
    }
  }
}

【Comments】:

  • Where is your mapping?
  • I added it, @RoiHatam.
  • @Boti Which index is the document above from: test_global_search or global_search? Do both indices have the same mapping?

Tags: elasticsearch


【Answer 1】:

Because you created a custom analyzer, both LightBulb and lightBulb return your document.

Check how your analyzer tokenizes your data.

GET global_search/_analyze
{
   "analyzer" : "meeteor",
   "text" : "LightBulb Innovation"
}

You will see the following output:

{
 "tokens": [
  {
     "token": "Li",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Lig",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Ligh",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Light",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
 .... other terms starting from Light

   {
     "token": "ig",      ======> tokens below this get matched when you run your query
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "igh",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "ight",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  .... other tokens.

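Now, when you run a match query, the custom analyzer is also applied to the text you search for and tokenizes it in the same way. Tokens such as 'ig', 'igh' and many more then match, whatever the case of the first letter. That is why the match query does not behave the way you expect and seems to ignore case.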
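To make the overlap concrete, here is a minimal Python sketch that imitates the meeteor analyzer outside Elasticsearch. It is an approximation under stated assumptions: `standard_tokenize` stands in for the standard tokenizer with a simple regex split, and the function names are made up for illustration. Since a match query uses the `or` operator by default, one shared token is enough for a hit.

```python
import re

def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split on non-word characters
    # and keep the original case (no lowercase filter is configured).
    return re.findall(r"\w+", text)

def ngram_filter(token, min_gram=2, max_gram=15):
    # Emit every substring of the token whose length is within [min_gram, max_gram].
    grams = set()
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.add(token[start:start + size])
    return grams

def meeteor_analyze(text):
    # Hypothetical helper: tokenizer followed by the n-gram token filter.
    terms = set()
    for token in standard_tokenize(text):
        terms |= ngram_filter(token)
    return terms

# Terms indexed for the name field of the document.
indexed = meeteor_analyze("LightBulb Innovation")

for query in ("lightbulb", "Lightbulb"):
    # The query text goes through the same analyzer at search time.
    shared = meeteor_analyze(query) & indexed
    print(query, "shares", sorted(shared))
```

Both queries share lowercase n-grams such as `ig`, `igh` and `ight` with the indexed name, so both return the document even though nothing is lowercased.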

With a term query, no search analyzer is involved. It looks up the term exactly as given. If you search for LightBulb, it is found among the tokens, but lightBulb is not.
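The term behaviour can be sketched the same way. This is a simplified model, not Elasticsearch itself: the indexed terms are rebuilt with a hypothetical `ngrams` helper (splitting on spaces instead of using the standard tokenizer), and `term_query` is just an exact set lookup.

```python
def ngrams(token, min_gram=2, max_gram=15):
    # All substrings of the token with a length between min_gram and max_gram.
    return {token[i:i + n]
            for i in range(len(token))
            for n in range(min_gram, max_gram + 1)
            if i + n <= len(token)}

# Terms the analyzer would index for the name field (simplified: split on spaces).
indexed_terms = set()
for word in "LightBulb Innovation".split():
    indexed_terms |= ngrams(word)

def term_query(term):
    # A term query is not analyzed: the term must exist verbatim in the index.
    return term in indexed_terms

print(term_query("LightBulb"))  # True: the whole 9-character token is one of the n-grams
print(term_query("lightBulb"))  # False: no indexed n-gram starts with a lowercase "l"
```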

Hope this clarifies your question.

Read up on the term and match queries.

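【Answer 2】:

Please add "index" : "not_analyzed" to your name field. On Elasticsearch 5.x, where the string type was removed, the equivalent is a keyword field: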

"name" : {
      "type" : "keyword",
      "index" : true
}
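As a rough illustration of what this mapping changes, the sketch below models a keyword field in Python. It assumes no normalizer is configured, so the whole field value is stored as a single term. `keyword_terms` and `keyword_match` are hypothetical names used only for this example.

```python
# A keyword field is not analyzed: the entire value is indexed as one term.
keyword_terms = {"LightBulb Innovation"}

def keyword_match(query):
    # Neither the stored value nor the query is tokenized or lowercased,
    # so only the exact, case-sensitive full value matches.
    return query in keyword_terms

print(keyword_match("LightBulb Innovation"))  # True
print(keyword_match("lightbulb innovation"))  # False: case differs
print(keyword_match("LightBulb"))             # False: partial values do not match
```

Matching becomes strictly case-sensitive, but partial input no longer finds the document.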

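【Comments】:

  • I got this: [400] {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Failed to parse mapping [meeting]: The [string] type is removed in 5.0 and automatic upgrade failed because parameters [analyzer] are not supported for automatic upgrades. You should now use either a [text] or [keyword] field instead for field [name]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [meeting]: The [string] type is removed in 5.0 and automatic upgrade failed because parameters [analyzer] are not supported for automatic upgrades......
  • @Boti Sorry about that, I have updated the mapping to the new version: keyword instead of string, and true instead of not_analyzed.
  • It still finds the document both ways, with "lightbulb" and with "Lightbulb". Also, with keyword I will not be able to search "as you type", so I need the custom analyzer. I still don't understand why it is case-insensitive.
  • @Boti You need to delete the index test_global_search before the new mapping can be applied. Did you do that?
  • Then I'm out of ideas.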