ElasticSearch doesn't recognise numbers

【Question title】: ElasticSearch doesn't recognise numbers
【Posted】: 2018-11-30 15:41:39
【Question description】:

I use this configuration for the index settings and mapping:

PUT :9200/subscribers

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

But when I add a new document:

POST :9200/subscribers/doc/?pretty

{
  "id": "1421997",
  "name": "John 333 Martin",
  "contact_number":"+43fdsds*543254365"
}

and search across multiple fields like this:

POST :9200/subscribers/doc/_search

{
    "query": {
        "multi_match": {
            "query": "Joh",
            "fields": [
                "name",
                "id",
                "contact_number"
            ],
            "type": "best_fields"
        }
    }
}

it successfully returns "John 333 Martin". But with "query": "333", "query": "+43fds", or "query": "14219" it returns nothing. This is strange, because I also set token_chars to include digits:

 "token_chars": [
            "letter",
            "digit"
          ]

What should I do so that I can search across all fields and also get results for queries that contain digits?

Update:

Even GET :9200/subscribers/_analyze

{
  "analyzer": "autocomplete",
  "text": "+43fdsds*543254365"
}

shows exactly the right combinations, such as "43", "43f", "43fd", "43fds". But the search finds nothing. Could my search query be wrong?

【Comments】:

Tags: elasticsearch

【Solution 1】:

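Your search uses a different analyzer from the one used to create the tokens in the inverted index. Because you use the lowercase tokenizer as the search_analyzer, the digits are stripped from the query before it is matched. See below: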

POST _analyze
{
  "tokenizer": "lowercase",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "fdsds",
      "start_offset" : 3,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
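
The same effect can be reproduced offline for the queries from the question. The following is a minimal Python sketch, not Elasticsearch code: `lowercase_tokenize` is a hypothetical helper that only approximates the lowercase tokenizer (split on anything that is not a letter, then lowercase).

```python
import re


def lowercase_tokenize(text):
    """Approximate the `lowercase` tokenizer (assumption: letters only survive).

    The pattern [^\\W\\d_]+ matches runs of letters, so digits and
    punctuation act as separators and are dropped.
    """
    return [m.group(0).lower() for m in re.finditer(r"[^\W\d_]+", text)]


# Queries taken from the question.
for query in ["Joh", "333", "+43fds", "14219"]:
    print(repr(query), "->", lowercase_tokenize(query))
```

Under this approximation "333" and "14219" produce no search tokens at all, and "+43fds" is reduced to "fds", which is not among the indexed edge n-grams because those all start at "43".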

Use the standard analyzer as your search_analyzer instead, i.e. modify your mapping as shown below, and it will work as expected:

"mappings": {
    "doc": {
      "properties": {
        "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }

With the standard analyzer,

POST _analyze
{
  "analyzer": "standard",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "43fdsds",
      "start_offset" : 1,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "543254365",
      "start_offset" : 9,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}
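
To see why these search tokens now match, here is a rough Python sketch that imitates both sides for the document from the question. The helpers `edge_ngrams`, `standard_tokens` and `matching_fields` are hypothetical names, and the regular expressions only approximate the real analyzers on simple ASCII input like this.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate the index side: edge_ngram over letter/digit runs, lowercased."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])  # prefixes only, as edge_ngram emits
    return grams


def standard_tokens(text):
    """Rough stand-in for the standard analyzer (assumption: ASCII input)."""
    return [w.lower() for w in re.findall(r"\w+", text)]


# The document from the question, indexed field by field.
doc = {
    "id": "1421997",
    "name": "John 333 Martin",
    "contact_number": "+43fdsds*543254365",
}
index = {field: edge_ngrams(value) for field, value in doc.items()}


def matching_fields(query):
    """Fields where at least one search token equals an indexed gram."""
    terms = standard_tokens(query)
    return [f for f, grams in index.items() if any(t in grams for t in terms)]


for query in ["Joh", "333", "+43fds", "14219"]:
    print(repr(query), "->", matching_fields(query))
```

In this sketch every query from the question hits one field, because the search side now keeps digits and each search token is a prefix that the index side also produced.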

【Comments】:

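How should I change the configuration to see the combinations that nGram suggested: "+43", "+43f", "+43fds", etc.?
What do you mean by seeing the combinations? Do you mean searching for those combinations?
Yes, that is exactly what I mean.
You should be able to search for those combinations, but note how the standard tokenizer splits the token when it encounters the special character *.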