【Question title】: ElasticSearch doesn't recognise numbers
【Posted】: 2018-11-30 15:41:39
【Question】:

I use this configuration for search and mapping:

PUT :9200/subscribers

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
         "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
         "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

But when I add a new object:

POST :9200/subscribers/doc/?pretty

{
  "id": "1421997",
  "name": "John 333 Martin",
  "contact_number":"+43fdsds*543254365"
}

and search across multiple fields like this:

POST :9200/subscribers/doc/_search

{
    "query": {
        "multi_match": {
            "query": "Joh",
            "fields": [
                "name",
                "id",
                "contact_number"
            ],
            "type": "best_fields"
        }
    }
}

it successfully returns "John 333 Martin". But when I use "query": "333", "query": "+43fds" or "query": "14219", it returns nothing. That is strange, because I configured token_chars for digits as well:

 "token_chars": [
            "letter",
            "digit"
          ]

What should I do to search across all fields and get results that contain numbers?


Update:

Even GET :9200/subscribers/_analyze

{
  "analyzer": "autocomplete",
  "text": "+43fdsds*543254365"
}

shows absolutely correct combinations such as "43", "43f", "43fd", "43fds". But the search finds nothing. Maybe my search query is incorrect?

【Comments】:

    Tags: elasticsearch


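    【Solution 1】:

    Your search uses a different analyzer from the one that created the tokens in the inverted index. Because your search_analyzer uses the lowercase tokenizer, which splits the text at every non-letter character, the digits are stripped from the query before it is matched. See below: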

    POST _analyze
    {
      "tokenizer": "lowercase",
      "text":     "+43fdsds*543254365"
    }
    

    produces

    {
      "tokens" : [
        {
          "token" : "fdsds",
          "start_offset" : 3,
          "end_offset" : 8,
          "type" : "word",
          "position" : 0
        }
      ]
    }
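
To make the mismatch concrete, here is a minimal Python sketch that imitates both analyzers from the question. It is not Elasticsearch code: `edge_ngrams`, `lowercase_tokenizer` and `matches` are hypothetical helper names, and the regular expressions only approximate the real tokenizers.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate the question's index-time `autocomplete` analyzer."""
    tokens = []
    # token_chars = [letter, digit]: any other character ends a word
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size].lower())  # the `lowercase` token filter
    return tokens

def lowercase_tokenizer(text):
    """Approximate the `lowercase` tokenizer: runs of letters only, lowercased."""
    return [run.lower() for run in re.findall(r"[^\W\d_]+", text)]

def matches(query, indexed_value):
    """True if any search-time term equals an index-time token."""
    indexed = set(edge_ngrams(indexed_value))
    return any(term in indexed for term in lowercase_tokenizer(query))

print(edge_ngrams("+43fdsds*543254365")[:4])            # index side keeps the digits
print(lowercase_tokenizer("333"))                        # search side: no terms left
print(matches("Joh", "John 333 Martin"))                 # letters still match
print(matches("+43fds", "+43fdsds*543254365"))           # "fds" is not an edge n-gram
```

The queries "333" and "14219" produce no search terms at all, and "+43fds" is reduced to "fds", which never appears among the indexed edge n-grams because those always start at the beginning of a word.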
    

    Use the standard analyzer as your search_analyzer instead, i.e. modify your mapping as shown below, and it will work as expected:

    "mappings": {
        "doc": {
          "properties": {
             "id": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
            "name": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            },
             "contact_number": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "standard"
            }
          }
        }
      }
    

    Using the standard analyzer

    POST _analyze
    {
      "analyzer": "standard",
      "text":     "+43fdsds*543254365"
    }
    

    produces

    {
      "tokens" : [
        {
          "token" : "43fdsds",
          "start_offset" : 1,
          "end_offset" : 8,
          "type" : "<ALPHANUM>",
          "position" : 0
        },
        {
          "token" : "543254365",
          "start_offset" : 9,
          "end_offset" : 18,
          "type" : "<NUM>",
          "position" : 1
        }
      ]
    }
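
As a rough check of why this fixes the queries from the question, the Python sketch below imitates the index-time edge n-grams and a simplified search-time standard analyzer. It is not Elasticsearch code: `edge_ngrams`, `standard_like` and `matches` are hypothetical helper names, and `standard_like` is only a stand-in that behaves like the standard analyzer for simple ASCII strings such as the samples here.

```python
import re

def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate the index-time `autocomplete` analyzer from the question."""
    tokens = []
    # token_chars = [letter, digit]: any other character ends a word
    for word in re.findall(r"[^\W_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:size].lower())  # the `lowercase` token filter
    return tokens

def standard_like(text):
    """Simplified stand-in for `standard`: letter/digit runs, lowercased."""
    return [run.lower() for run in re.findall(r"[^\W_]+", text)]

def matches(query, indexed_value):
    """True if any search-time term equals an index-time token."""
    indexed = set(edge_ngrams(indexed_value))
    return any(term in indexed for term in standard_like(query))

print(standard_like("+43fdsds*543254365"))         # same two tokens as the output above
print(matches("333", "John 333 Martin"))
print(matches("14219", "1421997"))
print(matches("+43fds", "+43fdsds*543254365"))     # "+" is dropped on both sides
print(matches("+4", "+43fdsds*543254365"))         # shorter than min_gram
```

Under these assumptions, the digits survive on the search side, so "333", "14219" and "+43fds" all find an indexed token. The "+" and "*" characters are discarded at both index and search time, and a single-character term such as "4" still finds nothing because min_gram is 2.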
    

    【Discussion】:

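    • How should I change the configuration to see the combinations suggested by nGram: "+43", "+43f", "+43fds", etc.?
    • What do you mean by seeing the combinations? Do you mean searching for these combinations?
    • Yes, that's exactly what I mean
    • You should be able to search for these combinations, but note how the standard tokenizer splits tokens when it encounters special characters such as *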