【Question title】: Mapping search analyzer (with apostrophes) not working
【Posted】: 2018-05-03 01:50:14
【Question】:

This question is based on the "Tidying Up Punctuation" section of https://www.elastic.co/guide/en/elasticsearch/guide/current/char-filters.html

Specifically, this:

  "char_filter": { 
    "quotes": {
      "type": "mapping",
      "mappings": [ 
        "\\u0091=>\\u0027",
        "\\u0092=>\\u0027",
        "\\u2018=>\\u0027",
        "\\u2019=>\\u0027",
        "\\u201B=>\\u0027"
      ]
    }
  }

will turn the "fancy" apostrophes into normal ones.

But it doesn't seem to work.

I created this index:

{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "analysis": {
        "char_filter": {
          "char_filter_quotes": {
            "type": "mapping",
            "mappings": [
              "\\u0091=>\\u0027",
              "\\u0092=>\\u0027",
              "\\u2018=>\\u0027",
              "\\u2019=>\\u0027",
              "\\u201B=>\\u0027"
            ]
          }
        },
        "analyzer": {
          "analyzer_Text": {
            "type": "standard",
            "char_filter": [ "char_filter_quotes" ]
          }
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "Text": {
          "type": "text",
          "analyzer": "analyzer_Text",
          "search_analyzer": "analyzer_Text",
          "term_vector": "with_positions_offsets"
        }
      }
    }
  }
}

Added this document:

{
  "Text": "Fred's Jim‘s Pete’s Mark‘s"
}

Ran this search and got a hit ("Fred's" was highlighted):

{
    "query":
    {
        "match":
        {
            "Text": "Fred's"
        }
    },
    "highlight":
    {
        "fragment_size": 200,
        "pre_tags": [ "<span class='search-hit'>" ],
        "post_tags": [ "</span>" ],
        "fields": { "Text": { "type": "fvh" } }
    }
}

If I change the above search like so:

    "Text": "Fred‘s"

I get no hits. Why not? I thought the search_analyzer would turn "Fred‘s" into "Fred's", which should hit. Also, if I search for

    "Text": "Mark's"

I get nothing at all, but

    "Text": "Mark‘s"

does hit. The whole point of the exercise is to keep apostrophes, while allowing for the fact that a non-standard apostrophe will occasionally slip through and should still be matched.

It gets even more confusing if I run this through http://127.0.0.1:9200/esidx_json_gs_entry/_analyze:

{
    "char_filter": [ "char_filter_quotes" ],
    "tokenizer" : "standard",
    "filter" : [ "lowercase" ],
    "text" : "Fred's Jim‘s Pete’s Mark‛s"
}

I get exactly what I expected:

{
    "tokens": [
        {
            "token": "fred's",
            "start_offset": 0,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "jim's",
            "start_offset": 7,
            "end_offset": 12,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "pete's",
            "start_offset": 13,
            "end_offset": 19,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "mark's",
            "start_offset": 20,
            "end_offset": 26,
            "type": "<ALPHANUM>",
            "position": 3
        }
    ]
}

In search, the search analyzer seems to do nothing. What am I missing?

TVMIA,

Adam (Edit -- yes, I know saying "thanks" is "noise", but I'd like to stay polite, so please leave it in.)

【Comments】:

    Tags: elasticsearch


    【Solution 1】:

    Your analyzer has a small bug. It should be

    "tokenizer": "standard"
    

    not

    "type": "standard"
    
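    With that one-word change the analyzer becomes a custom analyzer (when a `tokenizer` key is present, the analyzer `type`, if given at all, should be `custom`). A sketch of the corrected block, reusing the names from the question:

        "analyzer": {
          "analyzer_Text": {
            "type": "custom",
            "tokenizer": "standard",
            "char_filter": [ "char_filter_quotes" ]
          }
        }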

    Also, once you have indexed a document, you can use _termvectors to inspect the actual terms. So in your example you can do a GET on

    http://127.0.0.1:9200/esidx_json_gs_entry/_doc/1/_termvectors
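    As a further check (hypothetical request, reusing the index and analyzer names from the question), you can run the repaired analyzer by name through _analyze:

    POST /esidx_json_gs_entry/_analyze

        {
            "analyzer": "analyzer_Text",
            "text": "Fred‘s"
        }

    With the char filter applied, the single token should come back as "Fred's".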
    

    【Comments】:

    • That was it -- just shows how careful you have to be. Thanks, Jay.