【Question title】: Some multi-word synonyms are not working in Elasticsearch for nested fields
【Posted】: 2023-04-05 09:53:02
【Question】:

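I am trying to use a synonym analyzer at query time, but I am not getting the expected results. Can anyone shed some light on this?

Here is my index mapping: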

{
  "jobs_user_profile_v2": {
    "mappings": {
      "profile": {
        "_all": {
          "enabled": false
        },
        "_ttl": {
          "enabled": true
        },
        "properties": {

          "rsa": {
            "type": "nested",
            "properties": {
              "answer": {
                "type": "string",
                "index_analyzer": "autocomplete",
                "search_analyzer": "synonym",
                "position_offset_gap": 100
              },
              "answerId": {
                "type": "long"
              },
              "answerOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "createdAt": {
                "type": "long"
              },
              "label": {
                "type": "string",
                "index": "not_analyzed"
              },
              "labelOriginal": {
                "type": "string",
                "index": "not_analyzed"
              },
              "question": {
                "type": "string",
                "index": "not_analyzed"
              },
              "questionId": {
                "type": "long"
              },
              "questionOriginal": {
                "type": "string"
              },
              "source": {
                "type": "integer"
              },
              "updatedAt": {
                "type": "long"
              }
            }
          }

        }
      }
    }
  }
}

The field to focus on is rsa.answer, which is the field I am querying.

My synonym mappings (synonym.txt):

Beautician,Stylist,Make up artist,Massage therapist,Therapist,Spa,Hair Dresser,Salon,Beauty Parlour,Parlor => Beautician
Carpenter,Wood Worker,Furniture Carpenter => Carpenter
Cashier,Store Manager,Store Incharge,Purchase Executive,Billing Executive,Billing Boy => Cashier
Content Writer,Writer,Translator,Writing,Copywriter,Content Creation,Script Writer,Freelance Writer,Freelance Content Writer => Content Writer
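To make the intended behaviour of these explicit-mapping rules concrete, here is a simplified Python model of how a rule such as `Wood Worker => Carpenter` contracts a token sequence. It is a sketch under assumptions, not Elasticsearch's synonym filter: it does a greedy longest match on already lowercased tokens and ignores positions and offsets. `parse_rules` and `apply_synonyms` are hypothetical helper names, and only the first two rules are reproduced.

```python
# Simplified model of explicit synonym mappings ("lhs1,lhs2,... => rhs").
# Assumption: greedy longest match over lowercased tokens; positions/offsets ignored.
SYNONYMS_TXT = """\
Beautician,Stylist,Make up artist,Massage therapist,Therapist,Spa,Hair Dresser,Salon,Beauty Parlour,Parlor => Beautician
Carpenter,Wood Worker,Furniture Carpenter => Carpenter
"""


def parse_rules(text):
    """Map each lowercased left-hand phrase (as a token tuple) to its right-hand tokens."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=>" not in line:
            continue
        lhs, rhs = line.split("=>", 1)
        target = tuple(rhs.strip().lower().split())
        for phrase in lhs.split(","):
            rules[tuple(phrase.strip().lower().split())] = target
    return rules


def apply_synonyms(tokens, rules):
    """Replace the longest matching phrase at each position with its mapped tokens."""
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # No rule starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


rules = parse_rules(SYNONYMS_TXT)
print(apply_synonyms(["wood", "worker"], rules))   # ['carpenter']
print(apply_synonyms(["hair", "dresser"], rules))  # ['beautician']
```

The key point the model shows is that a multi-word rule can only fire when both words reach the filter together in the same token stream.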

My search query:

http://{{domain}}/jobs_user_profile_v2/_search

{
  "query": {
    "nested": {
      "path": "rsa",
      "query": {
        "query_string": {
          "query": "hair dresser",
          "fields": ["answer"],
          "analyzer": "synonym"
        }
      },
      "inner_hits": {
        "explain": true
      }
    }
  },
  "explain": true,
  "sort": [
    { "_score": {} }
  ]
}

It correctly returns the Beautician and Cashier profiles for the search queries "Hair Dresser" and "billing Executive", but returns nothing for the "wood worker => carpenter" case.

My analyzer results:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=hair dresser


{
  "tokens": [
    {
      "token": "beautician",
      "start_offset": 0,
      "end_offset": 12,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}

For the wood worker case:

http://{{domain}}/jobs_user_profile_v2/_analyze?analyzer=synonym&text=wood worker


{
  "tokens": [
    {
      "token": "carpenter",
      "start_offset": 0,
      "end_offset": 11,
      "type": "SYNONYM",
      "position": 1
    }
  ]
}
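Both responses above come back as a single SYNONYM token, which suggests the synonym file and the synonym analyzer work when they receive the whole phrase. A small helper like the following (hypothetical names, operating on the already parsed JSON body of an _analyze response) makes that check repeatable for the other failing cases.

```python
def analyzed_terms(resp):
    """Return (token, type) pairs from a parsed _analyze response body."""
    return [(t["token"], t["type"]) for t in resp.get("tokens", [])]


def collapsed_to_synonym(resp):
    """True when the whole input came back as exactly one SYNONYM token."""
    terms = analyzed_terms(resp)
    return len(terms) == 1 and terms[0][1] == "SYNONYM"


# The two responses shown above, pasted in as Python dicts.
hair_dresser = {"tokens": [{"token": "beautician", "start_offset": 0, "end_offset": 12,
                            "type": "SYNONYM", "position": 1}]}
wood_worker = {"tokens": [{"token": "carpenter", "start_offset": 0, "end_offset": 11,
                           "type": "SYNONYM", "position": 1}]}

print(collapsed_to_synonym(hair_dresser), collapsed_to_synonym(wood_worker))  # True True
```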

It also fails in some other cases.

My index analysis settings:

 "analysis": {
          "filter": {
            "synonym": {
              "ignore_case": "true",
              "type": "synonym",
              "synonyms_path": "synonym.txt"
            },
            "autocomplete_filter": {
              "type": "edge_ngram",
              "min_gram": "3",
              "max_gram": "10"
            }
          },
          "analyzer": {
            "text_en_splitting_search": {
              "type": "custom",
              "filter": [
                "stop",
                "lowercase",
                "porter_stem",
                "word_delimiter"
              ],
              "tokenizer": "whitespace"
            },
            "synonym": {
              "filter": [
                "stop",
                "lowercase",
                "synonym"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "autocomplete": {
              "filter": [
                "lowercase",
                "autocomplete_filter"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "text_en_splitting": {
              "filter": [
                "lowercase",
                "porter_stem",
                "word_delimiter"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            },
            "text_general": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "standard"
            },
            "edge_ngram_analyzer": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "edge_ngram_tokenizer"
            },
            "autocomplete_analyzer": {
              "filter": [
                "lowercase"
              ],
              "tokenizer": "whitespace"
            }
          },
          "tokenizer": {
            "edge_ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "edgeNGram",
              "max_gram": "10"
            }
          }
        }
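On the index side, rsa.answer uses the autocomplete analyzer (standard tokenizer, lowercase, edge_ngram with min_gram 3 and max_gram 10), so a query-side synonym token such as carpenter can only match if that exact prefix term was indexed. The sketch below approximates what gets indexed; splitting on non-alphanumeric characters is an assumption standing in for the standard tokenizer, and `autocomplete_terms` is a hypothetical name.

```python
import re


def autocomplete_terms(text, min_gram=3, max_gram=10):
    """Approximate the `autocomplete` index analyzer: tokenize, lowercase, edge n-grams."""
    terms = []
    # Assumption: runs of ASCII letters/digits approximate the standard tokenizer
    # well enough for plain job titles like these.
    for token in re.findall(r"[0-9A-Za-z]+", text):
        token = token.lower()
        # Emit prefixes of length min_gram..max_gram, capped at the token length;
        # tokens shorter than min_gram produce no terms at all.
        for n in range(min_gram, min(max_gram, len(token)) + 1):
            terms.append(token[:n])
    return terms


print(autocomplete_terms("Wood Worker"))
# ['woo', 'wood', 'wor', 'work', 'worke', 'worker']
print("carpenter" in autocomplete_terms("Furniture Carpenter"))  # True
```

Under this model, a synonym target longer than max_gram (10 characters) would never exist as a whole term in the index, so it is worth keeping the right-hand sides of synonym.txt in mind when choosing max_gram.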

【Question comments】:

    Tags: elasticsearch nested analyzer synonym


    【Solution 1】:

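    For this case, multi_match is a better fit than query_string. Unlike query_string, multi_match does not split the query text into separate terms before analyzing it. query_string does, so the synonym analyzer sees "wood" and "worker" one at a time and a multi-word rule such as Wood Worker => Carpenter may never fire.

    Example: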

    {
       "query": {
          "nested": {
             "path": "rsa",
             "query": {
                "multi_match": {
                   "query": "wood worker",
                   "fields": [
                      "rsa.answer"
                   ],
                   "type" : "cross_fields",
                   "analyzer": "synonym"
                }
             }
          }
       }
    }
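To illustrate the difference described above, the following toy Python model contrasts the two behaviours. It is a simplification under stated assumptions: the synonym analyzer is reduced to lowercase tokenization plus a hard-coded two-word rule table (RULES is a hypothetical subset of synonym.txt), and query_string's parsing is reduced to a plain whitespace split before analysis.

```python
import re

# Hypothetical subset of the question's synonym rules, lowercased.
RULES = {
    ("wood", "worker"): ("carpenter",),
    ("furniture", "carpenter"): ("carpenter",),
    ("hair", "dresser"): ("beautician",),
}


def synonym_analyzer(text):
    """Toy stand-in for the `synonym` analyzer (two-word rules only)."""
    tokens = [t.lower() for t in re.findall(r"[0-9A-Za-z]+", text)]
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in RULES:
            out.extend(RULES[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def query_string_style(query):
    # Split on whitespace first, then analyze each word on its own.
    return [term for word in query.split() for term in synonym_analyzer(word)]


def multi_match_style(query):
    # Hand the whole query text to the analyzer in one piece.
    return synonym_analyzer(query)


print(query_string_style("wood worker"))  # ['wood', 'worker']
print(multi_match_style("wood worker"))   # ['carpenter']
```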
    

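    If for some reason you prefer query_string, wrap the whole query in double quotes so that it is passed to the analyzer as a single phrase instead of being split into separate terms:

    Example: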

    post test/_search
    {
       "query": {
          "nested": {
             "path": "rsa",
             "query": {
                "query_string": {
                   "query": "\"wood worker\"",
                   "fields": [
                      "rsa.answer"
                   ],
                   "analyzer": "synonym"
                }
             }
          }
       }
    }
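When the search text comes from user input, the quoting has to be done programmatically. A minimal sketch, assuming you build the request body in Python and send it with whatever client you use: `phrase_query_string` and `nested_synonym_query` are hypothetical helpers that escape backslashes and double quotes before wrapping the text, then reproduce the nested query_string body from the answer.

```python
import json


def phrase_query_string(user_text):
    """Wrap free text in double quotes for query_string, escaping backslashes and quotes."""
    # Escape backslashes first so the quote escapes added next are not doubled.
    escaped = user_text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def nested_synonym_query(user_text):
    """Build the nested query_string body from the answer for an arbitrary phrase."""
    return {
        "query": {
            "nested": {
                "path": "rsa",
                "query": {
                    "query_string": {
                        "query": phrase_query_string(user_text),
                        "fields": ["rsa.answer"],
                        "analyzer": "synonym",
                    }
                },
            }
        }
    }


print(json.dumps(nested_synonym_query("wood worker"), indent=2))
```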
    

    【Comments】:

    • Thanks @keety, very helpful.
    • If we already give the path as rsa, is it still necessary to give rsa.answer in fields?