【Question title】: Elasticsearch multi_match query not working with synonyms and cross_fields
【Posted】: 2017-07-18 18:42:59
【Question】:

An Elasticsearch multi_match query of type cross_fields combined with synonyms is not working as expected.

I have the following configuration:

{
    "my_index": {
        "mappings": {
            "my_mapping": {
                "properties": {
                    "@timestamp": {
                        "type": "date"
                    },
                    "@version": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "field1": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "field2": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "analysis": {
                    "filter": {
                        "my_synonym_filter": {
                            "type": "synonym",
                            "synonyms": [
                                "matthew,matt,matty",
                                "thomas,tom,thom,tommy"
                            ]
                        }
                    },
                    "analyzer": {
                        "my_synonyms": {
                            "filter": [
                                "lowercase",
                                "my_synonym_filter"
                            ],
                            "tokenizer": "standard"
                        }
                    }
                }
            }
        }
    }
}

And the following query:

{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "fields": [
                            "field1^8",
                            "field2^2"
                        ],
                        "query": "Matt And Tom Oldfield",
                        "type": "cross_fields",
                        "analyzer": "my_synonyms"
                    }
                }
            ]
        }
    }
}

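But when I execute the query, the synonyms are not expanded for every field. If I ask Elasticsearch to explain the query, I get the following: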

(Synonym(field1:matt field1:matthew field1:matty) blended(terms:[field1:and^8.0, field2:and^2.0]) Synonym(field1:thom field1:thomas field1:tom field1:tommy) blended(terms:[field1:oldfield^8.0, field2:oldfield^2.0]))

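So if I have "Tom Oldfield" in field1 and "Matt Oldfield" in field2, the query does not match that document: as you can see, the synonyms are expanded only for the first field (field1) and not for the other one.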

If I remove the analyzer from the query, it does match the document with "Tom Oldfield" in field1 and "Matt Oldfield" in field2, and the query explanation is as follows:

(blended(terms:[field1:matt^8.0, field2:matt^2.0]) blended(terms:[field1:and^8.0, field2:and^2.0]) blended(terms:[field1:tom^8.0, field2:tom^2.0]) blended(terms:[field1:oldfield^8.0, field2:oldfield^2.0]))

Is there a way to have the synonyms expanded for every field?
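The question does not say how the explanations above were obtained. One common way is the validate API with `explain=true`, which returns the rewritten Lucene query as a string. The following is a minimal sketch using only the Python standard library; the host `http://localhost:9200` and the helper names `build_validate_request` and `explain_query` are assumptions for illustration, not part of the original post.

```python
import json
import urllib.request

# The multi_match query from the question, unchanged.
QUERY = {
    "bool": {
        "should": [
            {
                "multi_match": {
                    "fields": ["field1^8", "field2^2"],
                    "query": "Matt And Tom Oldfield",
                    "type": "cross_fields",
                    "analyzer": "my_synonyms",
                }
            }
        ]
    }
}


def build_validate_request(host: str, index: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a request to <index>/_validate/query?explain=true."""
    url = f"{host}/{index}/_validate/query?explain=true"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain_query(host: str, index: str, query: dict) -> dict:
    """Send the request and return the parsed JSON response.

    Requires a reachable cluster; the host is an assumption.
    """
    with urllib.request.urlopen(build_validate_request(host, index, query)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local node; adjust host and index to your environment.
    print(json.dumps(explain_query("http://localhost:9200", "my_index", QUERY), indent=2))
```

Running the same request against two clusters makes it easy to compare how each one rewrites the query.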

【Comments】:

  • There is a problem in your configuration example: field1 is duplicated.
  • Sorry, I've just fixed it.

Tags: elasticsearch


【Solution 1】:

I could not reproduce your problem in my environment with Elasticsearch 5.5.0. Here is my MCVE setup:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "my_synonym_filter": {
            "type": "synonym",
            "synonyms": [
              "matthew,matt,matty",
              "thomas,tom,thom,tommy"
            ]
          }
        },
        "analyzer": {
          "my_synonyms": {
            "filter": [
              "lowercase",
              "my_synonym_filter"
            ],
            "tokenizer": "standard"
          }
        }
      }
    }
  },
  "mappings": {
    "my_mapping": {
      "properties": {
        "field1": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "field2": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

The following document was indexed:

{ "field1": "Tom Oldfield", "field2": "Matt Oldfield"}

For the query you provided, ES creates the following Lucene query:

((field1:matt)^8.0 | (field1:matthew)^8.0 | (field1:matty)^8.0 | (field2:matt)^2.0 | (field2:matthew)^2.0 | (field2:matty)^2.0)
((field1:and)^8.0 | (field2:and)^2.0)
((field1:tom)^8.0 | (field1:thomas)^8.0 | (field1:thom)^8.0 | (field1:tommy)^8.0 | (field2:tom)^2.0 | (field2:thomas)^2.0 | (field2:thom)^2.0 | (field2:tommy)^2.0)
((field1:oldfield)^8.0 | (field2:oldfield)^2.0)

The synonyms are expanded for every field.
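To make "expanded for every field" concrete, here is a toy Python model that rebuilds clauses of the same shape as the Lucene query above. It is only an illustration and not how Elasticsearch or Lucene implements synonym expansion: it lowercases and splits on whitespace instead of running a real analyzer, and the function names `expand_token`, `clause_for` and `expand_query` are made up for this sketch.

```python
from typing import Dict, List

# Synonym groups copied from the my_synonym_filter definition in this thread.
SYNONYM_GROUPS: List[List[str]] = [
    ["matthew", "matt", "matty"],
    ["thomas", "tom", "thom", "tommy"],
]

# Field boosts copied from the multi_match query ("field1^8", "field2^2").
FIELD_BOOSTS: Dict[str, float] = {"field1": 8.0, "field2": 2.0}


def expand_token(token: str) -> List[str]:
    """Return the token followed by the other members of its synonym group."""
    for group in SYNONYM_GROUPS:
        if token in group:
            return [token] + [term for term in group if term != token]
    return [token]  # no synonyms: the token stands alone


def clause_for(token: str) -> str:
    """Build one clause: every synonym on every field, joined with ' | '."""
    parts = [
        f"({field}:{term})^{boost}"
        for field, boost in FIELD_BOOSTS.items()
        for term in expand_token(token)
    ]
    return "(" + " | ".join(parts) + ")"


def expand_query(text: str) -> List[str]:
    """Lowercase, split on whitespace, and build one clause per token."""
    return [clause_for(token) for token in text.lower().split()]


if __name__ == "__main__":
    for clause in expand_query("Matt And Tom Oldfield"):
        print(clause)
```

In the explanation from the question, by contrast, the `Synonym(...)` groups mention only field1, which is the difference the asker is pointing at.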

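【Comments】:

  • You're right. If I try it on ES on my laptop it works, but if I try it on the AWS Elasticsearch Service it produces what I posted earlier. Do you know why this happens?
  • @SofiaBraun could you provide the ES version?
  • I'm using ES 5.1
  • I tried it with ES 5.3 (the latest version available on AWS) and it works fine. Thanks!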
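Since the outcome depended on the cluster version, a quick check of the version a cluster reports can save some debugging. The sketch below reads `version.number` from the root endpoint and compares it with 5.3, the earliest version this thread reports as working. That threshold comes only from the comments above (5.1 showed the problem; 5.3 and 5.5.0 did not), not from release notes, and the host and function names are assumptions.

```python
import json
import urllib.request
from typing import Tuple

# Earliest version reported as working in this thread (an observation, not a spec).
EARLIEST_REPORTED_WORKING: Tuple[int, int] = (5, 3)


def parse_version(number: str) -> Tuple[int, ...]:
    """Turn a string such as '5.3.0' into (5, 3); only major and minor are used."""
    return tuple(int(part) for part in number.split(".")[:2])


def reported_as_working(number: str) -> bool:
    """True if the version is at least the earliest one reported as working here."""
    return parse_version(number) >= EARLIEST_REPORTED_WORKING


def fetch_version(host: str) -> str:
    """Read version.number from the cluster root endpoint (needs a live cluster)."""
    with urllib.request.urlopen(host) as resp:
        return json.load(resp)["version"]["number"]


if __name__ == "__main__":
    # Hypothetical local node; replace with your own endpoint.
    number = fetch_version("http://localhost:9200")
    print(number, "reported as working:", reported_as_working(number))
```

A `False` result only means the version is older than anything confirmed in this thread; 5.2, for example, was not tested here.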