【Posted】: 2018-02-11 23:49:08
【Question】:
I am using Elasticsearch v5.3.2.
I have the following mapping and analysis settings:
{
  "mappings": {
    "info": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "info": {
          "properties": {
            "email": {
              "doc_values": "false",
              "fields": {
                "ngram": {
                  "analyzer": "custom_nGram_analyzer",
                  "type": "text"
                }
              },
              "type": "keyword"
            }
          }
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_nGram_analyzer": {
          "filter": [
            "lowercase",
            "asciifolding",
            "custom_nGram_filter"
          ],
          "tokenizer": "whitespace",
          "type": "custom"
        }
      },
      "filter": {
        "custom_nGram_filter": {
          "max_gram": 16,
          "min_gram": 3,
          "type": "ngram"
        }
      }
    }
  }
}
When I run the following query, the document scores look very strange:
GET /info_2017_08/info/_search
{
  "query": {
    "multi_match": {
      "query": "hotmail",
      "fields": [
        "info.email.ngram"
      ]
    }
  }
}
It returns the following results:
"hits": {
  "total": 3,
  "max_score": 1.3834574,
  "hits": [
    {
      "_index": "info_2017_08",
      "_type": "info",
      "_id": "AV4uQnCjzNcTF2GMY730",
      "_score": 1.3834574,
      "_source": {
        "info": {
          "email": "pv53p8vg@gmail.com"
        }
      }
    },
    {
      "_index": "info_2017_08",
      "_type": "info",
      "_id": "AV4uQm93zNcTF2GMY73x",
      "_score": 0.3967861,
      "_source": {
        "info": {
          "email": "-vb6sbw54@hotmail.com"
        }
      }
    },
    {
      "_index": "info_2017_08",
      "_type": "info",
      "_id": "AV4uQmYbzNcTF2GMY73P",
      "_score": 0.36409757,
      "_source": {
        "info": {
          "email": "985pu4c.r02a@gmail.com"
        }
      }
    }
  ]
}
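Now look at the scores. I searched for "hotmail", yet the first hit is ...@gmail.com and the second is ...@hotmail.com. Why does the first hit score higher than the second?
The second document should match the query on both the "mail" and "hotmail" n-grams, whereas the first should match only on "mail", so what causes this result?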
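To make that expectation concrete, here is a rough Python sketch that emulates the analyzer by hand and lists which n-grams of the query also occur in each email. It is only an approximation under stated assumptions: the function names (`ngrams`, `analyze`, `shared_grams`) are made up for illustration, `asciifolding` is skipped because the inputs are plain ASCII, and the query string is assumed to be analyzed with the same `custom_nGram_analyzer` as the field (the default when no `search_analyzer` is set). It counts overlapping terms only and does not reproduce Elasticsearch's actual scoring.

```python
def ngrams(token, min_gram=3, max_gram=16):
    """Return every substring of `token` whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(token):
                grams.append(token[start:start + size])
    return grams


def analyze(text):
    """Hand-rolled stand-in for custom_nGram_analyzer.

    Whitespace tokenizer, then lowercase, then n-grams of length 3..16.
    asciifolding is omitted (assumption: input is already ASCII).
    """
    terms = []
    for token in text.split():
        terms.extend(ngrams(token.lower()))
    return terms


def shared_grams(query, text):
    """N-grams produced by the query that also appear among the text's n-grams."""
    common = set(analyze(query)) & set(analyze(text))
    return sorted(common, key=lambda g: (len(g), g))


if __name__ == "__main__":
    emails = [
        "pv53p8vg@gmail.com",
        "-vb6sbw54@hotmail.com",
        "985pu4c.r02a@gmail.com",
    ]
    for email in emails:
        common = shared_grams("hotmail", email)
        print(email, len(common), common)
```

By this count the query "hotmail" expands to 15 distinct n-grams. The @hotmail.com address contains all of them, while each @gmail.com address shares only "ail", "mai" and "mail", so on term overlap alone the hotmail document would be expected to rank first.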
Thanks in advance.
Tags: sorting elasticsearch n-gram scoring