【Posted】: 2020-02-16 20:28:50
【Question】:
I am currently using this Elasticsearch DSL query:
{
"_source": [
"title",
"bench",
"id_",
"court",
"date"
],
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "i r coelho",
"fields": [
"title",
"content"
]
}
},
"filter": [],
"should": {
"multi_match": {
"query": "i r coelho",
"fields": [
"title.standard^16",
"content.standard"
]
}
}
}
},
"highlight": {
"pre_tags": [
"<tag1>"
],
"post_tags": [
"</tag1>"
],
"fields": {
"content": {}
}
}
}
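Here is what happens. If I search for I.r coelho, it returns the correct results. However, if I search for I R coelho (without the period), it returns different results. How can I prevent this? I want the search to behave the same even when the input contains extra periods, spaces, commas, and so on.
Mapping: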
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Settings:
{
"courts_2": {
"settings": {
"index": {
"highlight": {
"max_analyzed_offset": "19000000"
},
"number_of_shards": "5",
"provided_name": "courts_2",
"creation_date": "1581094116992",
"analysis": {
"filter": {
"my_metaphone": {
"replace": "true",
"type": "phonetic",
"encoder": "metaphone"
}
},
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase",
"my_metaphone"
],
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "MZSecLIVQy6jiI6YmqOGLg",
"version": {
"created": "7010199"
}
}
}
}
}
Edit
Here is the result for I.R coelho from my analyzer:
{
"tokens": [
{
"token": "IR",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "KLH",
"start_offset": 4,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Standard analyzer:
{
"tokens": [
{
"token": "i.r",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "coelho",
"start_offset": 4,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
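The two outputs above show that the standard tokenizer keeps I.R as a single token (i.r, encoded as IR by the phonetic filter), whereas I R would presumably be split into two tokens, which would explain the different results. One possible way to make both inputs tokenize alike is to turn punctuation into spaces before tokenization with a pattern_replace char filter. The sketch below is only an illustration under assumptions: the char filter name punct_to_space is hypothetical, the settings dict is not taken from the original index, and the helper function is a local stand-in in Python, not the real Elasticsearch tokenizer or phonetic filter.

```python
import re
import string

# Hypothetical analysis settings (assumption, not from the original index):
# a pattern_replace char filter runs before the tokenizer, so "I.R" would
# reach the standard tokenizer as "I R".
settings = {
    "analysis": {
        "char_filter": {
            "punct_to_space": {  # hypothetical name
                "type": "pattern_replace",
                "pattern": "\\p{Punct}",
                "replacement": " ",
            }
        },
        "filter": {
            "my_metaphone": {
                "type": "phonetic",
                "encoder": "metaphone",
                "replace": True,
            }
        },
        "analyzer": {
            "my_analyzer": {
                "char_filter": ["punct_to_space"],
                "tokenizer": "standard",
                "filter": ["lowercase", "my_metaphone"],
            }
        },
    }
}

# ASCII punctuation, roughly what \p{Punct} matches in Java regex.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def simulate_char_filter_and_split(text):
    """Local stand-in: punctuation -> space, lowercase, split on whitespace.

    This only mimics the char filter plus a naive tokenizer; it does not
    apply the metaphone encoding.
    """
    cleaned = _PUNCT.sub(" ", text)
    return cleaned.lower().split()


if __name__ == "__main__":
    for query in ("I.r coelho", "I R coelho", "I. R, coelho"):
        print(query, "->", simulate_char_filter_and_split(query))
```

Note that title.standard and content.standard have no analyzer set in the mapping, so they use the default standard analyzer and would still index i.r as one token unless they are given an analyzer with the same char filter. Changing the analysis of existing fields generally also means reindexing the documents.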
【Comments】:
-
We need to know your index's mapping. Can you post it?
-
Also, unrelated to your question: what is the point of having a multi_match query in both the must and should bool clauses?
-
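Hi! I have edited the mapping into the question. As for the logic, I honestly don't know. I inherited this code from someone else for a quick-turnaround project. If something is wrong, please feel free to correct it. @glenacota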
-
You are using a custom analyzer named my_analyzer. Can you post its definition? (GET <your_index>/_settings)
-
My apologies for not including it. I have edited it in. @glenacota
Tags: elasticsearch elasticsearch-plugin