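【Posted】: 2020-11-17 15:15:47
【Question】:
I want to use the ElasticSearch Java API to build a query that matches only when (1) every term is a complete word and (2) all words of the search query are present. Here is an example:
Text: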
hello wonderful world
These should match:
hello
hello wonderful
hello world
wonderful world
hello wonderful world
wonderful
world
These should not match:
hell
hello fniefsgbsugbs
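The intended behavior can be sketched in plain Java as follows. This is only an illustration of the requirement, not Elasticsearch code: the class `WholeWordMatchSketch` and its methods are hypothetical names, and whitespace splitting plus lowercasing is an assumed stand-in for whatever analyzer the index really uses.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class WholeWordMatchSketch {

    // Assumption: lowercase + whitespace split stands in for the real analyzer.
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // True only if every query word appears in the text as a complete word.
    static boolean matches(String text, String query) {
        Set<String> queryTokens = tokens(query);
        return !queryTokens.isEmpty() && tokens(text).containsAll(queryTokens);
    }

    public static void main(String[] args) {
        String text = "hello wonderful world";
        String[] queries = {"hello", "hello world", "hell", "hello fniefsgbsugbs"};
        for (String q : queries) {
            System.out.println(q + " -> " + matches(text, q));
        }
    }
}
```

Under these assumptions, "hello" and "hello world" match, while "hell" (only a prefix of a word) and "hello fniefsgbsugbs" (one word missing from the text) do not.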
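I tried the following parameters on the match query, but it still matches the two examples that should not match.
This is the code that builds the query with the ElasticSearch 7.7.1 Java API: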
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
...
// Require every query term (AND, 100%) and disable fuzzy matching.
QueryBuilders.matchQuery(field, query)
    .autoGenerateSynonymsPhraseQuery(false)
    .fuzziness(0)
    .prefixLength(0)
    .fuzzyTranspositions(false)
    .operator(Operator.AND)
    .minimumShouldMatch("100%");
This generates the following query:
{
  "size": 100,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "searchableText": {
              "query": "hell",
              "operator": "AND",
              "fuzziness": "0",
              "prefix_length": 0,
              "max_expansions": 50,
              "minimum_should_match": "100%",
              "fuzzy_transpositions": false,
              "lenient": false,
              "zero_terms_query": "NONE",
              "auto_generate_synonyms_phrase_query": false,
              "boost": 1
            }
          }
        }
      ]
    }
  }
}
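Can someone help me find a good solution?
Edit: here are the settings and mappings (I removed everything unrelated to searchableText to keep them as small as possible):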
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      },
      "filter": {
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        },
        "ngram_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2,
          "output_unigrams": false,
          "output_unigrams_if_no_shingles": false
        }
      },
      "analyzer": {
        "german": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_synonyms",
            "german_stop",
            "german_keywords",
            "german_no_stemming",
            "german_stemmer"
          ]
        },
        "german_ngram": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "german_synonyms",
            "german_keywords",
            "german_no_stemming",
            "german_stemmer",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "copy_to": "searchableText",
        "analyzer": "german"
      },
      "name": {
        "type": "text",
        "copy_to": "searchableText",
        "analyzer": "german"
      },
      "userTags": {
        "type": "keyword",
        "copy_to": "searchableText",
        "normalizer": "lowercase_normalizer"
      },
      "searchableText": {
        "type": "text",
        "analyzer": "german",
        "fields": {
          "ngram": {
            "type": "text",
            "analyzer": "german_ngram"
          }
        }
      },
      "searches": {
        "type": "keyword",
        "copy_to": "searchableText",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
Edit 2: these are the filters mentioned above:
"filter": {
  "german_stop": {
    "type": "stop",
    "stopwords": "_german_"
  },
  "german_stemmer": {
    "type": "stemmer",
    "language": "light_german"
  },
  "ngram_filter": {
    "type": "shingle",
    "max_shingle_size": 4,
    "min_shingle_size": 2,
    "output_unigrams": false,
    "output_unigrams_if_no_shingles": false
  }
}
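Despite its name, ngram_filter is a shingle filter, so it combines whole words rather than splitting words into character n-grams. A rough plain-Java sketch of what a shingle step with min size 2, max size 4 and no unigrams would emit is shown below. The class `ShingleSketch` is a hypothetical name, the whitespace tokenization is an assumption, and the sketch ignores the synonym, keyword and stemmer filters that run before it in german_ngram, so the real tokens may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {

    // Build word-level shingles of minSize..maxSize words; single words are
    // never emitted, mirroring output_unigrams=false.
    static List<String> shingles(String text, int minSize, int maxSize) {
        String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            for (int size = minSize; size <= maxSize && start + size <= words.length; size++) {
                out.add(String.join(" ", Arrays.copyOfRange(words, start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [hello wonderful, hello wonderful world, wonderful world]
        System.out.println(shingles("hello wonderful world", 2, 4));
        // prints [] because one word cannot form a shingle of two or more words
        System.out.println(shingles("hello", 2, 4));
    }
}
```

In this sketch a single-word input produces no tokens at all, which corresponds to output_unigrams_if_no_shingles being false.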
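【Comments】:
- Please add the index mapping, specifically for the field searchableText, along with any relevant settings for that field.
- Thanks for your reply, I added the settings. I hope this helps.
- @Peter, I tried your mapping and sample documents and it works the way you want. Please see my answer for more details.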
Tags: elasticsearch elasticsearch-java-api elasticsearch-7