Posted: 2020-02-26 15:39:03
Question:
I have an Elasticsearch index where I have set "max_ngram_diff": 50, but somehow the setting only seems to take effect for the edge_ngram tokenizer, not for the ngram tokenizer.
I issued the following two requests against the same URL, http://localhost:9201/index-name/_analyze:
Request 1
{
  "tokenizer": {
    "type": "edge_ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
Request 2
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 20,
    "token_chars": [
      "letter",
      "digit"
    ]
  },
  "text": "1234567890;abcdefghijklmn;"
}
The first request returns the expected result:
{
  "tokens": [
    {
      "token": "123",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "1234",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "12345",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 2
    },
    {
      "token": "123456",
      "start_offset": 0,
      "end_offset": 6,
      "type": "word",
      "position": 3
    },
    // more tokens
  ]
}
But the second request only returns this:
{
  "error": {
    "root_cause": [
      {
        "type": "remote_transport_exception",
        "reason": "[ffe18f1a89e6][172.18.0.3:9300][indices:admin/analyze[s]]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [17]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
  },
  "status": 400
}
What is going on here? Why may the difference between max_gram and min_gram be greater than 1 for the first request, which uses the edge_ngram tokenizer, but not for the second request, which uses the ngram tokenizer?
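For context on why the two tokenizers are treated differently, here is a minimal sketch in plain Python (not Elasticsearch code) of what each tokenizer emits for a single word: edge_ngram produces only prefixes, so its token count is bounded by the gram range alone, while ngram produces every substring in the length range, which grows much faster with the gram difference.

```python
def edge_ngrams(word, min_gram, max_gram):
    # edge_ngram-style output: only prefixes of length min_gram..max_gram
    return [word[:k] for k in range(min_gram, min(max_gram, len(word)) + 1)]

def ngrams(word, min_gram, max_gram):
    # ngram-style output: every substring with length in [min_gram, max_gram]
    out = []
    for k in range(min_gram, min(max_gram, len(word)) + 1):
        out.extend(word[i:i + k] for i in range(len(word) - k + 1))
    return out

word = "1234567890"  # the 10-character word from the question's sample text
print(len(edge_ngrams(word, 3, 20)))  # 8 tokens (one per prefix length 3..10)
print(len(ngrams(word, 3, 20)))       # 36 tokens (all substrings of length 3..10)
```

This is only an illustration of the output volume, not the actual Lucene implementation; it shows why a large min/max gap is far more expensive for ngram than for edge_ngram.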
This is my mapping:
{
  "settings": {
    "index": {
      "max_ngram_diff": 50,
      // further settings
    }
  }
}
The Elasticsearch version used is 7.2.0.
Thanks for your help!
Comments:
- I just ran your examples against ES 7.5 and they work perfectly fine for me; IMO this behavior has not broken or changed recently.
Tags: elasticsearch tokenize n-gram elasticsearch-analyzers