【Posted】:2018-05-03 01:50:14
【Question】:
This question is based on the "Tidying up punctuation" section of https://www.elastic.co/guide/en/elasticsearch/guide/current/char-filters.html
Specifically, it says that this:
"char_filter": {
"quotes": {
"type": "mapping",
"mappings": [
"\\u0091=>\\u0027",
"\\u0092=>\\u0027",
"\\u2018=>\\u0027",
"\\u2019=>\\u0027",
"\\u201B=>\\u0027"
]
}
}
will turn "odd" apostrophes into normal apostrophes.
But it doesn't seem to work.
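For reference, here is a minimal Python sketch of the substitution the mapping char filter is expected to perform on the text before tokenization. It is only an illustration, not how Elasticsearch implements the filter; the `normalize_apostrophes` name and the use of `str.translate` are assumptions made for this example.

```python
# Code points listed in the "mappings" array above, all mapped to U+0027.
ODD_APOSTROPHES = ["\u0091", "\u0092", "\u2018", "\u2019", "\u201b"]

# str.maketrans accepts a dict of single characters to replacement strings.
APOSTROPHE_TABLE = str.maketrans({ch: "\u0027" for ch in ODD_APOSTROPHES})


def normalize_apostrophes(text: str) -> str:
    """Replace each non-standard apostrophe with a plain ASCII apostrophe."""
    return text.translate(APOSTROPHE_TABLE)


if __name__ == "__main__":
    # Escapes are used so each quote character is unambiguous.
    print(normalize_apostrophes("Fred's Jim\u2018s Pete\u2019s Mark\u201bs"))
```

If the char filter were applied at both index and search time, every variant of the apostrophe would collapse to the same character and produce the same token.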
I created this index:
{
"settings": {
"index": {
"number_of_shards": 1,
"number_of_replicas": 1,
"analysis": {
"char_filter": {
"char_filter_quotes": {
"type": "mapping",
"mappings": [
"\\u0091=>\\u0027",
"\\u0092=>\\u0027",
"\\u2018=>\\u0027",
"\\u2019=>\\u0027",
"\\u201B=>\\u0027"
]
}
},
"analyzer": {
"analyzer_Text": {
"type": "standard",
"char_filter": [ "char_filter_quotes" ]
}
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"Text": {
"type": "text",
"analyzer": "analyzer_Text",
"search_analyzer": "analyzer_Text",
"term_vector": "with_positions_offsets"
}
}
}
}
}
Added this document:
{
"Text": "Fred's Jim‘s Pete’s Mark‘s"
}
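Because these apostrophes look nearly identical on screen, it can help to list the exact code points in the text. The following is a small sketch, assuming the document text reads exactly as shown above; `describe_apostrophes` is a hypothetical helper name, and the input is written with escapes so the characters are unambiguous.

```python
import unicodedata
from typing import List, Tuple


def describe_apostrophes(text: str) -> List[Tuple[str, str]]:
    """Return (code point, Unicode name) for each character that is neither alphanumeric nor whitespace."""
    found = []
    for ch in text:
        if not ch.isalnum() and not ch.isspace():
            # Some control characters (e.g. U+0091) have no name, hence the default.
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    doc_text = "Fred's Jim\u2018s Pete\u2019s Mark\u2018s"
    for code_point, name in describe_apostrophes(doc_text):
        print(code_point, name)
```

Running the same helper over a query string shows whether the query and the indexed text actually contain the same character.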
Ran this search and got a hit (with "Fred's" highlighted):
{
"query":
{
"match":
{
"Text": "Fred's"
}
},
"highlight":
{
"fragment_size": 200,
"pre_tags": [ "<span class='search-hit'>" ],
"post_tags": [ "</span>" ],
"fields": { "Text": { "type": "fvh" } }
}
}
If I change the above search like this:
"Text": "Fred‘s"
I get no hits. Why not? I thought the search_analyzer would turn "Fred‘s" into "Fred's", which should hit. Also, if I search for
"Text": "Mark's"
I get nothing, whereas
"Text": "Mark‘s"
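does hit. The point of the exercise is to keep apostrophes while allowing for the fact that non-standard apostrophes occasionally slip through, and still get a hit.
It gets more confusing if I analyze the text at http://127.0.0.1:9200/esidx_json_gs_entry/_analyze: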
{
"char_filter": [ "char_filter_quotes" ],
"tokenizer" : "standard",
"filter" : [ "lowercase" ],
"text" : "Fred's Jim‘s Pete’s Mark‛s"
}
I get exactly what I would expect:
{
"tokens": [
{
"token": "fred's",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "jim's",
"start_offset": 7,
"end_offset": 12,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "pete's",
"start_offset": 13,
"end_offset": 19,
"type": "<ALPHANUM>",
"position": 2
},
{
"token": "mark's",
"start_offset": 20,
"end_offset": 26,
"type": "<ALPHANUM>",
"position": 3
}
]
}
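Note that the request above spells out the char filter, tokenizer, and token filter by hand rather than naming `analyzer_Text`, so it does not necessarily exercise the analyzer configured on the field. Below is a minimal sketch, assuming the `analyzer` parameter of the `_analyze` API and the same host and index as above, that builds a request running the named analyzer instead. `build_analyze_request` is a hypothetical helper; it only constructs the request and does not send it.

```python
import json
import urllib.request


def build_analyze_request(base_url: str, index: str, analyzer: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /<index>/_analyze that names an analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_analyze_request(
        "http://127.0.0.1:9200",
        "esidx_json_gs_entry",
        "analyzer_Text",
        "Fred's Jim\u2018s Pete\u2019s Mark\u201bs",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it requires a running node: urllib.request.urlopen(req)
```

Comparing the tokens from this request with the hand-assembled chain above would show whether the named analyzer really applies the char filter.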
In the search, the search analyzer appears to do nothing. What am I missing?
TVMIA,
Adam (Edit: yes, I know that saying "thank you" is considered "fluff", but I would like to stay polite, so please leave it in.)
【Discussion】:
Tags: elasticsearch