How can I do a partial match on a "text" field? Answer

【Question title】: How can I do a partial match on a "text" field?
【Posted】: 2020-09-20 07:15:02
【Question】:

First of all, I'm new to Elasticsearch.

I'm porting an old SQL script that has a bunch of LIKE '%' + @searchTerm + '%' clauses to Elasticsearch.

I've indexed my documents, and I have a property like this:

"noteNo": {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
},

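This field contains sequential numbers, but they are stored as strings.

For example:

I want to be able to search for "456" and get back every document that contains that partial string, just as SQL LIKE '%456%' would.

I've looked at the wildcard query, but I've read that it has performance issues, so my very basic query at the moment is: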

{
    "query": {
        "multi_match": {
            "query": "350",
            "fields": [
                ... other fields elided 
                "noteNo"
            ]
        }
    }
}

But this returns nothing.

I'm running ES 7.9.1 on Windows and indexing with the .NET NEST client. I haven't changed the analysis of any of the fields.

How can I search for these substrings in a way similar to SQL LIKE '%' + @searchTerm + '%'?
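A likely reason the multi_match returns nothing: with no custom analysis, a text field uses the standard analyzer, which keeps a run of digits such as a whole note number as one token, and a match query only finds documents that contain the query's exact token. The Python sketch below illustrates this under stated assumptions: `approx_standard_tokens` is a simplified ASCII stand-in for the real analyzer (which follows Unicode word-break rules), and the noteNo values are hypothetical, since the question's own examples are not shown.

```python
import re


def approx_standard_tokens(text):
    """Rough stand-in for the default `standard` analyzer on plain ASCII
    input: split on anything that is not a letter or digit, then lowercase.
    (An approximation only; the real analyzer uses Unicode word breaks.)"""
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def match_hits(docs, query):
    """Mimic a `match` query: a doc hits if it shares any token with the query."""
    q = set(approx_standard_tokens(query))
    return [d for d in docs if q & set(approx_standard_tokens(d))]


def like_hits(docs, term):
    """What SQL LIKE '%term%' returns: plain substring containment."""
    return [d for d in docs if term in d]


# Hypothetical noteNo values; each is indexed as a single whole token.
docs = ["123456", "3501", "9350"]
print(match_hits(docs, "350"))  # no token equals "350"
print(like_hits(docs, "350"))   # substring match finds two documents
```

Searching the full value (e.g. "3501") would match, but no partial value will, which is the behaviour described in the question.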

【Discussion】:

ngram tokenizer - elastic.co/guide/en/elasticsearch/reference/current/…. An example: stackoverflow.com/questions/61706304/…

Tags: elasticsearch

【Solution 1】:

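You can use the N-gram tokenizer. It first breaks the text into words wherever it encounters a character outside the configured token_chars classes, then emits N-grams of the specified lengths from each word.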
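The tokenizer's behaviour can be approximated in a few lines of Python. This is a sketch, not Elasticsearch's implementation: it treats only ASCII letters and digits as token characters (standing in for `token_chars: ["letter", "digit"]`) and, like the real tokenizer, it does not lowercase. For "34567" with min_gram 2 and max_gram 5 it yields the same tokens, order, and offsets as the `_analyze` output later in this answer.

```python
import re


def ngram_tokens(text, min_gram=2, max_gram=5):
    """Return (token, start_offset, end_offset) tuples, approximating the
    ngram tokenizer with token_chars [letter, digit] (ASCII only here)."""
    tokens = []
    # Runs of letters/digits are "words"; every other character separates them.
    for m in re.finditer(r"[0-9A-Za-z]+", text):
        word, base = m.group(), m.start()
        # For each start position, emit grams of increasing length.
        for i in range(len(word)):
            for n in range(min_gram, max_gram + 1):
                if i + n > len(word):
                    break  # the gram would run past the end of the word
                tokens.append((word[i:i + n], base + i, base + i + n))
    return tokens


# The position is simply the index in the list, as in the _analyze output.
for position, (token, start, end) in enumerate(ngram_tokens("34567")):
    print(position, token, start, end)
```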

Here is a working example with index data, mapping, search query, and results.

Index mapping:

 {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 5,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        },
        "max_ngram_diff": 50
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "my_analyzer",
                "search_analyzer": "standard"
            }
        }
    }
}
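The mapping above is written for a `title` field. To apply the same settings to the question's `noteNo` field, the body could be generated with a small Python helper; `ngram_index_body` is a hypothetical name, not part of any client library. It sets `max_ngram_diff` to exactly `max_gram - min_gram`: in Elasticsearch 7, `index.max_ngram_diff` defaults to 1, so a 2 to 5 range (a gap of 3) is rejected unless the setting is raised, which is why the answer sets it (to 50). The `keyword` sub-field is carried over from the question's original mapping.

```python
import json


def ngram_index_body(field, min_gram=2, max_gram=5):
    """Hypothetical helper: build a create-index body like the one above
    for an arbitrary field, reusing the answer's analyzer/tokenizer names."""
    if not 1 <= min_gram <= max_gram:
        raise ValueError("need 1 <= min_gram <= max_gram")
    return {
        "settings": {
            "analysis": {
                "analyzer": {"my_analyzer": {"tokenizer": "my_tokenizer"}},
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
            },
            # index.max_ngram_diff defaults to 1; raise it to the actual gap.
            "max_ngram_diff": max_gram - min_gram,
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "my_analyzer",       # n-grams at index time
                    "search_analyzer": "standard",   # query stays one token
                    # keep the exact-match sub-field from the question's mapping
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                }
            }
        },
    }


print(json.dumps(ngram_index_body("noteNo"), indent=2))
```

The printed JSON is the body of a create-index request (`PUT /<index-name>`). An analyzer cannot be changed on an existing field, so documents already indexed would need to be reindexed into the new index.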

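Analyze API (my_analyzer is defined in the index settings, so the request has to target the index):

POST /stof_63976447/_analyze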

{
  "analyzer" : "my_analyzer",
  "text" : "34567"
}

This generates the following tokens:

{
    "tokens": [
        {
            "token": "34",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        },
        {
            "token": "345",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 1
        },
        {
            "token": "3456",
            "start_offset": 0,
            "end_offset": 4,
            "type": "word",
            "position": 2
        },
        {
            "token": "34567",
            "start_offset": 0,
            "end_offset": 5,
            "type": "word",
            "position": 3
        },
        {
            "token": "45",
            "start_offset": 1,
            "end_offset": 3,
            "type": "word",
            "position": 4
        },
        {
            "token": "456",
            "start_offset": 1,
            "end_offset": 4,
            "type": "word",
            "position": 5
        },
        {
            "token": "4567",
            "start_offset": 1,
            "end_offset": 5,
            "type": "word",
            "position": 6
        },
        {
            "token": "56",
            "start_offset": 2,
            "end_offset": 4,
            "type": "word",
            "position": 7
        },
        {
            "token": "567",
            "start_offset": 2,
            "end_offset": 5,
            "type": "word",
            "position": 8
        },
        {
            "token": "67",
            "start_offset": 3,
            "end_offset": 5,
            "type": "word",
            "position": 9
        }
    ]
}

Search query:

{
    "query": {
        "match": {
            "title": "456"
        }
    }
}
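To see why this match query finds all six documents below, here is a simplified in-memory simulation in Python. Titles are expanded into the same 2 to 5 character n-grams at index time, while the query, as with the `standard` search analyzer, stays a single token for a run of digits. Scoring, positions, and the real analyzers' Unicode handling are ignored; the six titles are the ones in the search results, and the function names are illustrative only.

```python
import re
from collections import defaultdict


def ngrams(text, min_gram=2, max_gram=5):
    """Index-time expansion: all 2..5-character grams of each letter/digit run."""
    grams = set()
    for word in re.findall(r"[0-9A-Za-z]+", text):
        for i in range(len(word)):
            for n in range(min_gram, min(max_gram, len(word) - i) + 1):
                grams.add(word[i:i + n])
    return grams


titles = {"1": "34567", "2": "34568", "3": "34569",
          "4": "45691", "5": "45692", "6": "45693"}

# Inverted index: n-gram -> ids of documents containing it.
inverted = defaultdict(set)
for doc_id, title in titles.items():
    for g in ngrams(title):
        inverted[g].add(doc_id)


def match(query):
    """Query side: each letter/digit run is one lowercased token (OR semantics)."""
    hits = set()
    for tok in re.findall(r"[0-9A-Za-z]+", query.lower()):
        hits |= inverted.get(tok, set())
    return sorted(hits)


def like(term):
    """SQL LIKE '%term%' for comparison."""
    return sorted(d for d, t in titles.items() if term in t)


print(match("456"))  # all six ids
print(match("6"), like("6"))  # n-gram search misses a 1-character term
```

For terms of 2 to 5 characters this behaves like LIKE '%term%'. A term shorter than min_gram (such as "6") or longer than max_gram produces no matching n-gram, so those searches come back empty even where LIKE would match; widening the gram range trades that gap for a larger index.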

Search results:

 "hits": [
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.074107975,
                "_source": {
                    "title": 34567
                }
            },
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.074107975,
                "_source": {
                    "title": 34568
                }
            },
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "3",
                "_score": 0.074107975,
                "_source": {
                    "title": 34569
                }
            },
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "4",
                "_score": 0.074107975,
                "_source": {
                    "title": 45691
                }
            },
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "5",
                "_score": 0.074107975,
                "_source": {
                    "title": 45692
                }
            },
            {
                "_index": "stof_63976447",
                "_type": "_doc",
                "_id": "6",
                "_score": 0.074107975,
                "_source": {
                    "title": 45693
                }
            }
        ]

【Discussion】: