The standard (default) analyzer would tokenize this email as follows:
GET _analyze
{
  "text": "dm0e776467@mail.com",
  "analyzer": "standard"
}
which yields
{
  "tokens" : [
    {
      "token" : "dm0e776467",
      ...
    },
    {
      "token" : "mail.com",
      ...
    }
  ]
}
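This explains why the multi-match works with any *mail.com suffix and why the wildcards fail: a wildcard query is not analyzed, so its pattern is compared against each indexed token as a whole, and no single token contains the "@".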
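To make the mismatch concrete, here is a minimal Python sketch rather than anything Elasticsearch runs itself. It assumes a wildcard pattern must match one whole indexed term, and it uses Python's `fnmatchcase` as a stand-in for Elasticsearch's wildcard matching. The helper `wildcard_hits` and the pattern `dm0e*@mail.com` are hypothetical, chosen only for illustration; the two tokens are the ones the standard analyzer produced above.

```python
from fnmatch import fnmatchcase

# Tokens the standard analyzer produced above for "dm0e776467@mail.com".
STANDARD_TOKENS = ["dm0e776467", "mail.com"]


def wildcard_hits(pattern, indexed_terms):
    """Return the indexed terms that the whole pattern matches.

    Assumption: fnmatchcase approximates a term-level wildcard match
    (`*` for any run of characters, `?` for a single character).
    """
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]


# Hypothetical pattern spanning the "@": no single term contains it.
print(wildcard_hits("dm0e*@mail.com", STANDARD_TOKENS))  # []

# A pattern that fits inside one term does match.
print(wildcard_hits("dm0e*", STANDARD_TOKENS))  # ['dm0e776467']
```

Under these assumptions, any pattern that crosses the "@" boundary can never match, which is why keeping the original email as a single token matters.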
Inspired by this answer, I'd suggest the following modifications to your mapping:
PUT users
{
  "settings": {
    "analysis": {
      "filter": {
        "email": {
          "type": "pattern_capture",
          "preserve_original": true,
          "patterns": [
            "([^@]+)",
            "(\\p{L}+)",
            "(\\d+)",
            "@(.+)",
            "([^-@]+)"
          ]
        }
      },
      "analyzer": {
        "email": {
          "tokenizer": "uax_url_email",
          "filter": [
            "email",
            "lowercase",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email"
      },
      "firstName": {
        "type": "text",
        "fields": {
          "as_email": {
            "type": "text",
            "analyzer": "email"
          }
        }
      },
      "lastName": {
        "type": "text",
        "fields": {
          "as_email": {
            "type": "text",
            "analyzer": "email"
          }
        }
      }
    }
  }
}
Note that I've used .as_email sub-fields on your firstName and lastName fields; you probably don't want to force them to be analyzed as emails by default.
Then, after indexing a few samples:
POST _bulk
{"index":{"_index":"users"}}
{"firstName":"abc","lastName":"adm0e776467@mail.coms","email":"dm0e776467@mail.com"}
{"index":{"_index":"users"}}
{"firstName":"xyz","lastName":"opr","email":"dm0e776467@mail.com"}
{"index":{"_index":"users"}}
{"firstName":"zyx","lastName":"dm0e776467@mail.com","email":"qwe"}
{"index":{"_index":"users"}}
{"firstName":"abc","lastName":"efg","email":"ijk"}
The wildcards work just fine:
GET users/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "email": "dm0e776467@mail.com"
          }
        },
        {
          "wildcard": {
            "lastName.as_email": "dm0e776467@mail.com"
          }
        },
        {
          "wildcard": {
            "firstName.as_email": "dm0e776467@mail.com"
          }
        }
      ]
    }
  }
}
Do check how this analyzer tokenizes your input under the hood to avoid "surprising" query results:
GET users/_analyze
{
  "text": "dm0e776467@mail.com",
  "field": "email"
}
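If you'd like a feel for what to expect before running that request, the following Python sketch roughly approximates the email analyzer from the mapping above. It is not Elasticsearch's implementation and rests on a few assumptions: `uax_url_email` is taken to keep the whole address as one token, every capture group of every pattern match is emitted alongside the original, and `[^\W\d_]+` stands in for `\p{L}+` because Python's `re` module has no `\p{L}`. Token order and positions are not modeled, and the helper name `email_tokens` is hypothetical. Treat the real _analyze response as authoritative.

```python
import re

# The five patterns from the "email" pattern_capture filter.
# Assumption: [^\W\d_]+ approximates \p{L}+ (Python's re lacks \p{L}).
PATTERNS = [
    r"([^@]+)",
    r"([^\W\d_]+)",
    r"(\d+)",
    r"@(.+)",
    r"([^-@]+)",
]


def email_tokens(token):
    """Approximate pattern_capture + lowercase + unique for one input token."""
    tokens = {token.lower()}  # preserve_original: true
    for pattern in PATTERNS:
        for match in re.finditer(pattern, token):
            for group in match.groups():
                if group:
                    tokens.add(group.lower())
    return tokens  # a set, so duplicates collapse like the unique filter


print(sorted(email_tokens("dm0e776467@mail.com")))
```

Under these assumptions, the full address survives as one token next to its local part, its domain, and the letter and digit runs, which is what lets a wildcard on the complete address match.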