Wildcard query over _all field on Elasticsearch: answers

【Question title】: Wildcard query over _all field on Elasticsearch
【Posted】: 2016-04-01 18:01:08
【Question】:

I'm trying to run a wildcard query over the _all field. An example query could be:

GET index/type/_search
{
  "from" : 0,
  "size" : 1000,
  "query" : {
    "bool" : {
      "must" : {
        "wildcard" : {
          "_all" : "*tito*"
        }
      }
    }
  }
}

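The problem is that, for the wildcard query to work, the _all field needs to be not_analyzed; otherwise the query doesn't work. See the ES documentation for more information.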
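Why the analysis setting matters can be sketched with a toy Python model (not Elasticsearch code). Wildcard queries are term-level queries: the pattern is not analyzed and is compared against each indexed term on its own. The tokenizer below (lowercase, split on non-alphanumerics) is an assumed stand-in for an analyzer, and fnmatch stands in for Lucene's wildcard matching; the helper names are hypothetical.

```python
import fnmatch
import re


def simple_analyze(text):
    # Hypothetical stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def not_analyzed(text):
    # A not_analyzed value is indexed as one exact term.
    return [text]


def wildcard_matches(terms, pattern):
    # The query matches if any single indexed term matches the (unanalyzed) pattern.
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms)


value = "Tito Puente Jr"
analyzed_terms = simple_analyze(value)  # ['tito', 'puente', 'jr']
exact_terms = not_analyzed(value)       # ['Tito Puente Jr']

print(wildcard_matches(analyzed_terms, "*tito*"))         # True: matches the term 'tito'
print(wildcard_matches(analyzed_terms, "*tito puente*"))  # False: no single term contains a space
print(wildcard_matches(exact_terms, "*tito puente*"))     # False: case differs
print(wildcard_matches(exact_terms, "*Tito Puente*"))     # True
```

In this model a single-word pattern still finds a token in the analyzed field; it is patterns spanning several words, or relying on the original casing, that only behave as expected against a not_analyzed value.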

I tried to set up mappings over the _all field with this request:

PUT index
{
    "mappings": {
        "type": {
            "_all" : {
              "enabled" : true,
              "index_analyzer": "not_analyzed",
              "search_analyzer": "not_analyzed"
            },
            "_timestamp": {
                "enabled": "true"
            },
            "properties": {
                "someProp": {
                  "type": "date"
                }
            }
        }
    }
}

But I get the error: analyzer [not_analyzed] not found for field [_all]

I'd like to know what I'm doing wrong, and whether there is another (better) way to run this kind of query.

Thanks.


Tags: elasticsearch

【Solution 1】:

Have you tried removing:

"search_analyzer": "not_analyzed"

Also, I wonder how well a wildcard will scale across all the properties. Have you looked into NGrams? See the documentation here.
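The NGram idea can be sketched in Python as a model of what an n-gram tokenizer or filter emits, not Elasticsearch's implementation. Every substring whose length lies between min_gram and max_gram (names borrowed from the settings discussed in this thread) becomes an indexed term, so a search for tito turns into an exact term lookup instead of a wildcard scan. The function name ngrams is hypothetical.

```python
def ngrams(token, min_gram, max_gram):
    # Emit every substring whose length is between min_gram and max_gram (inclusive).
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


index_terms = set(ngrams("martito", 3, 4))
print(sorted(index_terms))
# The substring search "tito" is now a plain term lookup.
print("tito" in index_terms)  # True
```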

【Discussion】:

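I've tried NGrams and they work better. The problem with NGrams is that they consume a lot of disk space. Do you know of any way around that?
Yes, NGrams do take up a lot of disk space (and more CPU). You can experiment with the min_gram and max_gram settings. Because of how ngrams are stored, setting them too low or too high will eat disk space. In some cases edgeNGrams consume fewer resources, but they anchor the ngrams to the beginning of the token.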
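A rough Python sketch of the trade-off: it counts how many grams a single token of a given length produces, which is only a proxy for index size (Elasticsearch's actual on-disk cost also depends on term deduplication and compression). The helper names are hypothetical.

```python
def ngram_count(length, min_gram, max_gram):
    # Every substring of each size that fits inside the token.
    return sum(max(length - size + 1, 0) for size in range(min_gram, max_gram + 1))


def edge_ngram_count(length, min_gram, max_gram):
    # Edge n-grams keep only prefixes, so at most one gram per size.
    return sum(1 for size in range(min_gram, max_gram + 1) if size <= length)


for min_gram, max_gram in [(1, 10), (2, 5), (3, 3)]:
    print(min_gram, max_gram,
          ngram_count(10, min_gram, max_gram),
          edge_ngram_count(10, min_gram, max_gram))
# 1 10 55 10
# 2 5 30 4
# 3 3 8 1
```

For a 10-character token, widening the min_gram..max_gram range multiplies the number of full n-grams, while edge n-grams grow by at most one per extra size, at the cost of only matching from the start of the token.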

【Solution 2】:

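Most likely you want to set the option "index": "not_analyzed" on the index property of the string field. _all is a string field, and the index property determines whether the field is analyzed.

"search_analyzer" determines which analyzer is applied to the query text the user enters, and only takes effect if the index property is set to analyzed. "index_analyzer" determines which analyzer is applied to the documents being indexed, and likewise only takes effect if the index property is set to analyzed.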
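The relationship between the index setting and the two analyzers can be sketched as a toy Python model. The ToyField class and the standard_like analyzer are hypothetical and only mimic the behaviour described above for a simple match-style lookup; they are not Elasticsearch's data structures.

```python
import re


def standard_like(text):
    # Hypothetical analyzer: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class ToyField:
    """Toy model of one field mapping (not Elasticsearch internals)."""

    def __init__(self, index="analyzed", index_analyzer=standard_like,
                 search_analyzer=standard_like):
        self.index = index
        self.index_analyzer = index_analyzer
        self.search_analyzer = search_analyzer
        self.terms = set()

    def add_document(self, value):
        if self.index == "not_analyzed":
            # No analyzer runs: the whole value is one exact term.
            self.terms.add(value)
        else:
            # index_analyzer is only consulted for analyzed fields.
            self.terms.update(self.index_analyzer(value))

    def match(self, query_text):
        if self.index == "not_analyzed":
            # search_analyzer is ignored: the query must equal the stored term.
            return query_text in self.terms
        return any(t in self.terms for t in self.search_analyzer(query_text))


analyzed = ToyField()
analyzed.add_document("Tito Puente")
exact = ToyField(index="not_analyzed")
exact.add_document("Tito Puente")

print(analyzed.match("TITO"))       # True: both sides are lowercased and tokenized
print(exact.match("TITO"))          # False: no analysis, no exact match
print(exact.match("Tito Puente"))   # True
```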
