[Question title]: How to get Elasticsearch boolean match working for multiple fields
[Posted]: 2015-05-08 01:06:02
[Question]:

I need some expert guidance on getting a boolean match to work. I want the query to return results only if both 'message' matches 'Failed password for' and 'path' matches '/var/log/secure'.

Here is my query:

curl -s -XGET 'http://localhost:9200/logstash-2015.05.07/syslog/_search?pretty=true' -d '{
    "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
    "query" : {
        "bool" : {
            "must" : [
                {  "match_phrase" : { "message" : "Failed password for" } },
                {  "match_phrase" : { "path"    : "/var/log/secure"     } }
            ]
        }
    }
} '

Here is the start of the search output:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 13.308596,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 13.308596,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    }, ...

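The problem is that if I change '/var/log/secure' to just 'var' and run the query, I still get results, only with a lower score. My understanding is that the bool...must construct means both matches here have to succeed. What I want is no results at all if 'path' does not exactly match '/var/log/secure'...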

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 46,
    "max_score" : 10.354593,
    "hits" : [ {
      "_index" : "logstash-2015.05.07",
      "_type" : "syslog",
      "_id" : "AU0wzLEqqCKq_IPSp_8k",
      "_score" : 10.354593,
      "_source":{"message":"May  7 16:53:50 s_local@logstash-02 sshd[17970]: Failed password for fred from 172.28.111.200 port 43487 ssh2","@version":"1","@timestamp":"2015-05-07T16:53:50.554-07:00","type":"syslog","host":"logstash-02","path":"/var/log/secure"}
    },...

I checked the mapping for these fields to confirm that they are not analyzed:

curl -X GET 'http://localhost:9200/logstash-2015.05.07/_mapping?pretty=true'

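I believe these fields are not analyzed, and therefore that the search is not analyzed either (based on some training documentation from elasticsearch that I read recently). Below is a snippet of the _mapping output for this index.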

      ....
      "message" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      "path" : {
        "type" : "string",
        "norms" : {
          "enabled" : false
        },
        "fields" : {
          "raw" : {
            "type" : "string",
            "index" : "not_analyzed",
            "ignore_above" : 256
          }
        }
      },
      ....

Where am I going wrong, or what am I misunderstanding here?

[Discussion]:

    Tags: elasticsearch boolean logstash match-phrase


    [Solution 1]:

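    As stated in the OP, you need to use the "not_analyzed" view of a field for an exact match, but according to the OP's mapping the not-analyzed versions of the fields are message.raw and path.raw. Because message.raw holds the entire log line as a single term, a partial phrase such as "Failed password for" can only match on the analyzed message field, so only path needs to switch to path.raw. Example:

    {
        "filter" : { "range" : { "@timestamp" : { "gte" : "now-1h" } } },
        "query" : {
            "bool" : {
                "must" : [
                    {  "match_phrase" : { "message"  : "Failed password for" } },
                    {  "match_phrase" : { "path.raw" : "/var/log/secure"     } }
                ]
            }
        }
    }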
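
For scripted use, the same request body can be assembled in Python. This is only a sketch: `build_search_body` and its parameters are hypothetical names, not part of the answer, and it swaps in a `term` query for the exact match on `path.raw`, which is an alternative to `match_phrase` on a not_analyzed field. It keeps the phrase match on the analyzed `message` field, on the assumption that the phrase is only part of the log line.

```python
import json


def build_search_body(phrase, exact_path, since="now-1h"):
    """Build a request body in the shape used in this thread (hypothetical helper)."""
    return {
        "filter": {"range": {"@timestamp": {"gte": since}}},
        "query": {
            "bool": {
                "must": [
                    # Phrase match on the analyzed field: the phrase is
                    # assumed to be only a part of the log line.
                    {"match_phrase": {"message": phrase}},
                    # Exact match on the not_analyzed sub-field.
                    {"term": {"path.raw": exact_path}},
                ]
            }
        },
    }


body = build_search_body("Failed password for", "/var/log/secure")
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, as in the question.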
    

    The Elasticsearch documentation on multi-fields gives more insight into this.

    To expand further:

    The mapping for path in the OP is as follows:

    "path" : {
      "type" : "string",
      "norms" : {
        "enabled" : false
      },
      "fields" : {
        "raw" : {
          "type" : "string",
          "index" : "not_analyzed",
          "ignore_above" : 256
        }
      }
    }
    

    This specifies that the path field itself uses the default analyzer, and that only the path.raw sub-field is not analyzed.
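
The effect on matching can be illustrated with a small Python sketch. It is not Elasticsearch code: `standard_like_tokens` is a rough regex stand-in for the default analyzer (an assumption made for illustration), and all function names are hypothetical. It shows why the phrase "var" matches the analyzed path field but not path.raw.

```python
import re


def standard_like_tokens(text):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def not_analyzed_tokens(text):
    """A not_analyzed field indexes the whole value as one term."""
    return [text]


def phrase_matches(indexed_tokens, query_tokens):
    """True if query_tokens occur as a contiguous run inside indexed_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(indexed_tokens[i:i + n] == query_tokens
               for i in range(len(indexed_tokens) - n + 1))


value = "/var/log/secure"

# Analyzed "path": the stored value and the query text are both tokenized.
analyzed_hit = phrase_matches(standard_like_tokens(value),
                              standard_like_tokens("var"))

# Not-analyzed "path.raw": the query text is compared with the whole value.
raw_hit_partial = phrase_matches(not_analyzed_tokens(value),
                                 not_analyzed_tokens("var"))
raw_hit_exact = phrase_matches(not_analyzed_tokens(value),
                               not_analyzed_tokens(value))

print(analyzed_hit, raw_hit_partial, raw_hit_exact)  # True False True
```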

    If you want the path field itself to be not analyzed, rather than only its raw sub-field, the mapping would be something along these lines:

    "path" : {
      "type" : "string",
      "index" : "not_analyzed",
      "norms" : {
        "enabled" : false
      },
      "fields" : {
        "raw" : {
          "type" : "string",
          "analyzer" : <whatever analyzer you want>,
          "ignore_above" : 256
        }
      }
    }
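
One way to avoid misreading a mapping is to list which field paths actually carry "index": "not_analyzed". The following Python sketch does this for a mapping fragment written as a dict; `not_analyzed_paths` is a hypothetical helper, and `op_mapping` reproduces only the two fields shown in the question.

```python
def not_analyzed_paths(properties, prefix=""):
    """Return dotted field paths whose own "index" setting is "not_analyzed"."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("index") == "not_analyzed":
            found.append(path)
        # Multi-field sub-fields sit under "fields" and are addressed as parent.sub.
        found.extend(not_analyzed_paths(spec.get("fields", {}), path + "."))
    return found


raw_subfield = {"type": "string", "index": "not_analyzed", "ignore_above": 256}
op_mapping = {
    "message": {"type": "string", "norms": {"enabled": False},
                "fields": {"raw": dict(raw_subfield)}},
    "path": {"type": "string", "norms": {"enabled": False},
             "fields": {"raw": dict(raw_subfield)}},
}

print(sorted(not_analyzed_paths(op_mapping)))  # ['message.raw', 'path.raw']
```

For the OP's mapping, neither message nor path appears in the result, only their .raw sub-fields.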
    

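    [Comments]:

    • Thanks keety - this does work. If the original field mapping shows the field as not analyzed, why is it necessary to use the .raw version?
    • @DominicNicholas The original field uses the default analyzer unless this is explicitly overridden. The default analyzer is usually the standard analyzer.
    • Thanks - when I look at the mapping for the field, it shows "not_analyzed". Am I confusing it with something? Thanks again.
    • @DominicNicholas I have edited the answer and tried to explain further.
    • Very good - I understand now - thanks for the clarification! Many thanks again.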