【Question title】: ElasticSearch Aggregations - sum_other_doc_count in a (Term?) Query?
【Posted】: 2015-02-13 02:41:21
【Question】:

By default, a terms aggregation gives me the 10 most popular terms and their counts, followed by a sum_other_doc_count field that represents the "other" items.

I can show these to the user:

first (150)
second (122)
third (111)
...
other (19)

...and the user can then filter their results by selecting one of these items. I apply a TermFilter with the term they selected. That works fine.

...but... is there a way to create a filter that represents "other" (i.e. all terms except the top 10)?
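For reference, filtering on a selected bucket just means matching its key. Below is a minimal sketch of that request body as a Python dict, assuming Elasticsearch 2.x or later syntax (a bool query's filter clause instead of the 1.x filtered query). The helper selected_term_query and the field name "category" are hypothetical; the question does not name its field.

```python
import json


def selected_term_query(field: str, value: str) -> dict:
    """Hypothetical helper: body keeping only docs whose `field` contains `value`.

    The bool `filter` clause (ES 2.x+ syntax, assumed here) matches without scoring.
    Bucket keys are already indexed terms, so an unanalyzed `term` query on the
    same field matches them directly.
    """
    return {"query": {"bool": {"filter": {"term": {field: value}}}}}


if __name__ == "__main__":
    # Body for a user who clicked the "first (150)" bucket
    print(json.dumps(selected_term_query("category", "first"), indent=2))
```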

【Question comments】:

    Tags: elasticsearch aggregation


    【Solution 1】:

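    I don't think so. You could, however, put together something related (but not exactly the same) using the terms and not filters, which will return all the documents in which none of the top terms appear. The two differ because sum_other_doc_count adds up the doc counts of every bucket that was not returned, so a document containing one top term plus some other terms is counted there (once per other term) but excluded by this filter. For simplicity, I'll use the top 5.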
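To make that difference concrete, here is a small pure-Python approximation of a terms aggregation next to a "none of the top terms" filter. It is a sketch, not Elasticsearch's implementation: it assumes lowercase \w+ tokens as a rough stand-in for the standard analyzer, a single shard, and alphabetical tie-breaking, and it runs on four made-up documents.

```python
import re
from collections import Counter


def tokens(text):
    # Rough stand-in for the standard analyzer: set of lowercase word tokens
    return set(re.findall(r"\w+", text.lower()))


def terms_agg(docs, size):
    """Approximate a single-shard terms aggregation over `docs`."""
    counts = Counter()
    for doc in docs:
        counts.update(tokens(doc))  # a doc counts at most once per term
    # Highest doc_count first; alphabetical order breaks ties
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": c} for k, c in ranked[:size]]
    # Sum of the doc counts of every bucket that was cut off
    other = sum(c for _, c in ranked[size:])
    return {"buckets": buckets, "sum_other_doc_count": other}


def docs_without_top_terms(docs, top_keys):
    """Mimic the not + terms filter: keep docs containing none of the top terms."""
    return [d for d in docs if not tokens(d) & set(top_keys)]


docs = ["red green", "red blue", "red green blue", "yellow"]
agg = terms_agg(docs, size=2)
top = [b["key"] for b in agg["buckets"]]
print(top, agg["sum_other_doc_count"])  # ['red', 'blue'] 3
print(docs_without_top_terms(docs, top))  # ['yellow']
```

Here sum_other_doc_count is 3 ("green" in two documents plus "yellow" in one), while the filter returns only the "yellow" document: "red green" contributes to the other count through "green" but is excluded because it also contains "red".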

    So I created an index and added some random Latin text:

    PUT /test_index
    {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    }
    
    POST /test_index/_bulk
    {"index":{"_index":"test_index","_type":"doc"}}
    {"text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec rhoncus dictum ligula, quis volutpat diam fringilla ut."}
    {"index":{"_index":"test_index","_type":"doc"}}
    {"text": "Nulla ac gravida ipsum. Pellentesque placerat mattis pharetra. Praesent sapien lorem, auctor in imperdiet vel, lacinia vel diam."}
    {"index":{"_index":"test_index","_type":"doc"}}
    {"text": "Mauris a risus ut eros posuere rutrum. Nunc scelerisque diam ex, consequat mollis sem facilisis in."}
    {"index":{"_index":"test_index","_type":"doc"}}
    {"text": "Maecenas lacinia sollicitudin ultricies. Aenean id eleifend sapien. In et justo accumsan, cursus mi vel, consectetur augue. Nullam in quam ac magna iaculis finibus quis ut risus."}
    {"index":{"_index":"test_index","_type":"doc"}}
    {"text": "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero."}
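If you would rather generate the bulk body than type it, the following Python sketch reproduces the action/source line pairs above. bulk_body is a hypothetical helper. The "_type" entry matches this 1.x-era example; later Elasticsearch versions removed mapping types, so you would leave it out there.

```python
import json


def bulk_body(index, doc_type, texts):
    """Build a _bulk request body: one action line plus one source line per doc."""
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps({"text": text}))
    # The bulk API expects newline-delimited JSON terminated by a final newline
    return "\n".join(lines) + "\n"


body = bulk_body(
    "test_index",
    "doc",
    ["Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero."],
)
print(body)
```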
    

    Then I get the top 5 terms:

    POST /test_index/_search?search_type=count
    {
        "aggs": {
            "top_terms":{
                "terms":{
                    "field": "text",
                    "size": 5
                }
            }
        }
    }
    ...
    {
       "took": 4,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 5,
          "max_score": 0,
          "hits": []
       },
       "aggregations": {
          "top_terms": {
             "buckets": [
                {
                   "key": "diam",
                   "doc_count": 3
                },
                {
                   "key": "in",
                   "doc_count": 3
                },
                {
                   "key": "ut",
                   "doc_count": 3
                },
                {
                   "key": "ac",
                   "doc_count": 2
                },
                {
                   "key": "consectetur",
                   "doc_count": 2
                }
             ]
          }
       }
    }
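The search_type=count parameter used above was deprecated in Elasticsearch 2.0 and removed in 5.0; the usual replacement is a top-level "size": 0. A minimal Python sketch, assuming that newer syntax, that builds the same aggregation request and collects the bucket keys the next step needs. top_terms_request and top_keys are hypothetical helpers. On 5.x and later, aggregating on an analyzed text field like this one also requires fielddata to be enabled in the mapping.

```python
def top_terms_request(field, size):
    """Terms-aggregation body; top-level "size": 0 returns no hits, only aggregations."""
    return {"size": 0, "aggs": {"top_terms": {"terms": {"field": field, "size": size}}}}


def top_keys(response, agg_name="top_terms"):
    """Pull the bucket keys out of a search response, in bucket order."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]


# Trimmed copy of the aggregation part of the response above
response = {
    "aggregations": {
        "top_terms": {
            "buckets": [
                {"key": "diam", "doc_count": 3},
                {"key": "in", "doc_count": 3},
                {"key": "ut", "doc_count": 3},
                {"key": "ac", "doc_count": 2},
                {"key": "consectetur", "doc_count": 2},
            ]
        }
    }
}

print(top_terms_request("text", 5))
print(top_keys(response))
```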
    

    Then I can build a filter that returns the documents in which none of the top 5 terms appear, for example:

    POST /test_index/_search
    {
       "query": {
          "constant_score": {
             "filter": {
                "not": {
                   "filter": {
                      "terms": {
                         "text": [
                            "diam",
                            "in",
                            "ut",
                            "ac",
                            "consectetur"
                         ]
                      }
                   }
                }
             },
             "boost": 1.2
          }
       }
    }
    ...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 1,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "4uoLr70rRXulHHc7N3Ujmw",
                "_score": 1,
                "_source": {
                   "text": "Donec dolor eros, rhoncus ultricies quam et, dapibus egestas libero."
                }
             }
          ]
       }
    }
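The not filter was deprecated in Elasticsearch 2.0 (and removed in 5.0) in favour of the bool query's must_not clause. Here is a minimal sketch of the equivalent request body in Python, assuming 2.x or later syntax. other_filter_query is a hypothetical helper whose output would be sent to POST /test_index/_search. A bool query containing only must_not matches every document except the excluded ones.

```python
import json


def other_filter_query(field, top_terms):
    """Body matching docs in which none of `top_terms` appear in `field`.

    bool + must_not is the 2.x+ replacement for the `not` filter; must_not
    clauses run in filter context, so they do not affect scoring.
    """
    return {"query": {"bool": {"must_not": {"terms": {field: list(top_terms)}}}}}


top_5 = ["diam", "in", "ut", "ac", "consectetur"]
print(json.dumps(other_filter_query("text", top_5), indent=2))
```

In practice you would feed it the keys collected from the aggregation response rather than a hard-coded list.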
    

    I know this doesn't really answer your question, but maybe it will give you some ideas.

    Here is the code I used (if you're using ES 1.4, you'll need to enable CORS for it to work in the browser):

    http://sense.qbox.io/gist/93b69375c5491f1b0458e2053a08e65006f34a1c
