【Question title】: Elasticsearch: multiple pre_tags/post_tags with fast vector highlighter
【Posted】: 2016-10-09 21:13:43
【Question description】:

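The documentation contains the following cryptic remark about the pre_tags/post_tags settings, which can hold more than one pair of pre-/post-tags:

Using the fast vector highlighter there can be more tags, and the "importance" is ordered.

Does anyone know what exactly this sentence means?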

【Comments】:

    Tags: elasticsearch


    【Solution 1】:

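    It took some time, but by trying out different queries with ES 1.7 and the _head plugin I was able to figure out how multiple pre and post tags affect highlighting.

    With the fast vector highlighter you can specify tags in order of "importance", which seems to mean that the order of the tags and the order of your search terms should match. For multiple pre or post tags to have any effect, the query needs more than one clause (in the example below, several match clauses against the same field).

    Given the index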

    {
     myindex: {
      mappings: {
       corpdocument: {
        properties: {
         createddate: {
          type: "date",
          format: "dateOptionalTime"
         },
         docbody: {
          type: "string",
          analyzer: "text_analyzer",
          fields: {
           exact: {
            type: "string",
            analyzer: "text_analyzer_exact"
           }
          }
         },
         modifieddate: {
          type: "date",
          format: "dateOptionalTime"
         },
         title: {
          type: "string"
         }
        }
       }
      }
     }
    }
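
The mapping above does not show a `term_vector` setting. The fast vector highlighter works from term vectors, so the highlighted field normally has to be indexed with `term_vector` set to `with_positions_offsets`. A minimal Python sketch of what the `docbody` part of the mapping might look like under that assumption (the `term_vector` line is not in the original mapping; the analyzer names are copied from it):

```python
import json

# Sketch of the docbody property only. The "term_vector" entry is an
# assumption added for illustration; it is not shown in the mapping above.
docbody_mapping = {
    "docbody": {
        "type": "string",
        "analyzer": "text_analyzer",
        "fields": {
            "exact": {
                "type": "string",
                "analyzer": "text_analyzer_exact",
                # The fast vector highlighter reads positions and offsets
                # from stored term vectors.
                "term_vector": "with_positions_offsets",
            }
        },
    }
}

print(json.dumps(docbody_mapping, indent=1))
```

Since the highlight request targets `docbody.exact`, the setting is placed on that sub-field rather than on `docbody` itself.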
    

    and the search

    POST localhost:9200/myindex/corpdocument/_search
    {
     "highlight": {
      "pre_tags": ["|primary-highlight|",
      "|secondary-highlight|"],
      "post_tags": ["|/primary-highlight|",
      "|/secondary-highlight|"],
      "fields": {
       "docbody.exact": {
        "fragment_size": 150,
        "number_of_fragments": 3
       }
      }
     },
     "_source": {
      "exclude": ["docbody"]
     },
     "query": {
      "bool": {
       "should": [{
        "match": {
         "docbody.exact": {
          "query": "foo"
         }
        }
       },
       {
        "match": {
         "docbody.exact": {
          "query": "bar"
         }
        }
       }]
      }
     }
    }
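
Because the order of the terms and the order of the tags both matter, it can help to generate the request body from two ordered lists instead of editing the JSON by hand. A minimal Python sketch, assuming the same field name and highlight options as the request above (`build_search_body` is a hypothetical helper, not part of any Elasticsearch client):

```python
import json


def build_search_body(terms, pre_tags, post_tags, field="docbody.exact"):
    """Build a search body shaped like the one above.

    `terms` is ordered: the first term is the one meant for the first tag pair.
    """
    return {
        "highlight": {
            "pre_tags": list(pre_tags),
            "post_tags": list(post_tags),
            "fields": {field: {"fragment_size": 150, "number_of_fragments": 3}},
        },
        "_source": {"exclude": ["docbody"]},
        "query": {
            "bool": {
                # One match clause per term, in the given order.
                "should": [{"match": {field: {"query": term}}} for term in terms]
            }
        },
    }


body = build_search_body(
    ["foo", "bar"],
    ["|primary-highlight|", "|secondary-highlight|"],
    ["|/primary-highlight|", "|/secondary-highlight|"],
)
print(json.dumps(body, indent=1))
```

The printed JSON can be pasted into the _head plugin or sent with any HTTP client; reordering the `terms` list is then the only change needed to swap which term gets the primary tag.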
    

    you can get a result like this

    {
     "took": 14,
     "timed_out": false,
     "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
     },
     "hits": {
      "total": 97,
      "max_score": 0.48895144,
      "hits": [{
       "_index": "myindex",
       "_type": "corpdocument",
       "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M=",
       "_score": 0.48895144,
       "_source": {
        "createddate": "2010-11-02T00:00:00-05:00",
        "modifieddate": "2007-09-04T00:00:00-05:00",
        "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M="
       },
       "highlight": {
        "docbody.exact": ["Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight|TOTHE|primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|foo|/primary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit"]
       }
      },
      ...
      ]
     }
    }
    

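    Which tag wraps which hit depends on the order of the tags and of the search terms. Swapping the order of "foo" and "bar" while leaving everything else unchanged causes bar to be wrapped in the primary tag and foo in the secondary tag.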
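
The pairing described here can be pictured with a toy model in Python. This is not Elasticsearch code, only a sketch of the observed behaviour under the assumption that the i-th query term receives the i-th tag pair (`assign_tags` is a hypothetical name):

```python
def assign_tags(terms, pre_tags, post_tags):
    """Toy model: pair each query term with the tag pair at the same position."""
    return {
        term: (pre, post)
        for term, pre, post in zip(terms, pre_tags, post_tags)
    }


pre = ["|primary-highlight|", "|secondary-highlight|"]
post = ["|/primary-highlight|", "|/secondary-highlight|"]

# Original order: foo is first, so it is paired with the primary tags.
print(assign_tags(["foo", "bar"], pre, post))

# Swapped order: bar is first, so it is paired with the primary tags instead.
print(assign_tags(["bar", "foo"], pre, post))
```

Only the term list changes between the two calls; the tag lists stay the same, which mirrors the experiment of swapping the two match clauses.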

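    From some initial experiments with 3 search terms and 2 tags, it appears that the third term gets wrapped in the first tag rather than the second. Adding a third tag fixes that, but it means the second tag has to be repeated n times to cover all of the search terms.

    "highlight": {
     "pre_tags": ["|primary-highlight|",
     "|secondary-highlight|",
     "|secondary-highlight|"],
     "post_tags": ["|/primary-highlight|",
     "|/secondary-highlight|",
     "|/secondary-highlight|"],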
     "fields": {
      "docbody.exact": {
       "fragment_size": 150,
       "number_of_fragments": 3
      }
     }
    },
    ..."query": {
     "bool": {
      "should": [{
       "match": {
        "docbody.exact": {
         "query": "foo"
        }
       }
      },
      {
       "match": {
        "docbody.exact": {
         "query": "bar"
        }
       }
      },
      {
       "match": {
        "docbody.exact": {
         "query": "baz"
        }
       }
      }]
     }
    }
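
The wrap-around seen with 3 terms and 2 tags, and the padding that works around it, can be sketched in Python. The modulo rule below is an assumption inferred from the experiment described above, and both function names are hypothetical:

```python
def tag_index(term_position, tag_count):
    """Wrap-around model: positions past the last tag start again at the first."""
    return term_position % tag_count


def padded_tags(primary, secondary, term_count):
    """Primary tag once, then the secondary tag for every remaining term."""
    return [primary] + [secondary] * max(term_count - 1, 0)


terms = ["foo", "bar", "baz"]

two_tags = ["|primary-highlight|", "|secondary-highlight|"]
# With only two tags the third term wraps back to the primary tag.
print([two_tags[tag_index(i, len(two_tags))] for i in range(len(terms))])

three_tags = padded_tags("|primary-highlight|", "|secondary-highlight|", len(terms))
# With the padded list every term after the first gets the secondary tag.
print([three_tags[tag_index(i, len(three_tags))] for i in range(len(terms))])
```

The same helper would be called once for the pre tags and once for the post tags so that both lists keep the same length.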
    

    【Comments】:
