【Question title】: Get shingle results from Elasticsearch
【Posted】: 2020-06-13 16:43:23
【Question】:

I am already familiar with shingle analyzers, and I was able to create one as follows:

    {
      "settings": {
        "index": {
          "number_of_shards": 10,
          "number_of_replicas": 1
        },
        "analysis": {
          "analyzer": {
            "shingle_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "filter_shingle"
              ]
            }
          },
          "filter": {
            "filter_shingle": {
              "type": "shingle",
              "max_shingle_size": 2,
              "min_shingle_size": 2,
              "output_unigrams": false
            }
          }
        }
      }
    }

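I then use this analyzer in the mapping for a field in my documents named content. The problem is that the content field is a very long text, and I want to use it as data for an autocomplete suggester, so I only need the one or two words that follow the matched phrase. I wonder whether there is a way to get the search (or suggest, or analyze) API results as shingles too. With a shingle analyzer, Elasticsearch itself indexes the text as shingles; is there a way to access those shingles?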

For example, the query I send is:

GET the_index/_search
{
  "_source": ["content"],
  "explain": true,
  "query": {
    "match": { "content.shngled_field": "news" }
  }
}

The result is:

{
  "took" : 395,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 7.8647532,
    "hits" : [
      {
        "_shard" : "[v3_kavan_telegram_201911][0]",
        "_node" : "L6vHYla-TN6CHo2I6g4M_A",
        "_index" : "v3_kavan_telegram_201911",
        "_type" : "_doc",
        "_id" : "g1music/70733",
        "_score" : 7.8647532,
        "_source" : {
          "content" : "Find the latest breaking news and information on the top stories, weather, business, entertainment, politics, and more."
....
}

As you can see, the result contains the entire content field, which is a very long text. The result I expect is

"content" : "news and information on"

which is the matched shingle itself.

【Comments on the question】:

    Tags: elasticsearch


    【Solution 1】:

    After creating the index and ingesting a document:

    PUT sh
    {
      "mappings": {
        "properties": {
          "content": {
            "type": "text",
            "fields": {
              "shingled": {
                "type": "text",
                "analyzer": "shingle_analyzer"
              }
            }
          }
        }
      },
      "settings": {
        "analysis": {
          "analyzer": {
            "shingle_analyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "filter_shingle"
              ]
            }
          },
          "filter": {
            "filter_shingle": {
              "type": "shingle",
              "max_shingle_size": 2,
              "min_shingle_size": 2,
              "output_unigrams": false
            }
          }
        }
      }
    }
    
    POST sh/_doc/1
    {
      "content": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?"
    }
    

    You can call _analyze with the corresponding analyzer to see how a given text will be tokenized:

    GET sh/_analyze
    {
      "text": "and then I use the defined analyzer in mapping for a field in my document named content.The problem is the content field is a very long text and I want to use it as data for a autocomplete suggester, so I just need one or two words that follow the matched phrase. I wonder if there is a way to get the search (or suggest or analyze) API result as shingles too. By using shingle analyzer the elastic itself indexes the text as shingles, is there a way to access those shingles?",
      "analyzer": "shingle_analyzer"
    }
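
To get a feel for what the _analyze call returns, here is a minimal Python sketch that approximates the tokens a shingle filter emits when min_shingle_size and max_shingle_size are both 2 and output_unigrams is false. The `shingles` helper is hypothetical, and the regex is only a rough stand-in for the standard tokenizer plus the lowercase filter, so treat the output as an illustration rather than an exact reproduction of Elasticsearch's analysis.

```python
import re


def shingles(text, size=2):
    """Approximate a shingle filter with a fixed shingle size and no unigrams."""
    # Rough stand-in for the standard tokenizer followed by the lowercase filter.
    tokens = re.findall(r"\w+", text.lower())
    # Slide a window of `size` tokens over the stream and join each window with a space.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


print(shingles("Find the latest breaking news and information"))
# ['find the', 'the latest', 'latest breaking', 'breaking news', 'news and', 'and information']
```

A text with fewer tokens than the shingle size yields an empty list here, which mirrors output_unigrams being false.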
    

    Or inspect the term vectors information:

    GET sh/_termvectors/1
    {
      "fields" : ["content.shingled"],
      "offsets" : true,
      "payloads" : true,
      "positions" : true,
      "term_statistics" : true,
      "field_statistics" : true
    }
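
The term vectors response lists every indexed shingle together with its character offsets, so the shingles that follow a given word can be picked out on the client side. The sketch below assumes the usual response layout (`term_vectors` → field → `terms` → `tokens`). The helper name `shingles_starting_with` and the sample response are hypothetical; the sample was hand-written for the short text "news and weather, news today" and is not real output from the index above.

```python
def shingles_starting_with(termvectors_response, field, word):
    """Return (shingle, start_offset, end_offset) tuples, ordered by position
    in the document, for every shingle whose first word equals `word`."""
    terms = termvectors_response["term_vectors"][field]["terms"]
    matches = []
    for term, info in terms.items():
        if term.split(" ")[0] != word.lower():
            continue
        # Offsets are only present when the request sets "offsets": true.
        for token in info.get("tokens", []):
            matches.append((term, token["start_offset"], token["end_offset"]))
    return sorted(matches, key=lambda m: m[1])


# Hypothetical, trimmed response for the text "news and weather, news today".
sample = {
    "_index": "sh",
    "_id": "1",
    "found": True,
    "term_vectors": {
        "content.shingled": {
            "terms": {
                "and weather": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 5, "end_offset": 16}]},
                "news and": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 8}]},
                "news today": {"term_freq": 1, "tokens": [{"position": 3, "start_offset": 18, "end_offset": 28}]},
                "weather news": {"term_freq": 1, "tokens": [{"position": 2, "start_offset": 9, "end_offset": 22}]},
            }
        }
    },
}

print(shingles_starting_with(sample, "content.shingled", "news"))
# [('news and', 0, 8), ('news today', 18, 28)]
```

The offsets can be used to slice the original text if the exact casing and punctuation of the source are needed instead of the analyzed term.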
    

    Would highlighting work for you as well?
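
If highlighting is an option, one possible approach is to request highlights on the shingled sub-field and then keep only the highlighted spans instead of the whole content. The sketch below builds such a request body as a Python dict and extracts the text between the default `<em>` tags. The helper `highlighted_terms` and the sample response are hypothetical, and how the highlighter marks overlapping shingles is not verified here, so the fragments returned by a real cluster may look different.

```python
import json
import re

# Body for GET sh/_search: skip _source and ask for highlights on the sub-field.
search_body = {
    "_source": False,
    "query": {"match": {"content.shingled": "news and"}},
    "highlight": {"fields": {"content.shingled": {}}},
}


def highlighted_terms(search_response, field, pre="<em>", post="</em>"):
    """Collect the text wrapped in highlight tags across all hits for `field`."""
    pattern = re.compile(re.escape(pre) + r"(.*?)" + re.escape(post))
    found = []
    for hit in search_response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            found.extend(pattern.findall(fragment))
    return found


# Hypothetical response fragment, written by hand for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "highlight": {
                    "content.shingled": [
                        "Find the latest breaking <em>news and</em> information on the top stories"
                    ]
                },
            }
        ]
    }
}

print(json.dumps(search_body, indent=2))
print(highlighted_terms(sample_response, "content.shingled"))
# ['news and']
```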

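    【Discussion】:

    • Thanks for your answer. The problem is that you can't search the data with _analyze. I want to search the data and then get back the matched shingles. @joe
    • You can use "explain": true at the top level of the query.
    • I did what you said. It adds a JSON object to the result, but there are no shingles in it :/ @joe
    • Could you update your question to state exactly what response you expect to get?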