【Question title】: How to do incremental/search-as-you-type full-text search on a 5-million-record set using Elasticsearch
【Posted】: 2018-07-26 07:54:35
【Question】:

I am using Elasticsearch on a huge dataset of all Wikipedia article names; there are about 5 million of them. The field name is articlenames.

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -d'
{
   "settings":{
      "analysis":{
         "filter":{
            "nGram_filter":{
               "type":"edgeNGram",
               "min_gram":1,    
               "max_gram":20,
               "token_chars":[
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         },
         "whitespace_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            }
      }
   },
   "mappings":{
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer"
            }
         }
      }
   }
}'

I also went through these links to solve my problem, but in vain:

Edge NGram with phrase matching

https://hackernoon.com/elasticsearch-building-autocomplete-functionality-494fcf81a7cf

My goal is to get results like the following for the input query "sachin t":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 
sachin top 60 quotes
sachin talwalkar
sachin tawade
sachin taps

For the query "sachin te":

sachin tendulkar
sachin tendulkar centuries
sachin tejas 

For the query "sachin ta":

sachin talwalkar
sachin tawade
sachin taps

For the query "sachin ten":

sachin tendulkar
sachin tendulkar centuries

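Keep in mind that the dataset is huge, and some article names may contain special characters, for example "Bronisław-Komorowski".

I can get output for smaller datasets of up to 100,000 records, but as soon as the dataset grows to between 0.5 and 5 million records I get no output.

My query is: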

http://127.0.0.1:9200/index_wiki_articlenames/_search?&q=articlenames:sachin-t+articlenames:sachin-t.*&filter_path=hits.hits._source.articlenames&size=50

【Comments】:

Tags: elasticsearch search full-text-search n-gram incremental-search


【Answer 1】:

You should try the following settings:

curl -XPUT "http://localhost:9200/index_wiki_articlenames/" -d'
{
   "settings":{
      "analysis":{
         "tokenizer":{
            "edge_ngram_tokenizer":{
               "type":"edgeNGram",
               "min_gram":"1",
               "max_gram":"20",
               "token_chars":[
                  "letter",
                  "digit"
               ]
            }
         },
         "analyzer":{
            "nGram_analyzer":{
               "type":"custom",
               "tokenizer":"edge_ngram_tokenizer",
               "filter":[
                  "lowercase",
                  "asciifolding"
               ]
            }
         }
      }
   },
   "mappings":{
      "name":{
         "properties":{
            "articlenames":{
               "type":"text",
               "analyzer":"nGram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}'
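
To see why this mapping helps, here is a rough Python simulation of the two analyzers; it is not Elasticsearch itself. It assumes the edge n-gram tokenizer splits on anything that is not a letter or digit and emits prefixes of 1 to 20 characters per word, and it approximates the standard search analyzer as word splitting plus lowercasing. The helper names (fold_ascii, index_terms, search_terms) and the small folding table are made up for illustration.

```python
import re
import unicodedata

# Stand-in for the asciifolding filter (an approximation): a tiny manual table
# for letters without a Unicode decomposition, then combining marks stripped.
_EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O"})


def fold_ascii(text):
    decomposed = unicodedata.normalize("NFKD", text.translate(_EXTRA_FOLDS))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def words(text):
    # token_chars = [letter, digit]: every other character acts as a separator.
    return re.findall(r"[^\W_]+", text)


def index_terms(title, min_gram=1, max_gram=20):
    """Approximate the terms nGram_analyzer would store for one title."""
    terms = set()
    for word in words(title):
        # Prefixes anchored at the start of each word, then lowercase + folding.
        for n in range(min_gram, min(len(word), max_gram) + 1):
            terms.add(fold_ascii(word[:n].lower()))
    return terms


def search_terms(query):
    """Approximate the standard analyzer: whole words, lowercased, no n-grams."""
    return [w.lower() for w in words(query)]
```

With these helpers, search_terms("Sachin T") gives ['sachin', 't'], and both terms are present in index_terms("Sachin Tendulkar"), so the typed prefix can match without the query itself being split into n-grams. In this approximation the search side does not fold accents: a query typed as "Bronisław" stays bronisław, which is not among the folded index terms, while "bronislaw" is. If that matters, a search analyzer that also applies asciifolding may be worth trying.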

Also try this query at search time:

GET index_wiki_articlenames/_search
{
  "query": {
    "match": {
      "articlenames": {
        "query": "Sachin T", 
        "operator": "and"
      }
    }
  }
}
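
The effect of "operator": "and" can be sketched in plain Python. This is only a model of the matching rule, assuming that a query term matches a title when some word of the title starts with it (which is what the edge n-gram index amounts to for prefixes of up to 20 characters). The first four titles come from the question; the last two are illustrative extras, and the function names are hypothetical.

```python
TITLES = [
    "sachin tendulkar",
    "sachin tendulkar centuries",
    "sachin tejas",
    "sachin talwalkar",
    "sachin pilgaonkar",  # illustrative extra, not from the question
    "ten sports",         # illustrative extra, not from the question
]


def matches(title, query, operator="or"):
    """Return True if the title satisfies the query under the given operator."""
    title_words = title.lower().split()
    # One boolean per query term: does any word of the title start with it?
    hits = [
        any(word.startswith(term) for word in title_words)
        for term in query.lower().split()
    ]
    return all(hits) if operator == "and" else any(hits)


def search(query, operator="or"):
    return [title for title in TITLES if matches(title, query, operator)]


print(search("sachin ten", operator="and"))
print(search("sachin ten", operator="or"))
```

With "and", only the two "sachin tendulkar" titles survive. With the default "or", every title containing either a "sachin..." word or a "ten..." word matches, which on millions of titles would likely bury the wanted suggestions.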

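【Comments】:

    【Answer 2】:

    I know this is late, but anyone looking for a solution can try the query below. The mapping and indexing are correct; the query just seems to be missing the and operator.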

    GET index_wiki_articlenames/_search
    {
      "query": {
        "match": {
          "articlenames": {
            "query": "sachin ten", 
            "operator": "and"
          }
        }
      }
    }
    

    This results in:

    sachin tendulkar
    sachin tendulkar centuries
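
For search-as-you-type, the same match query would be sent on each keystroke from client code. Below is a minimal sketch using only the Python standard library. The host, index name, filter_path and size are taken from the question; the function names are hypothetical, and suggest() assumes a running node and has not been tried against a real cluster.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"   # host from the question's curl command
INDEX = "index_wiki_articlenames"  # index name from the question


def build_search_request(typed_text, size=50):
    """Build (but do not send) the _search request for the text typed so far."""
    body = {
        "size": size,
        "query": {
            "match": {
                "articlenames": {"query": typed_text, "operator": "and"}
            }
        },
    }
    # Same filter_path as in the question, to return only the article names.
    params = urllib.parse.urlencode(
        {"filter_path": "hits.hits._source.articlenames"}
    )
    return urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search?{params}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def suggest(typed_text):
    """Send the request and return the matching names (needs a live node)."""
    with urllib.request.urlopen(build_search_request(typed_text)) as resp:
        payload = json.load(resp)
    # With filter_path, an empty result comes back as {} rather than hits: [].
    hits = payload.get("hits", {}).get("hits", [])
    return [hit["_source"]["articlenames"] for hit in hits]
```

Calling suggest("sachin ten") against the index from the question would be expected to return the names shown above, assuming the data has been indexed with the mapping from Answer 1.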
    

    【Comments】:
