【Question title】:ElasticSearch Suggester full-text search
【Posted】:2021-01-24 14:40:47
【Question】:

I am using django_elasticsearch_dsl.

My document:

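import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from elasticsearch_dsl import analyzer

# Custom analyzer: strip HTML, then lowercase, drop stop words, and stem.
html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = fields.TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),          # exact-match sub-field
            'suggest': fields.CompletionField(),   # completion suggester sub-field
        }
    )
    ...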

My request:

_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()

I have indexed documents with the following "name" values:

"This is a test"
"this is my test"
"this test"
"Test this"

Now, if I search for This is my test, I receive only

"this is my test"

However, if I search for test, all I get is

"Test this"

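even though I want all documents whose name contains test.

What am I missing?

【Comments】:

  • Did you get a chance to go through my answers? Looking forward to your feedback.

Tags: python django elasticsearch elasticsearch-dsl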


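【Solution 1】:

Based on the comments from the user, I am adding another answer that uses n-grams.

Below is a working example with the index mapping, index data, search query, and search results.

Index mapping: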

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
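
The mapping above sets max_ngram_diff because, as far as I know, the index-level setting index.max_ngram_diff defaults to 1, and recent Elasticsearch versions reject an n-gram filter whose max_gram minus min_gram exceeds it. Here is a minimal Python sketch that builds the same body and derives the smallest value that works (16 for grams of 4 to 20; the 50 used above is simply a looser bound). The helper name build_index_body is hypothetical and only for illustration.

```python
import json


def build_index_body(min_gram, max_gram):
    """Build an index-creation body with an n-gram analyzer on `name`.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "ngram_filter": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "ngram_filter"],
                    }
                },
            },
            # Smallest allowed value: the gap between max_gram and min_gram.
            "max_ngram_diff": max_gram - min_gram,
        },
        "mappings": {
            "properties": {
                "name": {
                    "type": "text",
                    # n-grams at index time, plain words at search time
                    "analyzer": "ngram_analyzer",
                    "search_analyzer": "standard",
                }
            }
        },
    }


body = build_index_body(min_gram=4, max_gram=20)
print(json.dumps(body, indent=2))
```

The resulting dict could be sent as the body of an index-creation request with whichever client you use.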

Index data:

{
  "name": [
    "Test this"
  ]
}

{
  "name": [
    "This is a test"
  ]
}

{
  "name": [
    "this is my test"
  ]
}

{
  "name": [
    "this test"
  ]
}

Analyze API:

POST /stof_64281341/_analyze

{
  "analyzer" : "ngram_analyzer",
  "text" : "this is my test"
}

The following tokens are generated:

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
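
To see why only "this" and "test" come out, here is a rough pure-Python approximation of the analyzer (standard tokenizer, lowercase, then n-grams of 4 to 20 characters). It is a sketch, not Elasticsearch's actual implementation: the tokenizer is approximated with a simple regex, and the word "testing" is a made-up input that is not part of the indexed data.

```python
import re


def ngram_analyze(text, min_gram=4, max_gram=20):
    """Approximate: standard tokenizer + lowercase + ngram token filter."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # no longer grams fit from this start position
                tokens.append(word[start:start + size])
    return tokens


# "is" and "my" are shorter than min_gram, so they produce no tokens.
print(ngram_analyze("this is my test"))

# A longer (hypothetical) word yields every substring of length >= 4,
# which is what lets a search for "test" match in the middle of a field.
print(ngram_analyze("testing"))
```

Because the search_analyzer is standard, the query text is not split into n-grams; it only has to equal one of the indexed grams.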

Search query:

{
    "query": {
        "match": {
           "name": "test"
        }
    }
}

Search results:

"hits": [
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "Test this"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this is my test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "This is a test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this test"
          ]
        }
      }
    ]

For fuzzy search, you can use the following search query:

{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"    <-- used tst in place of test
      }
    }
  }
}
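
The fuzzy query matches indexed terms within a small edit distance of the given value. With the default fuzziness (AUTO), a value of 3 to 5 characters is allowed one edit, so "tst" can reach "test". The sketch below illustrates that rule in plain Python. It uses plain Levenshtein distance as a simplification (Elasticsearch also counts a transposition as one edit by default), and the function names are mine, not part of any library.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed by fuzziness AUTO: 0 for 1-2 chars, 1 for 3-5, 2 above."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_matches(value, indexed_terms):
    """Return the indexed terms within the allowed edit distance of value."""
    allowed = auto_fuzziness(value)
    return [t for t in indexed_terms if edit_distance(value, t) <= allowed]


# Terms produced by the n-gram analyzer for the sample documents.
print(fuzzy_matches("tst", ["this", "test"]))
```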

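【Comments】:

  • @Shezan Kazi I added another answer that uses n-grams to implement your use case (adding it to the answer above would have made that answer too long). Please go through my answer and let me know if it resolves your issue.
【Solution 2】: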

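The best way to get completion-style suggestions that match in the middle of a field is an n-gram filter.

Alternatively, you can use multiple suggesters: one that is prefix-based, and one that uses a regular expression to match in the middle of the field.

I am not familiar with django_elasticsearch_dsl, so I have added a working example with the index mapping, data, search query, and search results.

Index mapping: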

{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}

Index data:

{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}

Search query:

    {
        "suggest": {
            "suggest-exact": {
                "prefix": "test",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            },
            "suggest-regex": {
                "regex": ".*test.*",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            }
        }
    }

Search results:

"suggest": {
    "suggest-exact": [
      {
        "text": "test",
        "offset": 0,
        "length": 4,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          }
        ]
      }
    ],
    "suggest-regex": [
      {
        "text": ".*test.*",
        "offset": 0,
        "length": 8,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          },
          {
            "text": "This is a test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "This is a test"
                ]
              }
            }
          },
          {
            "text": "this is my test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this is my test"
                ]
              }
            }
          },
          {
            "text": "this test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this test"
                ]
              }
            }
          }
        ]
      }
    ]
}
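
The difference between the two result sets can be reproduced with a small simulation. This is only a sketch of the matching behaviour, assuming the completion field lowercases its inputs (which I believe the default analyzer does) and that the regex has to match the whole input; it does not call Elasticsearch, and the function names are hypothetical.

```python
import re

NAMES = ["Test this", "this is my test", "This is a test", "this test"]


def prefix_suggest(prefix, names):
    """Prefix matching: the input must start with the typed text."""
    wanted = prefix.lower()
    return [n for n in names if n.lower().startswith(wanted)]


def regex_suggest(pattern, names):
    """Regex matching: the pattern must match the entire input."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n.lower())]


print(prefix_suggest("test", NAMES))     # only the name that starts with "test"
print(regex_suggest(".*test.*", NAMES))  # every name that contains "test"
```

This is why the question's original completion request returned only "Test this": a completion suggester matches from the start of the input, not in the middle.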

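【Comments】:

  • @Shezan Kazi The query above works fine, but regular expressions are expensive. If you like, I can also provide a solution that uses n-grams. Please go through my answer and let me know if it resolves your issue.
  • This works like a charm. The problem is that elasticsearch-dsl does not support regex in search(). It would be great if you could post a solution for n-grams.
  • The only issue I can see now is that typos are not handled, since fuzzy is not an option for regex suggestions. Is there a workaround?
  • @ShezanKazi To handle typos, please take a look at my other answer (Solution 1) and let me know if it helps resolve your issue.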