Django ElasticSearch DSL partial matching using nGram analyzer: answer

【Question title】: Django ElasticSearch DSL partial matching using nGram analyzer
【Posted】: 2021-11-26 15:17:15
【Question description】:

I'm quite new to ElasticSearch, and I'm trying to implement a simple e-commerce search in my Django application using ElasticSearch and the django-elasticsearch-dsl library (Github repo).

Given these Django model instances, what I'm trying to achieve (extremely simplified) is this:

Red T-shirts
Blue T-Shirts
Nice T-Shirts

For the search term T-Sh, I should get all three results:

Red T-shirts
Blue T-Shirts
Nice T-Shirts

So I have this model in shop/models.py (again, very simplified):

from django.db import models

class Category(models.Model):
    title = models.CharField(max_length=150, blank=False)
    description = models.CharField(max_length=150, blank=False)
    # In reality here I have more fields

    def __str__(self):
        return self.title

with shop/documents.py:

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Category

autocomplete_analyzer = analyzer('autocomplete_analyzer',
            tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
            filter=['lowercase']
        )

@registry.register_document
class CategoryDocument(Document):

    title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard') # Here I'm trying to use the analyzer specified above


    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'max_ngram_diff': 20 # This seems to be important because the default max_ngram_diff is 1
        }

    class Django:
        model = Category
        fields = [
            'title',
            # In reality here I have more fields
        ]
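For reference, here is a rough pure-Python approximation of what autocomplete_analyzer is meant to emit for a title: every substring of 1 to 20 characters, lowercased. With token_chars left empty, the Elasticsearch ngram tokenizer keeps all characters (spaces and hyphens included), so "t-sh" is one of the grams for all three titles. It also shows the max_ngram_diff arithmetic: max_gram - min_gram = 19, which exceeds the default limit of 1. The function names are hypothetical, this is not the Elasticsearch implementation, and the gram order may differ from the real tokenizer.

```python
def ngram_tokens(text: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Approximate an ngram tokenizer (all characters kept) plus a lowercase filter."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # no longer grams fit from this start position
            grams.append(text[start:end].lower())
    return grams


def required_max_ngram_diff(min_gram: int, max_gram: int) -> int:
    """Smallest index.max_ngram_diff that allows this min_gram/max_gram pair."""
    return max_gram - min_gram


titles = ["Red T-shirts", "Blue T-Shirts", "Nice T-Shirts"]
query = "T-Sh".lower()
for title in titles:
    print(title, query in ngram_tokens(title))  # True for each title
print(required_max_ngram_diff(1, 20))  # 19, so 'max_ngram_diff': 20 is enough
```

Note that this only describes index time: whether the query side produces a matching term depends on the search analyzer actually applied to the field.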

and finally, my shop/views.py:

from django.views.generic import ListView
from elasticsearch_dsl import Q

from .documents import CategoryDocument

class CategoryElasticSearch(ListView):
    def get(self, request, lang):
        search_term = request.GET.get('search_term', '')
        q = Q(
            "multi_match",
            query=search_term,
            fields=[
                'title',
                # In reality here I have more fields
            ],
            fuzziness='auto',
        )
        search = CategoryDocument.search().query(q)
        # ... etc

But for T-Sh the result is empty. I only get something when I type a longer term, such as T-Shir; at that point I may get all three results.

Thanks a lot

【Question discussion】:

Tags: django elasticsearch elasticsearch-dsl

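【Solution 1】:

Oh my, I got it working.

For anyone dealing with this: the analyzer is defined on each field in the mapping. In other words, to attach the analyzer to the title field, our shop/documents.py has to look like this: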

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Category

autocomplete_analyzer = analyzer('autocomplete_analyzer',
            tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
            filter=['lowercase']
        )

@registry.register_document
class CategoryDocument(Document):

    #title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard') # <-- This was extremely incorrect: with a colon it is only a type annotation, not a field declaration. I don't know how I missed it, but I did...
    title = fields.TextField(required=True, analyzer=autocomplete_analyzer) # This is it....

    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'max_ngram_diff': 20 # Needed because the default max_ngram_diff is 1, while max_gram - min_gram is 19 here
        }

    class Django:
        model = Category
        fields = [
            # 'title' <-- Notice that I removed this field: it is declared explicitly above, so listing it here too raises a redeclaration error
            # In reality here I have more fields
        ]

And it works flawlessly...

【Discussion】:
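Why the colon mattered: inside a class body, `title: X` is only a variable annotation. Python evaluates X and stores it in `__annotations__`, but it never creates a class attribute named title. A metaclass that collects field objects from the class attributes (which, as far as I can tell, is how django-elasticsearch-dsl's Document discovered fields at the time) never sees the annotated field, so title presumably fell back to the default mapping generated from Django.fields. The sketch below shows the plain Python behaviour with a hypothetical `Field` stand-in; it does not use the real library classes.

```python
class Field:
    """Hypothetical stand-in for fields.TextField; it only records the analyzer name."""

    def __init__(self, analyzer=None):
        self.analyzer = analyzer


class BuggyDocument:
    # Colon: evaluated and stored as an annotation, but no class attribute is created.
    title: Field(analyzer="autocomplete_analyzer")


class FixedDocument:
    # Equals sign: a real class attribute that a field-collecting metaclass can find.
    title = Field(analyzer="autocomplete_analyzer")


print("title" in vars(BuggyDocument))            # False
print("title" in BuggyDocument.__annotations__)  # True
print(vars(FixedDocument)["title"].analyzer)     # autocomplete_analyzer
```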