【Question title】: Django ElasticSearch DSL partial matching using nGram analyzer
【Posted】: 2021-11-26 15:17:15
【Question description】:

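I'm quite new to Elasticsearch, and I'm trying to implement a simple e-commerce search in my Django application using Elasticsearch and the django-elasticsearch-dsl library (GitHub repo).

What I'm trying to achieve (extremely simplified), given these Django model instances: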

Red T-shirts
Blue T-Shirts
Nice T-Shirts

is that for the search term T-Sh I would get all three results:

Red T-shirts
Blue T-Shirts
Nice T-Shirts

So I have this model in shop/models.py (again, very simplified):

from django.db import models

class Category(models.Model):
    title = models.CharField(max_length=150, blank=False)
    description = models.CharField(max_length=150, blank=False)
    # In reality here I have more fields

    def __str__(self):
        return self.title

With shop/documents.py:

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Category

autocomplete_analyzer = analyzer(
    'autocomplete_analyzer',
    tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
    filter=['lowercase'],
)

@registry.register_document
class CategoryDocument(Document):

    title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard') # Here I'm trying to use the analyzer specified above


    class Index:
        name = 'categories'
        settings = {
            'number_of_shards': 1,
            'number_of_replicas': 0,
            'max_ngram_diff': 20  # This seems to be important because max_ngram_diff is limited to 1 by default
        }

    class Django:
        model = Category
        fields = [
            'title', 
            # In reality here I have more fields
        ]
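
As a rough illustration of what the analyzer above is meant to produce, the following pure-Python sketch mimics an n-gram tokenizer followed by a lowercase filter. It is not Elasticsearch's implementation: the helper names `ngrams` and `required_max_ngram_diff` are hypothetical, and the sketch assumes every character (spaces and hyphens included) may appear inside a gram. Under those assumptions, the lowercased term t-sh is one of the grams of each example title, and min_gram=1 with max_gram=20 needs an index-level max_ngram_diff of at least 19.

```python
MIN_GRAM = 1   # same values as in the analyzer definition above
MAX_GRAM = 20


def ngrams(text, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Return all lowercased substrings with length in [min_gram, max_gram].

    Hypothetical helper: a simplified stand-in for an n-gram tokenizer
    plus a lowercase filter, not the real Elasticsearch code.
    """
    lowered = text.lower()
    grams = set()
    for start in range(len(lowered)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(lowered):
                break  # longer grams would run past the end of the text
            grams.add(lowered[start:end])
    return grams


def required_max_ngram_diff(min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    """Smallest max_ngram_diff that allows this min_gram/max_gram pair."""
    return max_gram - min_gram


titles = ["Red T-shirts", "Blue T-Shirts", "Nice T-Shirts"]
search_term = "T-Sh"

# A title "matches" here if the lowercased search term is one of its grams.
matches = [title for title in titles if search_term.lower() in ngrams(title)]

print(matches)
print(required_max_ngram_diff())
```

With the three example titles, `matches` contains all of them, which is the behaviour the question is aiming for.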

And finally, my shop/views.py:

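from django.views.generic import ListView
from elasticsearch_dsl import Q

from .documents import CategoryDocument

class CategoryElasticSearch(ListView):
    def get(self, request, lang):
        search_term = request.GET.get('search_term', '')
        q = Q(
            'multi_match',
            query=search_term,
            fields=[
                'title',
                # In reality here I have more fields
            ],
            fuzziness='auto',
        )
        # Assumes the search is started from the document class
        search = CategoryDocument.search().query(q)
        # ... etc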

The result for T-Sh is empty. I only get something when I type a longer term, such as T-Shir; then I get all three results.

Thanks a lot.

【Question discussion】:

    Tags: django elasticsearch elasticsearch-dsl


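    【Solution 1】:

    Oh my, I got it working.

    For anyone dealing with this: the analyzer is defined on each "field" in the mapping. In other words, to attach the analyzer to the title field, our shop/documents.py has to look like this: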

    from django_elasticsearch_dsl import Document, fields
    from django_elasticsearch_dsl.registries import registry
    from elasticsearch_dsl import analyzer, tokenizer

    from .models import Category

    autocomplete_analyzer = analyzer(
        'autocomplete_analyzer',
        tokenizer=tokenizer('trigram', 'nGram', min_gram=1, max_gram=20),
        filter=['lowercase'],
    )
    
    @registry.register_document
    class CategoryDocument(Document):
    
        # title: fields.TextField(analyzer=autocomplete_analyzer, search_analyzer='standard')
        # ^ Here I was trying to use the analyzer specified above. This was extremely incorrect
        #   because of the colon in the definition; I don't know how I missed it, but I did...
        title = fields.TextField(required=True, analyzer=autocomplete_analyzer)  # This is it....
    
        class Index:
            name = 'categories'
            settings = {
                'number_of_shards': 1,
                'number_of_replicas': 0,
                'max_ngram_diff': 20  # This seems to be important because max_ngram_diff is limited to 1 by default
            }
    
        class Django:
            model = Category
            fields = [
                # 'title' <-- Notice that I removed this field; keeping it would cause a redeclaration error
                # In reality here I have more fields
            ]
    

    And it works flawlessly...
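
    The colon-versus-equals difference can be reproduced in plain Python, without Django or Elasticsearch. In the minimal sketch below, `FakeField`, `WithColon`, and `WithEquals` are hypothetical stand-ins, not part of django-elasticsearch-dsl. A bare annotation only records the name in `__annotations__` and creates no class attribute, so code that collects fields from class attributes would presumably never see it.

```python
class FakeField:
    """Hypothetical stand-in for a field object such as fields.TextField."""

    def __init__(self, analyzer=None):
        self.analyzer = analyzer


class WithColon:
    # Annotation only: no class attribute named "title" is created.
    title: FakeField(analyzer="autocomplete_analyzer")


class WithEquals:
    # Assignment: "title" becomes a real class attribute.
    title = FakeField(analyzer="autocomplete_analyzer")


print(hasattr(WithColon, "title"))           # is there a class attribute?
print("title" in WithColon.__annotations__)  # the name is only an annotation
print(hasattr(WithEquals, "title"))
print(WithEquals.title.analyzer)
```

    This is consistent with the fix above: once title is assigned with =, the field object (and the analyzer attached to it) exists on the document class.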

    【Discussion】:
