Add stopwords to a standard Azure Search analyzer?

【Question title】: Add stopwords to a standard Azure Search analyzer?
【Posted】: 2018-10-30 11:50:41
【Question】:

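I'm using the en.microsoft analyzer in an Azure Search index. For the most part it works well, but I need to add some domain-specific stopwords. Is there a way to add stopwords to an existing analyzer? Or do I have to implement a custom analyzer that inherits its behavior from the standard analyzer and overrides only the stopwords, leaving everything else unchanged?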

【Comments】:

Tags: azure-cognitive-search

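【Solution 1】:

Although you cannot inherit from an existing analyzer, you can create a pair of custom analyzers (one for indexing, one for searching) that are functionally equivalent to en.microsoft but use your own stopword list. Here is how that looks in the index definition payload for the REST API: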

{
  ...
  "analyzers": [
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_search_analyzer",
      "tokenizer": "my_english_search_tokenizer",
      "tokenFilters": [ "my_asciifolding_search", "lowercase", "my_stopword_filter" ]
    },
    {
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "name": "my_index_analyzer",
      "tokenizer": "my_english_index_tokenizer",
      "tokenFilters": [ "my_asciifolding_index", "lowercase", "my_stopword_filter" ]
    }
  ],
  "tokenizers": [
    {
      "name": "my_english_search_tokenizer",
      "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
      "isSearchTokenizer": true,
      "language": "english"
    },
    {
      "name": "my_english_index_tokenizer",
      "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
      "isSearchTokenizer": false,
      "language": "english"
    }
  ],
  "tokenFilters": [
    {
      "name": "my_asciifolding_search",
      "@odata.type": "#Microsoft.Azure.Search.AsciiFoldingTokenFilter",
      "preserveOriginal": false
    },
    {
      "name": "my_asciifolding_index",
      "@odata.type": "#Microsoft.Azure.Search.AsciiFoldingTokenFilter",
      "preserveOriginal": true
    },
    {
      "name": "my_stopword_filter",
      "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
      "stopwords": [ "put", "your", "custom", "stopwords", "here" ]
    }
  ]
}
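
The payload above defines the two analyzers but does not show how a field picks them up. As a minimal sketch, assuming a searchable string field named `description` (a hypothetical name, as are the helper functions `build_analysis_section` and `build_field`), the following Python builds the same analysis section from an arbitrary stopword list and a field that references the pair through its `indexAnalyzer` and `searchAnalyzer` properties. It only constructs and prints the JSON; it makes no call to the service.

```python
import json

STEMMING_TOKENIZER = "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer"
ASCII_FOLDING = "#Microsoft.Azure.Search.AsciiFoldingTokenFilter"
STOPWORDS_FILTER = "#Microsoft.Azure.Search.StopwordsTokenFilter"
CUSTOM_ANALYZER = "#Microsoft.Azure.Search.CustomAnalyzer"


def build_analysis_section(stopwords):
    """Build the analyzers/tokenizers/tokenFilters part of the index definition,
    mirroring the payload from the answer, with a caller-supplied stopword list."""
    def analyzer(kind):
        # kind is "search" or "index"; names match the answer's payload.
        return {
            "@odata.type": CUSTOM_ANALYZER,
            "name": f"my_{kind}_analyzer",
            "tokenizer": f"my_english_{kind}_tokenizer",
            "tokenFilters": [f"my_asciifolding_{kind}", "lowercase", "my_stopword_filter"],
        }

    def tokenizer(kind):
        return {
            "name": f"my_english_{kind}_tokenizer",
            "@odata.type": STEMMING_TOKENIZER,
            "isSearchTokenizer": kind == "search",
            "language": "english",
        }

    def folding(kind):
        # As in the answer: the index-time filter keeps the original token,
        # the search-time filter does not.
        return {
            "name": f"my_asciifolding_{kind}",
            "@odata.type": ASCII_FOLDING,
            "preserveOriginal": kind == "index",
        }

    return {
        "analyzers": [analyzer("search"), analyzer("index")],
        "tokenizers": [tokenizer("search"), tokenizer("index")],
        "tokenFilters": [
            folding("search"),
            folding("index"),
            {
                "name": "my_stopword_filter",
                "@odata.type": STOPWORDS_FILTER,
                "stopwords": list(stopwords),
            },
        ],
    }


def build_field(name):
    """Hypothetical searchable field that uses the analyzer pair.
    indexAnalyzer and searchAnalyzer are set together, in place of 'analyzer'."""
    return {
        "name": name,
        "type": "Edm.String",
        "searchable": True,
        "indexAnalyzer": "my_index_analyzer",
        "searchAnalyzer": "my_search_analyzer",
    }


if __name__ == "__main__":
    # "description" and the stopwords are placeholders for illustration only.
    fragment = {"fields": [build_field("description")]}
    fragment.update(build_analysis_section(["put", "your", "custom", "stopwords", "here"]))
    print(json.dumps(fragment, indent=2))
```

The printed fragment is meant to be merged into a full index definition (index name, key field, and any other fields), which is omitted here just as it is elided in the answer's payload.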
