【Question title】:Lucene.Net 4.8 add multiple filters to custom analyzer
【Posted】:2018-08-23 15:50:13
【Question】:

I am trying to create a custom analyzer that applies multiple filters.

The problem is that only the last filter (LowerCaseFilter) is applied.

public class CustomAnalyzer : Analyzer
{
    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        Tokenizer tokenizer = new KeywordTokenizer(reader);

        // Remove basic stop words: a, an, the, in, on, etc.
        TokenStream result = new StopFilter(GlobalVariables.LuceneVersion, tokenizer, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

        // Remove tile/tiles
        CharArraySet stopWords = new CharArraySet(GlobalVariables.LuceneVersion, 1, true)
        {
            "test",
        };

        result = new StopFilter(GlobalVariables.LuceneVersion, tokenizer, stopWords);

        // Make case insensitive
        result = new LowerCaseFilter(GlobalVariables.LuceneVersion, tokenizer);

        return new TokenStreamComponents(tokenizer, result);
    }
}

【Comments】:

    Tags: lucene lucene.net


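    【Solution 1】:

    Don't pass the tokenizer into every filter; pass the previous filter instead. Each filter wraps the stream it is given, so a filter built directly on the tokenizer bypasses every filter created before it.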

    Tokenizer tokenizer = new KeywordTokenizer(reader);
    // First filter wraps the tokenizer
    TokenStream result = new StopFilter(GlobalVariables.LuceneVersion, tokenizer, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    CharArraySet stopWords = new CharArraySet(GlobalVariables.LuceneVersion, 1, true) { "test" };
    // Each later filter wraps the previous result, not the tokenizer
    result = new StopFilter(GlobalVariables.LuceneVersion, result, stopWords);
    result = new LowerCaseFilter(GlobalVariables.LuceneVersion, result);
    return new TokenStreamComponents(tokenizer, result);
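
    For reference, here is a minimal sketch of the whole analyzer with the chain wired this way, plus a small console check that prints whatever tokens come out. It assumes the Lucene.Net and Lucene.Net.Analysis.Common 4.8 packages are referenced. The names ChainedAnalyzer, MatchVersion, Program and the "title" field are made up for illustration, and MatchVersion stands in for the asker's GlobalVariables.LuceneVersion, whose actual value is not shown in the question.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Analysis.Util;
using Lucene.Net.Util;

// Hypothetical name; mirrors the CustomAnalyzer from the question.
public class ChainedAnalyzer : Analyzer
{
    // Assumption: stands in for GlobalVariables.LuceneVersion from the question.
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    protected override TokenStreamComponents CreateComponents(string fieldName, TextReader reader)
    {
        // The tokenizer is the source of the chain.
        Tokenizer tokenizer = new KeywordTokenizer(reader);

        // Filter 1 wraps the tokenizer.
        TokenStream result = new StopFilter(MatchVersion, tokenizer, StopAnalyzer.ENGLISH_STOP_WORDS_SET);

        // Filter 2 wraps filter 1 (third argument true = ignore case).
        var stopWords = new CharArraySet(MatchVersion, 1, true) { "test" };
        result = new StopFilter(MatchVersion, result, stopWords);

        // Filter 3 wraps filter 2.
        result = new LowerCaseFilter(MatchVersion, result);

        // Return the source and the outermost filter.
        return new TokenStreamComponents(tokenizer, result);
    }
}

public static class Program
{
    public static void Main()
    {
        using (Analyzer analyzer = new ChainedAnalyzer())
        using (TokenStream stream = analyzer.GetTokenStream("title", new StringReader("Hello World")))
        {
            ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                // Print each token that survives the filter chain.
                Console.WriteLine(term.ToString());
            }
            stream.End();
        }
    }
}
```

    One thing to keep in mind when testing this: KeywordTokenizer emits the entire input as a single token, so the stop filters only drop input that is, as a whole, one of the stop words. If the intent is to remove stop words from within a phrase, a word-splitting tokenizer such as StandardTokenizer or WhitespaceTokenizer is probably what is needed, but that is a separate choice from the chaining fix above.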
    

