【Question title】:How to add additional stop words to a Neo4j full-text analyzer
【Posted】:2021-11-13 22:28:59
【Question description】:

I'm working on full-text search and need to add words to the stop word list.

This is exactly the same question as this one for ElasticSearch: How to add stopwords to the default list in ElasticSearch

Is this possible without writing a custom analyzer as a plugin? My index looks like this:

CREATE FULLTEXT INDEX productNameIndex FOR (n:Product) ON EACH [n.name] 
    OPTIONS {indexConfig: {`fulltext.analyzer`: 'danish' }}

Would it be possible to do something like this: fulltext.stopwords: ['word1', 'word2'] or fulltext.stopwords: ./stopwords.txt

I haven't tried writing a custom plugin for Neo4j before, and it seems daunting.

【Comments】:

    Tags: search neo4j full-text-search


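    【Solution 1】:

    AFAIK, it is not possible to use a custom analyzer without writing a plugin. Fortunately, you can copy some existing code for your custom analyzer. For example, take a look at the covidgraph custom analyzers: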

    https://github.com/covidgraph/neo4j-additional-analyzers

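    // Imports added for completeness. The two org.neo4j locations assume the
    // Neo4j 3.5 line, which matches the @Service.Implementation style used
    // here; they differ in Neo4j 4.x.
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
    import org.apache.lucene.analysis.core.StopFilterFactory;
    import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
    import org.apache.lucene.analysis.custom.CustomAnalyzer;
    import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
    import org.neo4j.graphdb.index.fulltext.AnalyzerProvider;
    import org.neo4j.helpers.Service;

    @Service.Implementation(AnalyzerProvider.class)
    public class SynonymAnalyzerProvider extends AnalyzerProvider {

        public static final String DESCRIPTION = "analyzer using synonyms";
        // Name to pass as `fulltext.analyzer` when creating the index.
        public static final String ANALYZER_NAME = "synonym";

        public SynonymAnalyzerProvider() {
            super(ANALYZER_NAME, new String[0]);
        }

        @Override
        public Analyzer createAnalyzer() {
            try {
                // Filters run in the order they are added.
                return CustomAnalyzer.builder()
                        .withTokenizer(WhitespaceTokenizerFactory.class)
                        .addTokenFilter(SynonymFilterFactory.class, "synonyms", "gene_symbols.txt", "ignoreCase", "true")
                        // "words" is a comma-separated list of stop word files.
                        .addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
                        .addTokenFilter(LowerCaseFilterFactory.class)
                        .build();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        @Override
        public String description() {
            return DESCRIPTION;
        }
    }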
    

    Specifically, the stop words appear to be added in the following line:

    .addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
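    The "format", "snowball" argument in that line refers to a plain-text layout for stop word files: as far as I know, `|` starts a comment that runs to the end of the line, and a line may hold several whitespace-separated words. To make that concrete, here is a minimal stdlib-only sketch that reads such text and merges in extra words. It is not Lucene's own loader; the class name `SnowballStopWords` and its methods are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Neo4j or Lucene: it only illustrates
// how a snowball-format stop word file is laid out.
final class SnowballStopWords {

    private SnowballStopWords() {
    }

    // Parses snowball-format text: '|' starts a comment that runs to the end
    // of the line, and a line may hold several whitespace-separated words.
    static Set<String> parse(String content) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : content.split("\\R")) {
            int comment = line.indexOf('|');
            String body = (comment >= 0 ? line.substring(0, comment) : line).trim();
            if (body.isEmpty()) {
                continue; // blank line or comment-only line
            }
            for (String word : body.split("\\s+")) {
                // Lowercased here to mirror the "ignoreCase", "true" setting.
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Returns the default words followed by the extra ones, without duplicates.
    static Set<String> merge(Set<String> defaults, String... extra) {
        Set<String> merged = new LinkedHashSet<>(defaults);
        for (String word : extra) {
            merged.add(word.toLowerCase(Locale.ROOT));
        }
        return merged;
    }
}
```

    A file of your own words in this layout could presumably be appended to the comma-separated "words" value, assuming it is packaged on the plugin's classpath next to the analyzer provider.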
    

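    【Discussion】:

    • You can also check out this blog post about handling synonyms in an analyzer: graphaware.com/neo4j/2019/12/20/…
    • Thanks - I think the hardest part for me is installing the plugin. Do you know for certain that there is no way to do it without one?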