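AFAIK, it is not possible to define a custom analyzer without writing a plugin. Fortunately, you can copy existing code as a starting point for your own analyzer.
For example, take a look at the covidgraph custom analyzers: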
https://github.com/covidgraph/neo4j-additional-analyzers
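// Imports assume Neo4j 3.5.x and its bundled Lucene; package names differ in later versions.
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
import org.neo4j.graphdb.index.fulltext.AnalyzerProvider;
import org.neo4j.helpers.Service;

@Service.Implementation(AnalyzerProvider.class)
public class SynonymAnalyzerProvider extends AnalyzerProvider {

    public static final String DESCRIPTION = "analyzer using synonyms";
    public static final String ANALYZER_NAME = "synonym";

    public SynonymAnalyzerProvider() {
        // Register the analyzer under the name "synonym", with no alternative names.
        super(ANALYZER_NAME, new String[0]);
    }

    public Analyzer createAnalyzer() {
        try {
            return CustomAnalyzer.builder()
                    .withTokenizer(WhitespaceTokenizerFactory.class)
                    .addTokenFilter(SynonymFilterFactory.class, "synonyms", "gene_symbols.txt", "ignoreCase", "true")
                    // "words" is a comma-separated list of stop word files; "format" names the file format.
                    .addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
                    .addTokenFilter(LowerCaseFilterFactory.class)
                    .build();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String description() {
        return DESCRIPTION;
    }
}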
Specifically, the stop words appear to be added on the following line:
.addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
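To make it clearer what that filter chain does to the text, here is a rough plain-Java sketch of the same idea: split on whitespace, drop stop words case-insensitively, then lowercase what is left. This is not Lucene code, and it skips the synonym step. The class name `StopWordSketch` and the tiny stop list are made up for illustration; the real analyzer reads the full snowball stop word files named in the `words` argument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordSketch {

    // Hypothetical stop list for illustration only; the real analyzer loads
    // english_stop.txt and german_stop.txt from the Lucene jar instead.
    static final Set<String> STOP_WORDS = Set.of("the", "and", "of", "der", "und");

    // Roughly mimics: whitespace tokenizer -> stop filter (ignoreCase=true) -> lowercase filter.
    // The synonym filter is intentionally left out to keep the sketch short.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token, skip it
            }
            String lower = token.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(lower)) {
                continue; // stop words are matched case-insensitively and dropped
            }
            tokens.add(lower);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Mixed English/German input, as in the two stop word files above.
        System.out.println(analyze("The role of ACE2 und der Rezeptor"));
    }
}
```

So if you want a different set of stop words in the real plugin, the place to change is most likely the `words` argument on that line: point it at your own stop word file (or files, comma-separated) packaged with the plugin.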