【Question title】: Make Elasticsearch diacritics insensitive
【Posted】: 2019-09-25 10:41:10
【Question】:

I am using Elasticsearch 6.6.0 with NEST in a .NET MVC project.

I index some products with this code:

var esSettings = new ConnectionSettings(node);
esSettings = esSettings.DefaultIndex(IndexInstanceName);
esSettings = esSettings
    .DefaultMappingFor<SearchableProduct>(s => s.IdProperty("Id").IndexName(IndexInstanceName + "-products-" + ConfigurationManager.AppSettings["DefaultCulture"]));

var elastic = new ElasticClient(esSettings);
var mapResponse = elastic.Map<SearchableProduct>(x => x.AutoMap().Index(IndexInstanceName + "-products-" + culture));

var indexState = new IndexState
{
    Settings = new IndexSettings()
};

indexState.Settings.Analysis = new Analysis
{
    Analyzers = new Analyzers()
};

indexState.Settings.Analysis.Analyzers.Add("nospecialchars", new CustomAnalyzer
{
    Tokenizer = "standard",
    Filter = new List<string> { "standard", "lowercase", "stop", "asciifolding" }
});

//products
if (!elastic.IndexExists(IndexInstanceName + "-products-" + culture).Exists)
{
    var response = elastic.CreateIndex(
        IndexInstanceName + "-products-" + culture,
        s => s.InitializeUsing(indexState)
               .Mappings(m => m.Map<SearchableProduct>(sc => sc.AutoMap())));
}

await this.IndexProductsAsync(context, products, elastic, culture);
await elastic.RefreshAsync(new RefreshRequest(IndexInstanceName + "-products-" + culture));

For searching, I use the following code:

ISearchResponse<SearchableProduct> result = await elastic.SearchAsync<SearchableProduct>(s => s
                           .Index(elasticIndexName + "-products-" + culture)
                           .Take(DefaultPageSize)
                           .Source(src => src.IncludeAll())
                            .Query(query =>
                               query.QueryString(qs =>
                                qs.Query(q).DefaultOperator(Operator.And).Fuzziness(Fuzziness.EditDistance(0)).Fields(x => x.Field(d => d.Name, 2)
                                                    .Field(d => d.MetaTitle, 1)
                                                    .Field(d => d.Image, 1)
                                                    .Field(d => d.SystemId, 2)
                                                    .Field(d => d.Manufacturer, 1)
                                        )
                            ))
                           .Sort(d => d.Ascending(SortSpecialField.Score))
                        );

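When I search in Greek for a word with an accent (e.g. παγωτό), I get results, because the products in my index were indexed with their accents. When I search for the same word without the accent (e.g. παγωτο), I get no results.

Is something wrong with the index settings or the search code?

Can I index my data without accents, or index it as-is but make searching or indexing accent-insensitive?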

【Question comments】:

    Tags: c# asp.net-mvc elasticsearch nest


    【Solution 1】:

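    Creating the field with the greek analyzer ensures that the indexed text and the query string go through the same analysis path. For παγωτό, this means the text is reduced to the token παγωτ at index time, and the query text is reduced the same way at search time.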
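One way to check this claim is to run both spellings through the analyzer with the Analyze API and compare the tokens. The following is a minimal sketch, assuming NEST 7.x and a local node at http://localhost:9200; the class name AnalyzeSketch is hypothetical and not part of the original answer.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Nest;

class AnalyzeSketch
{
    static async Task Main()
    {
        // Assumption: a local single-node cluster, reachable without authentication.
        var client = new ElasticClient(
            new ConnectionSettings(new Uri("http://localhost:9200")));

        // Run the accented and unaccented spellings through the built-in greek analyzer.
        var response = await client.Indices.AnalyzeAsync(a => a
            .Analyzer("greek")
            .Text("παγωτό", "παγωτο"));

        Console.OutputEncoding = Encoding.UTF8;

        if (!response.IsValid)
        {
            // Print the request/response details instead of failing on a null token list.
            Console.WriteLine(response.DebugInformation);
            return;
        }

        // If both inputs print the same token, a match query on a field
        // that uses this analyzer will treat them as equal.
        foreach (var token in response.Tokens)
        {
            Console.WriteLine($"{token.Token} (position {token.Position})");
        }
    }
}
```

With NEST 6.x the same request should be available directly on the client (client.AnalyzeAsync) rather than under client.Indices.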

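    Here is my example. It creates a field with the greek analyzer, and searching for either παγωτό or παγωτο returns both documents: the one containing παγωτό and the one containing παγωτο.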

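    // Note: written against the NEST 7.x API (client.Indices.*). In NEST 6.x the
    // equivalent calls live directly on the client (e.g. client.CreateIndexAsync).
    using System;
    using System.Text;
    using System.Threading.Tasks;
    using Elasticsearch.Net;
    using Nest;

    class Program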
    {
        static async Task Main(string[] args)
        {
            var connectionPool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
            var settings = new ConnectionSettings(connectionPool)
                .DefaultIndex("index_name")
                .DisableDirectStreaming()
                .PrettyJson();
            var client = new ElasticClient(settings);
    
            await client.Indices.DeleteAsync("index_name");
    
            var createIndexResponse = await client.Indices.CreateAsync("index_name",
                c => c
                    .Map(map => map.AutoMap<Document>()));
    
            await client.IndexManyAsync(new []
                {new Document {Id = 1, Text = "παγωτό"}, new Document {Id = 2, Text = "παγωτο"},});
    
            await client.Indices.RefreshAsync();
    
            var query = "παγωτό";
            var searchResponse = await client.SearchAsync<Document>(s => s
                .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));
    
            Console.OutputEncoding = Encoding.UTF8;
    
            Print(query, searchResponse);
    
            query = "παγωτο";
            var searchResponse2 = await client.SearchAsync<Document>(s => s
                .Query(q => q.Match(m => m.Field(f => f.Text).Query(query))));
    
            Print(query, searchResponse2);
        }
    
        private static void Print(string query, ISearchResponse<Document> searchResponse)
        {
            Console.WriteLine($"For {query} found:");
            foreach (var document in searchResponse.Documents)
            {
                Console.WriteLine($"Document {document.Id} {document.Text}");
            }
        }
    }
    
    public class Document
    {
        public int Id { get; set; }
        [Text(Analyzer = "greek")]
        public string Text { get; set; }
    }
    

    Prints:

    For παγωτό found:
    Document 1 παγωτό
    Document 2 παγωτο
    For παγωτο found:
    Document 1 παγωτό
    Document 2 παγωτο
    

    Hope that helps.
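The question also asks whether the data could be indexed without accents. That is a different technique from the analyzer-based answer above: the accents are stripped on the client before indexing. A minimal sketch using Unicode normalization follows; the AccentFolding class and RemoveDiacritics method are hypothetical names, and the same function would have to be applied to every query string as well, otherwise accented queries would stop matching.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class AccentFolding
{
    // Decompose each character into base letter + combining marks (NFD),
    // drop the combining marks, then recompose what is left (NFC).
    public static string RemoveDiacritics(string input)
    {
        var decomposed = input.Normalize(NormalizationForm.FormD);
        var kept = decomposed.Where(c =>
            CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(kept.ToArray()).Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // Fold the accented spelling from the question before it would be indexed.
        Console.WriteLine(RemoveDiacritics("παγωτό"));
    }
}
```

Unlike the greek analyzer, this only removes combining marks. It does no lowercasing or stemming, so the analyzer-based approach is usually the less invasive choice when the index mapping can be changed.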

    【Comments】:
