Nest/ElasticSearch Sorting by _uid: Answers

【Question Title】: Nest/ElasticSearch Sorting by _uid
【Posted】: 2015-05-03 22:06:42
【Question】:

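I'm trying to pull back records based on a query and sort them by the _uid field. In my case, _uid is the type, followed by #, followed by the id I set. My index is full of code files, so an example _uid is myType#MyDocuments/File.txt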

So I'm sorting on _uid ascending. For the most part it sorts the types in order, but within a type it only sorts correctly by the top-level directory.

So I see something like this:

Accounting/AP_ABC.asp
Accounting/AR_ABC.asp
Accounting/Account.asp

This isn't right, because Account should come before AP and AR.

Is there a way to make it sort correctly?

Edit: added the mapping from my index

"dotnet":{"properties":{"fileContents":{"type":"string"},"filePath":{"type":"string"},"lastUpdate":{"type":"date","format":"dateOptionalTime"},"type":{"type":"string"}}}

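【Discussion】:

Is the '_uid' field analyzed or not_analyzed?
_uid is generated by Elasticsearch, not set by me. Can I change it to not_analyzed?
Open localhost:9200/indexname/_mapping in your browser and show what you have.
Edited the main post.
I've edited your title. Please see "Should questions include “tags” in their titles?", where the consensus is "no, they should not".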

Tags: c# sorting elasticsearch nest

【Solution 1】:

Create a new not_analyzed field, for example sortid, that holds the unanalyzed value of your id (Accounting/Account.asp). This article explains in detail why you would want to do that.

Update:

Try applying case-insensitive sorting. I'll update my answer with a working example later.
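Why a not_analyzed field on its own is not enough: an exact term is sorted by its raw character codes, and uppercase ASCII letters (65 to 90) sort before lowercase ones (97 to 122). The following is a minimal C# sketch outside Elasticsearch, assuming that `StringComparer.Ordinal` is a fair stand-in for Lucene's byte-wise term order (for plain ASCII ids like these the two agree); `OrdinalSortDemo` and `SortOrdinal` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;

static class OrdinalSortDemo
{
    // Sorts by raw character codes with no case folding. For ASCII ids this
    // approximates how an exact (not_analyzed) term is ordered.
    public static string[] SortOrdinal(string[] ids) =>
        ids.OrderBy(id => id, StringComparer.Ordinal).ToArray();

    static void Main()
    {
        string[] ids = { "Accounting/Account.asp", "Accounting/AR_ABC.asp", "Accounting/AP_ABC.asp" };

        // Prints AP_ABC, AR_ABC, then Account: the order reported in the question.
        Console.WriteLine(string.Join(Environment.NewLine, SortOrdinal(ids)));

        // All three ids first differ at index 12: 'P' and 'R' versus 'c'.
        Console.WriteLine($"'P'={(int)'P'} 'R'={(int)'R'} 'c'={(int)'c'}");
    }
}
```

This is also consistent with the comments at the end of the thread, where a not_analyzed copy of the id still sorts AP_ABC and AR_ABC ahead of Account.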

Update 2

The simplest way to achieve what you're trying to do is to create the index with the following mapping:

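// Map Id as a not_analyzed string: it is indexed as one exact term, so sorting compares the whole value
client.CreateIndex(descriptor => descriptor
    .Index(indexName)
    .AddMapping<Document>(m => m
        .Properties(p => p
            .String(s => s.Name(n => n.Id).Index(FieldIndexOption.NotAnalyzed)))));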

class Document
{
    public string Id { get; set; }
}

Index a few documents with lowercased id values:

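// Lowercase each id before indexing so the stored term sorts case-insensitively
client.Index(new Document {Id = "Accounting/AP_ABC.asp".ToLowerInvariant()});
client.Index(new Document {Id = "Accounting/AR_ABC.asp".ToLowerInvariant()});
client.Index(new Document {Id = "Accounting/Account.asp".ToLowerInvariant()});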

Then, for this sort:

var searchResponse = client.Search<Document>(s => s
    .Sort(sort => sort
        .OnField(f => f.Id).Ascending()));

we get:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": null,
      "hits": [
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "accounting/account.asp",
            "_score": null,
            "_source": {
               "id": "accounting/account.asp"
            },
            "sort": [
               "accounting/account.asp"
            ]
         },
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "accounting/ap_abc.asp",
            "_score": null,
            "_source": {
               "id": "accounting/ap_abc.asp"
            },
            "sort": [
               "accounting/ap_abc.asp"
            ]
         },
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "accounting/ar_abc.asp",
            "_score": null,
            "_source": {
               "id": "accounting/ar_abc.asp"
            },
            "sort": [
               "accounting/ar_abc.asp"
            ]
         }
      ]
   }
}
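The order in this response can be reproduced without a cluster. A minimal sketch, assuming each id is lowercased once before indexing and then compared ordinally the way a not_analyzed term is; `LowercaseIdDemo` and `SortLowercased` are hypothetical names. `ToLowerInvariant()` is used so the result does not depend on the current culture.

```csharp
using System;
using System.Linq;

static class LowercaseIdDemo
{
    // Normalise each id to lowercase (as done before indexing), then sort
    // ordinally, approximating the order of the stored not_analyzed terms.
    public static string[] SortLowercased(string[] ids) =>
        ids.Select(id => id.ToLowerInvariant())
           .OrderBy(id => id, StringComparer.Ordinal)
           .ToArray();

    static void Main()
    {
        string[] ids = { "Accounting/AP_ABC.asp", "Accounting/AR_ABC.asp", "Accounting/Account.asp" };
        foreach (var id in SortLowercased(ids))
            Console.WriteLine(id);
    }
}
```

Once lowercased, index 12 holds 'c', 'p' and 'r', so account.asp comes first. The trade-off is that the stored id itself is now lowercase, as the `_id` values in the response show.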

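However, if you want to keep the ids exactly as you supplied them (e.g. Accounting/AP_ABC.asp), you can use the case-insensitive sorting mentioned earlier.

To apply this solution with NEST:

Create the mapping as follows: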

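client.CreateIndex(descriptor => descriptor
    .Index(indexName)
    .Analysis(analysisDescriptor => analysisDescriptor
        .Analyzers(a => a
            // keyword tokenizer: the whole id becomes a single token;
            // lowercase filter: that token is lowercased before it is indexed
            .Add("case_insensitive_sort", new CustomAnalyzer
            {
                Tokenizer = "keyword",
                Filter = new List<string> {"lowercase"}
            })))
    .AddMapping<Document>(m => m
        .Properties(p => p
            // Id is indexed (and therefore sorted) through the analyzer above;
            // _source still keeps the original casing
            .String(s => s
                .Name(n => n.Id)
                .Analyzer("case_insensitive_sort")))));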
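Conceptually, the `case_insensitive_sort` analyzer derives a separate sort term from each id: the `keyword` tokenizer keeps the whole string as one token and the `lowercase` filter lowercases it, while `_source` keeps the original value. The sketch below imitates that in plain C# under that assumption; `CaseInsensitiveSortKeyDemo` and `SortKey` are hypothetical helpers, not part of NEST or Elasticsearch.

```csharp
using System;
using System.Linq;

static class CaseInsensitiveSortKeyDemo
{
    // Hypothetical stand-in for the analyzer: one token for the whole id
    // (keyword tokenizer), lowercased (lowercase filter).
    public static string SortKey(string id) => id.ToLowerInvariant();

    // Order by the derived key, but return the ids with their original casing,
    // much like the hits' _source in the search response.
    public static string[] SortPreservingCase(string[] ids) =>
        ids.OrderBy(id => SortKey(id), StringComparer.Ordinal).ToArray();

    static void Main()
    {
        string[] ids = { "Accounting/AP_ABC.asp", "Accounting/AR_ABC.asp", "Accounting/Account.asp" };
        foreach (var id in SortPreservingCase(ids))
            Console.WriteLine($"{id}  (sort key: {SortKey(id)})");
    }
}
```

The ids keep their original casing while the order (Account, AP_ABC, AR_ABC) follows the lowercased key, which mirrors the `_source` and `sort` values in the response that follows.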

Index the documents:

client.Index(new Document {Id = "Accounting/AP_ABC.asp"});
client.Index(new Document {Id = "Accounting/AR_ABC.asp"});
client.Index(new Document {Id = "Accounting/Account.asp"});

Running the same ascending sort on Id, we get the following result:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": null,
      "hits": [
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "Accounting/Account.asp",
            "_score": null,
            "_source": {
               "id": "Accounting/Account.asp"
            },
            "sort": [
               "accounting/account.asp"
            ]
         },
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "Accounting/AP_ABC.asp",
            "_score": null,
            "_source": {
               "id": "Accounting/AP_ABC.asp"
            },
            "sort": [
               "accounting/ap_abc.asp"
            ]
         },
         {
            "_index": "indexname",
            "_type": "document",
            "_id": "Accounting/AR_ABC.asp",
            "_score": null,
            "_source": {
               "id": "Accounting/AR_ABC.asp"
            },
            "sort": [
               "accounting/ar_abc.asp"
            ]
         }
      ]
   }
}

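Hope this helps.

【Discussion】:

I created a not_analyzed field called "sortByField" of type "string", and it still sorts the same way I described in the original post.
Did you reindex your data?
Yes, I deleted everything and created a new index and mapping with that field, but it still sorts exactly the same way... My mapping now has this: "sortByField": { "type": "string", "index": "not_analyzed" }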