Elastic NEST Client: updating data

【Question title】: Elastic NEST Client: updating data
【Posted】: 2020-04-07 08:35:11
【Question description】:

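I need to update existing data. Is there a better way than: retrieve old data -> modify old data -> delete old index -> create new index -> bulk insert new data? This seems a bit silly. On top of that, I end up with a store.size per index that is roughly 2 times higher, and I don't know why.

Bulk inserting the modified data directly does not work: docs.count doubles.

Any ideas?

UPDATE

Here is my bulk insert: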

var dataPointsBulkIndexOperationsPerBatchId = data.Select(
    item => new BulkIndexOperation<T>(item)
    {
        Index = indexName
    });

var allBulksRequest = new BulkRequest
{
    Operations = new BulkOperationsCollection<IBulkOperation>(dataPointsBulkIndexOperationsPerBatchId),
    Refresh = Refresh.True
};

if (allBulksRequest.Operations.Any())
{
    var bulkResponse = elasticClient.Bulk(allBulksRequest);
    bulkResponse.AssertResponseIsValidAndSuccessful();
}

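【Discussion】:

It would be great if you could share your progress so far as a minimal reproducible example.
Sorry, I don't really get the point. This is more of a conceptual question. Or is my observation somewhat "suspicious"?
You can bulk update the index directly. Behind the scenes it deletes the document and inserts a new one, because any change results in the document being reindexed.
Yes, but how? I am using the bulk insert shown in the UPDATE section.
Take a look at this link: discuss.elastic.co/t/…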

Tags: c# elasticsearch nest

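【Solution 1】:

To update multiple documents in a single request, you basically have two options:

1. Bulk API with update operations

Use the bulk API and send a batch of update operations. Each update operation offers the same options as the update API, so you can perform partial updates, scripted updates, and so on.

Scripted update example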

var client = new ElasticClient();

var updates = new[] {
    new { Id = 1, Counter = 3 },
    new { Id = 2, Counter = 6 },
    new { Id = 3, Counter = 5 },
    new { Id = 4, Counter = 4 },
};

var bulkResponse = client.Bulk(b => b
    .Index("my_index")
    .UpdateMany(updates, (descriptor, update) => descriptor
        .Id(update.Id)
        .Script(s => s
            .Source("ctx._source.counter += params.counter")
            .Params(p => p
                .Add("counter", update.Counter)
            )
        )
    )
);

This sends the following request

POST http://localhost:9200/my_index/_bulk
{"update":{"_id":1}}
{"script":{"source":"ctx._source.counter += params.counter","params":{"counter":3}}}
{"update":{"_id":2}}
{"script":{"source":"ctx._source.counter += params.counter","params":{"counter":6}}}
{"update":{"_id":3}}
{"script":{"source":"ctx._source.counter += params.counter","params":{"counter":5}}}
{"update":{"_id":4}}
{"script":{"source":"ctx._source.counter += params.counter","params":{"counter":4}}}

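With a scripted update, you can access the _source document inside the script through ctx._source, so this example increments the counter field of the source document by the value of the counter parameter in the update operation. The default scripting language is called Painless, and scripts can be as complex as you need. It is recommended to parameterize inline scripts as shown above, so that Elasticsearch can cache and reuse the compilation unit produced by compiling the script.

With a bulk update, you need to know the IDs of the documents you want to update in order to form the bulk update operations.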
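The partial updates mentioned above can be sketched in the same style. This is a minimal sketch, assuming NEST 7.x and a cluster on the default http://localhost:9200; the classes CounterDoc and CounterPartial are hypothetical names introduced only for illustration, and the index name my_index is reused from the example above. Unlike the scripted version, a partial document overwrites counter with the supplied value rather than incrementing it.

```csharp
using System;
using Nest;

// Hypothetical document shape; "counter" matches the field used in the scripted example.
public class CounterDoc
{
    public int Id { get; set; }
    public int Counter { get; set; }
}

// Hypothetical partial document: only the fields that should be overwritten.
public class CounterPartial
{
    public int Counter { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: default connection settings (http://localhost:9200).
        var client = new ElasticClient();

        var updates = new[]
        {
            new CounterDoc { Id = 1, Counter = 3 },
            new CounterDoc { Id = 2, Counter = 6 },
        };

        // One update operation per item; each sends a partial "doc"
        // that is merged into the existing _source.
        var bulkResponse = client.Bulk(b => b
            .Index("my_index")
            .UpdateMany<CounterDoc, CounterPartial>(updates, (descriptor, update) => descriptor
                .Id(update.Id)
                .Doc(new CounterPartial { Counter = update.Counter })
            )
        );

        // A bulk request can succeed as a whole while individual items fail,
        // so inspect the per-item results.
        if (bulkResponse.Errors)
        {
            foreach (var item in bulkResponse.ItemsWithErrors)
            {
                Console.WriteLine($"Update of document {item.Id} failed: {item.Error?.Reason}");
            }
        }
    }
}
```

An update operation against an ID that does not exist in the index fails for that item, which is why the per-item check is worth keeping. If missing documents should be created instead, DocAsUpsert(true) on the update descriptor is one option to look at.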

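2. Update by query API

The update by query API lets you perform a scripted update on the set of documents that match a query.

A scripted update executes the same script against every matching document. When performing scripted updates, a key difference between update by query and bulk update is that update by query cannot be parameterized with different parameter values for each document; every update executes the same script with the same parameters.

Update by query example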

var updateByQueryResponse = client.UpdateByQuery<object>(b => b
    .Index("my_index")
    .Query(q => q
        .Ids(ids => ids
            .Values(1,2,3,4)
        )
    )
    .Script(s => s
        .Source("ctx._source.counter += params.counter")
        .Params(p => p
            .Add("counter", 1)
        )
    )
);

This sends the following request

POST http://localhost:9200/my_index/_update_by_query?pretty=true 
{
  "query": {
    "ids": {
      "values": [1, 2, 3, 4]
    }
  },
  "script": {
    "source": "ctx._source.counter += params.counter",
    "params": {
      "counter": 1
    }
  }
}

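As with scripted bulk updates, you can access the _source document inside the script through ctx._source.

With update by query, you do not need to know the IDs of the documents to update; the documents are located by matching the supplied query, which can be a match_all query to update all documents.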
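The match_all case mentioned above might look like the following. This is a minimal sketch, assuming NEST 7.x, a cluster on the default http://localhost:9200, and that every document in my_index already has a numeric counter field (the script would fail on documents without it). The choice of Conflicts.Proceed is an illustrative assumption, not part of the original answer.

```csharp
using System;
using Elasticsearch.Net; // Conflicts enum
using Nest;

public static class Program
{
    public static void Main()
    {
        // Assumption: default connection settings (http://localhost:9200).
        var client = new ElasticClient();

        var response = client.UpdateByQuery<object>(u => u
            .Index("my_index")
            // Match every document in the index; no IDs are needed.
            .Query(q => q.MatchAll())
            // Assumption: keep going when a document changed concurrently,
            // instead of aborting the whole operation.
            .Conflicts(Conflicts.Proceed)
            .Script(s => s
                .Source("ctx._source.counter += params.counter")
                .Params(p => p.Add("counter", 1))
            )
        );

        // The response reports how many documents were touched.
        Console.WriteLine($"updated: {response.Updated}, version conflicts: {response.VersionConflicts}");
    }
}
```

Because the same script and parameters are applied to every match, this form suits uniform changes; per-document values still call for the bulk API with update operations.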

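【Discussion】:

Wow, thanks for the detailed answer. So in the end it is only possible via scripts. Okay.
.. just to be sure .. I assume changing the ID during this operation is forbidden?
Hm, after reindexing V6.x data in V7 and executing this update operation: operation[0]: update returned 404 _index: datapoints-batches-0efec950-3e13-4971-be78-bff50a06f0b8 _type: _doc _id: e6cd17b0-a311-4133-80dc-8e84826cb392 _version: 0 error: Type: document_missing_exception Reason: "[_doc][e6cd17b0-a311-4133-80dc-8e84826cb392]: document missing". I assume mapping types have been removed?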