Upsert function for Elasticsearch? Answers

【Question title】: Upsert function for Elasticsearch?
【Posted】: 2018-06-06 00:29:24
【Question】:

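I want to update the data in Elasticsearch periodically.

The file I send for each update may contain data for documents that already exist in Elasticsearch (update) as well as data for new documents (insert).

Because the documents in Elasticsearch are managed by auto-created IDs, I have to search for the ID by the "code" column (which is unique) to find out whether the document already exists, then update it if it does and insert it otherwise.

I wonder whether there is a faster way than the code below, which is what I came up with.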

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Get the doc ID by searching (exact match) on the code, to check whether it exists
res = es.search(index=index_name, doc_type=doc_type, body=body_for_search)
hits = res['hits']['hits']
doc_id = hits[0]['_id'] if hits else None

# If the ID exists, update the current doc by ID
# (note: the Update API expects a body of the form {"doc": {...}});
# otherwise insert with an auto-created ID
if doc_id:
    es.update(index=index_name, id=doc_id, doc_type=doc_type, body=body)
else:
    es.index(index=index_name, doc_type=doc_type, body=body)

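For example, is there a way to have Elasticsearch search for the exact match on col["code"] for you, so that you can simply "upsert" the data without specifying the ID? Any advice would be much appreciated, and thank you for reading.

P.S. It would probably be simpler and faster if we set id = col["code"], but for management reasons we cannot do that at the moment.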

【Comments on the question】:

elastic.co/guide/en/elasticsearch/reference/2.0/…
You can tell Elasticsearch to use your own ID. That way you can simply index the document with a known ID, and the existing document will be updated.
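What the comment describes might look like the sketch below. It assumes the unique "code" value is allowed to serve as the document ID (the question says this is not possible for now) and a 6.x-style elasticsearch-py client that still accepts `doc_type`, as in the question's code. The helper name `index_with_own_id` is hypothetical.

```python
def index_with_own_id(es, index_name, doc_type, doc):
    """Index a document under its own ID (hypothetical helper).

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # With an explicit id=, the Index API creates the document if the ID is
    # new, and replaces the whole stored document if the ID already exists.
    return es.index(index=index_name, doc_type=doc_type, id=doc["code"], body=doc)
```

Note that this replaces the stored document in full, so fields missing from `doc` are dropped rather than merged.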

Tags: python elasticsearch

【Solution 1】:

As @Archit said, use your own IDs so that documents can be looked up faster.
Use the upsert API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts
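A minimal sketch of such an upsert call, under two assumptions: the unique "code" value is used as the document ID, and the client is a 6.x-style elasticsearch-py client that accepts `doc_type`, as in the question. The helper names `build_upsert_body` and `upsert_by_code` are hypothetical.

```python
def build_upsert_body(doc):
    """Build an Update API body that updates the doc or inserts it if missing."""
    # "doc_as_upsert": True tells Elasticsearch to use "doc" itself as the
    # new document when no document with the given ID exists yet.
    return {"doc": doc, "doc_as_upsert": True}


def upsert_by_code(es, index_name, doc_type, doc):
    """Upsert one document, using its unique "code" value as the ID.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    return es.update(
        index=index_name,
        doc_type=doc_type,
        id=doc["code"],
        body=build_upsert_body(doc),
    )
```

With this, the search step from the question is no longer needed: a single request per document either merges the fields into the existing document or creates it.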

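Make sure the structure of your IDs follows Lucene good practice:

If you are using your own IDs, try to pick an ID that is friendly to Lucene. Examples include zero-padded sequential IDs, UUID-1 and nanotime; these IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random, which offers poor compression and slows down Lucene.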
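The ID styles named above can be generated with Python's standard library alone. This is only an illustration; the function names and the padding width of 10 are arbitrary choices, not something from the answer.

```python
import uuid


def zero_padded_id(n, width=10):
    """Sequential ID padded with leading zeros to a fixed width."""
    return str(n).zfill(width)


def uuid1_id():
    """UUID version 1, built from a timestamp and a node identifier."""
    return str(uuid.uuid1())


def uuid4_id():
    """UUID version 4, essentially random; shown only for contrast."""
    return str(uuid.uuid4())
```

For example, `zero_padded_id(42)` gives `"0000000042"`.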

【Comments】: