Elasticsearch Bulk Indexing for handling new data incrementally: answers

【Question title】: Elasticsearch Bulk Indexing for handling new data incrementally
【Posted】: 2020-10-20 13:58:00
【Question description】:

I have implemented bulk indexing and would like to make it more efficient.

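# current implementation in Python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# INDEX_NAME, ANALYZER and all_products() are not shown in the question.
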
def products_to_index():
    for product in all_products():
        yield {
            "_op_type": "index",
            "_index": INDEX_NAME,
            "_id": product.id,
            "_source": {"name": product.name, "content": product.content},
        }


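def main(args):
    # Connect to localhost:9200 by default.
    es = Elasticsearch()
    # Index creation body (presumably the analyzer settings).
    body = ANALYZER

    # Create the index, then send every product through the bulk helper.
    es.indices.create(index=INDEX_NAME, body=body)

    bulk(es, products_to_index())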

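This implementation seems to simply fetch all the data and index it batch by batch. I would like to add an extra step that checks whether an entry has already been indexed.

I have also considered loading from the path of a locally saved index, but I am not sure how to proceed.

I looked through the API documentation but could not find anything.

【Question discussion】:

Tags: python elasticsearch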

【Solution 1】:

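By using index, you are telling Elasticsearch "index this document, and overwrite it if it already exists". If you instead use the create operation with a specific id, Elasticsearch works in a "put-if-absent" manner: the document is written only when no document with that id exists. When you use the bulk API, the response reports the result for each document separately, so you can tell which documents were inserted and which were not. To get this behaviour, just set your _op_type to create.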
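
As a rough illustration of the answer above, here is a minimal sketch rather than a definitive implementation. It assumes the `elasticsearch` Python client's `helpers.bulk` with `raise_on_error=False`, which returns the number of successful actions plus a list of per-document error items. The names `create_actions`, `split_errors` and `index_new_products` are hypothetical, and the product objects are assumed to expose `id`, `name` and `content` as in the question.

```python
def create_actions(products, index_name):
    """Yield bulk actions that insert a document only if its _id is absent."""
    for product in products:
        yield {
            "_op_type": "create",  # rejected with HTTP 409 if the _id already exists
            "_index": index_name,
            "_id": product.id,
            "_source": {"name": product.name, "content": product.content},
        }


def split_errors(errors):
    """Separate 'already indexed' conflicts (status 409) from other failures."""
    skipped_ids, failures = [], []
    for item in errors:
        # Each error item is keyed by the operation type, here "create".
        result = item.get("create", {})
        if result.get("status") == 409:
            skipped_ids.append(result.get("_id"))
        else:
            failures.append(item)
    return skipped_ids, failures


def index_new_products(es, products, index_name):
    """Hypothetical wrapper: bulk-create documents and report what was skipped."""
    # Imported here so the two helpers above work without the client installed.
    from elasticsearch.helpers import bulk

    # raise_on_error=False collects per-document errors instead of raising.
    created, errors = bulk(
        es, create_actions(products, index_name), raise_on_error=False
    )
    skipped_ids, failures = split_errors(errors)
    return created, skipped_ids, failures
```

Calling `index_new_products(es, all_products(), INDEX_NAME)` would then return the count of newly created documents, the ids that were skipped because they already existed, and any other errors. Note that every document is still sent to the cluster; the create operation only prevents existing entries from being overwritten.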

【Discussion】: