Spring & Elasticsearch: Update multiple documents on the basis of a particular field and without ID

【Question title】: Spring & Elasticsearch: Update multiple documents on the basis of a particular field and without ID
【Posted】: 2020-01-08 16:57:14
【Question】:

I am using:

Elasticsearch: 6.4.3
Spring Boot: 2.1.9.RELEASE
Spring Elasticsearch: 6.4.3

I have an index in ES:

{
  "mapping": {
    "logi_info_index": {
      "properties": {
        "area": {
          "type": "text"
        },
        "createdBy": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "createdDate": {
          "type": "long"
        },
        "logiCode": {
          "type": "integer"
        },
        "esId": {
          "type": "keyword" -> @Id for ES
        },
        "geoPoint": {
          "type": "geo_point"
        },
        "isActive": {
          "type": "text"
        },
        "latitude": {
          "type": "text"
        },
        "longitude": {
          "type": "text"
        },
        "storeAddress": {
          "type": "text"
        },
        "storeName": {
          "type": "text"
        },
        "updatedBy": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "updatedDate": {
          "type": "long"
        }
      }
    }
  }
}

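This index may currently hold around 50K documents.

For some business logic, I need to update all documents that satisfy a particular condition: isActive=0.

Example:

We have documents with isActive as 0 or 1.

Delete all documents with isActive = 1 [=> this can be done with DeleteQuery (deleteAll)].
Since we are now left with only isActive = 0, we want to update the remaining documents to isActive = 1.

I have the following questions:

How can I update all documents that have a particular field value, without using the Id (just as I did for the delete)?
Is that possible at all?
If it is, I would like to use Spring's capabilities to do it.

【Comments】:

Tags: spring elasticsearch spring-data-elasticsearch

【Solution 1】:

This is not possible with Spring Data Elasticsearch (I assume you are using it, since the question is tagged with it).

Even in "plain" Elasticsearch this is not easy; the only possibility is to use the Update By Query API together with a script (I just adapted the documentation example and have not tried it):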

POST logi_info_index/_update_by_query
{
  "script": {
    "source": "ctx._source.isActive=1",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}
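
The request above uses match_all, so it touches every document in the index. If only documents with a given isActive value should be changed, the query can be narrowed with a match on that field. Below is a minimal sketch of the same call from Java, using only the JDK 11 HTTP client. The class name, the http://localhost:9200 address and the "0"/"1" values are assumptions for illustration and are not taken from the question's code base.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: builds and sends an _update_by_query request.
class UpdateByQuerySketch {

    // Builds the JSON body: match documents whose isActive equals oldValue
    // and set the field to newValue through a script parameter.
    // Assumes both values are plain strings that need no JSON escaping.
    public static String buildBody(String oldValue, String newValue) {
        return "{"
            + "\"script\":{"
            + "\"lang\":\"painless\","
            + "\"source\":\"ctx._source.isActive = params.newValue\","
            + "\"params\":{\"newValue\":\"" + newValue + "\"}"
            + "},"
            + "\"query\":{\"match\":{\"isActive\":\"" + oldValue + "\"}}"
            + "}";
    }

    // Wraps the body in a POST to <baseUrl>/<index>/_update_by_query.
    public static HttpRequest buildRequest(String baseUrl, String index, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/" + index + "/_update_by_query"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Sends the request and returns the raw JSON response of the cluster.
    public static String send(HttpRequest request) throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Dry run: only prints what would be sent. Call send(request)
        // once a cluster is reachable at the assumed address.
        String body = buildBody("0", "1");
        HttpRequest request = buildRequest("http://localhost:9200", "logi_info_index", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

Passing the new value through params instead of concatenating it into the script source keeps the script text constant between calls.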

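【Comments】:

I did it using UpdateByQuery and the Java client!
Are there any major performance drawbacks to doing this kind of bulk update in Elasticsearch?
I have never had a use case for this kind of update. But since it is executed directly on the cluster, it will surely be faster than first retrieving the ids and then issuing update requests for them. 50K or so records is not that many. I suggest setting up a sample installation/index with dummy data and trying it out.
Sure. Thanks! The 50K may grow exponentially. Anyway, as you rightly pointed out, I will try it with dummy data. I will also update this answer with my Java code in the next few days.
@AdiV How did the update of the 50K dummy records go? Did you notice any performance impact?

【Solution 2】:

I did it using the ES Java client and UpdateByQuery: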

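// Imports assumed for Elasticsearch 6.4.x and Spring Data Elasticsearch.
import java.util.Collections;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.UpdateByQueryAction;
import org.elasticsearch.index.reindex.UpdateByQueryRequestBuilder;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;
import org.springframework.data.elasticsearch.annotations.Document;

// Runs an Update By Query against the index declared on LogiEntity.
public void updateAll() {
    assert elasticsearchOperations != null;
    UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE
        .newRequestBuilder(elasticsearchOperations.getClient());
    // Resolve the index name from the @Document annotation, select the
    // temporarily active documents, and attach the update script.
    updateByQuery.source(((Document) CommonUtility
        .getDoc(LogiEntity.class, Document.class)).indexName())
        .filter(query("isActive", AppConstants.TEMPORARY_ACTIVE))
        .script(script());
    BulkByScrollResponse response = updateByQuery.get();
    log.debug("process update: {}. Total updated records: {}",
        response.getStatus(), response.getUpdated());
}

// Painless script that switches isActive from TEMPORARY_ACTIVE to ACTIVE.
private Script script() {
    String updateCode =
        "if (ctx._source.isActive == '" + AppConstants.TEMPORARY_ACTIVE + "') "
            + "{"
            + "ctx._source.isActive = '" + AppConstants.ACTIVE + "';"
            + "}";
    return new Script(ScriptType.INLINE, "painless", updateCode,
        Collections.emptyMap());
}

// Match query used to select the documents to update.
private QueryBuilder query(String fieldName, String value) {
    return QueryBuilders.matchQuery(fieldName, value);
}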

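I tested it against 1.5M records in Elasticsearch; updating 1.2M of them took close to 1.5 minutes.
Since this is a batch application, the above is acceptable for me for now.
Still, I am sure it can be improved further using bulk updates and bulk update requests.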
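
For comparison, a bulk update needs the document ids up front (for example collected with a scroll search) and one update action per document. The sketch below only builds the newline-delimited body for a POST to <index>/<type>/_bulk, where each action line is followed by a partial document. The class name, the ids and the "1" value are hypothetical, and whether this is actually faster than Update By Query would have to be measured.

```java
import java.util.List;

// Hypothetical helper: builds the NDJSON body of a _bulk request that
// sets isActive on each given document id via a partial-document update.
class BulkUpdateBodySketch {

    // Assumes ids and newValue are plain strings that need no JSON escaping.
    // The index and type are expected in the request URL, so each action
    // line carries only the document id.
    public static String buildBulkBody(List<String> ids, String newValue) {
        StringBuilder body = new StringBuilder();
        for (String id : ids) {
            // Action line: which document to update.
            body.append("{\"update\":{\"_id\":\"").append(id).append("\"}}\n");
            // Source line: the fields to merge into the existing document.
            body.append("{\"doc\":{\"isActive\":\"").append(newValue).append("\"}}\n");
        }
        // The bulk API requires the body to end with a newline, which the
        // last append already provides.
        return body.toString();
    }

    public static void main(String[] args) {
        // Prints the body for two made-up ids.
        System.out.print(buildBulkBody(List.of("a1", "b2"), "1"));
    }
}
```

Such a body would be sent with the Content-Type application/x-ndjson, typically in batches of a few thousand actions rather than all at once.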

【Comments】: