【Question title】: Unexpected failing/rebalancing of consumers
【Posted】: 2020-12-01 15:23:31
【Question】:

Using Apache Kafka 2.1.0 and spring-kafka 2.1.7, our spring-kafka consumer clients receive error messages like the following:

2019-01-13 23:01:34.019 consumer-1-C-1 LogContext$KafkaLogger.error SEVERE: [Consumer clientId=consumer-2, groupId=kafka-consumer-group-x] Offset commit failed on partition topic-x-16 at offset 57882: The coordinator is not aware of this member.

A few seconds before this error occurs, we can see the following log messages on one of the Kafka brokers:

[2019-01-13 23:01:17,329] INFO [GroupCoordinator 2]: Member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1370 (__consumer_offsets-40) (reason: removing member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Member consumer-20-ba370e86-e1cc-4261-a73c-78cea1b00479 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-32-be8807df-b88f-4cc9-bddf-bed772d1244f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-17-3e34f026-894e-40dc-916b-d169a43da135 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-31-4dd9cb6e-09e9-47db-9610-37e0ab5633e0 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,335] INFO [GroupCoordinator 2]: Member consumer-18-90175650-1224-4f22-9350-246e17e75367 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,332] INFO [GroupCoordinator 2]: Member consumer-19-663239af-9702-4e59-ad3d-f8202e9d579d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-22-c54fb4c0-1fa1-4d9f-91fc-1da6df41b227 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-25-3bfd915c-8bd1-454b-85e3-60212b4c568e in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,347] INFO [GroupCoordinator 2]: Member consumer-27-cbb97ebf-b5cd-4cfa-991a-5302462ddab9 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,615] INFO [GroupCoordinator 2]: Member consumer-24-37fbcc73-e8c6-4820-ad56-580fd88f5a10 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,618] INFO [GroupCoordinator 2]: Member consumer-21-eea1b841-202e-4ebe-bdde-007775d001dd in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,636] INFO [GroupCoordinator 2]: Member consumer-28-881da47e-87c9-4675-9f88-e3b33748cff1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-26-375880ee-b2a9-4ece-8eee-987d282956d8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,708] INFO [GroupCoordinator 2]: Member consumer-23-492417e9-f3cb-4bec-bbac-130895356907 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,731] INFO [GroupCoordinator 2]: Member consumer-29-64732e9a-2c2b-44fb-a8a5-f606462a4201 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:18,947] INFO [GroupCoordinator 2]: Member consumer-10-fdd0ca92-3604-46de-9e2b-97ca41d36150 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,228] INFO [GroupCoordinator 2]: Member consumer-3-feb6986d-79af-4c64-a8f8-2dbb3bdb73c3 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-2-0345e5d5-86fc-4df0-bd39-c35b75514cea in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-1-c301f59f-8a56-4bdb-a5ef-dc163232d378 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,257] INFO [GroupCoordinator 2]: Member consumer-13-56aea64a-ecca-45e7-9474-b8f1163d01c8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,266] INFO [GroupCoordinator 2]: Member consumer-9-3ee76e0e-86f1-4c0c-85cc-d07721bf36b1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,273] INFO [GroupCoordinator 2]: Member consumer-4-9fa81414-870d-444d-b5d1-c38ce5c157a8 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,296] INFO [GroupCoordinator 2]: Member consumer-14-8236578f-b60d-4199-b621-913d025149d1 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,656] INFO [GroupCoordinator 2]: Member consumer-12-2921b7de-1721-460f-adbf-4fb6951cca22 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,665] INFO [GroupCoordinator 2]: Member consumer-11-09d7015c-cc33-464e-93ac-fb270f209b3f in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,667] INFO [GroupCoordinator 2]: Member consumer-5-b3fe06ff-8ef4-4d60-8571-68b7cfee12bc in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,722] INFO [GroupCoordinator 2]: Member consumer-15-5af82ca6-0ebf-463e-b9c5-4bbde513453d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,754] INFO [GroupCoordinator 2]: Member consumer-7-c1e2bf89-c7c5-4363-b099-191956ed1c89 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-6-9b3be0e4-c1be-4d6a-98b1-caa9d095c403 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-16-0f48ad44-402a-4706-9d78-9d0d5077a56d in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-8-0496aa54-79f7-41b8-8f31-7823ed72f16a in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Group kafka-consumer-group-x with generation 1371 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:35,226] INFO [GroupCoordinator 2]: Preparing to rebalance group kafka-consumer-group-x in state PreparingRebalance with old generation 1371 (__consumer_offsets-40) (reason: Adding new member consumer-1-7787a334-acf2-4534-bc19-78af35371bfb) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,227] INFO [GroupCoordinator 2]: Stabilized group kafka-consumer-group-x generation 1372 (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:38,239] INFO [GroupCoordinator 2]: Assignment received from leader for group kafka-consumer-group-x for generation 1372 (kafka.coordinator.group.GroupCoordinator)
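
To see how tightly the removals cluster in time, the broker log can be summarized with a small script. The following is only a sketch in Python: `BROKER_LOG` and `summarize_removals` are hypothetical names, and the sample holds just three of the removal lines above plus the "now empty" line. Run against the full excerpt, the same approach would show all 32 members being removed between 23:01:17,329 and 23:01:19,848.

```python
import re
from datetime import datetime

# Sample lines copied from the broker log above (three removals, one other line).
BROKER_LOG = """\
[2019-01-13 23:01:17,329] INFO [GroupCoordinator 2]: Member consumer-30-13dc06ff-aed2-4e4e-a66d-2d60d79ac526 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:17,330] INFO [GroupCoordinator 2]: Member consumer-20-ba370e86-e1cc-4261-a73c-78cea1b00479 in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Member consumer-8-0496aa54-79f7-41b8-8f31-7823ed72f16a in group kafka-consumer-group-x has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
[2019-01-13 23:01:19,848] INFO [GroupCoordinator 2]: Group kafka-consumer-group-x with generation 1371 is now empty (__consumer_offsets-40) (kafka.coordinator.group.GroupCoordinator)
"""

# Matches only the "Member ... has failed" lines and captures timestamp and member id.
REMOVED = re.compile(
    r"^\[(?P<ts>[^\]]+)\] INFO \[GroupCoordinator \d+\]: "
    r"Member (?P<member>\S+) in group (?P<group>\S+) has failed"
)


def summarize_removals(log_text):
    """Return (member ids, seconds between the first and last removal)."""
    members, stamps = [], []
    for line in log_text.splitlines():
        match = REMOVED.match(line)
        if match:
            members.append(match.group("member"))
            stamps.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return members, span


members, span = summarize_removals(BROKER_LOG)
print(len(members), "members removed within", span, "seconds")
```

A whole group expiring within a couple of seconds points at something shared by all consumers (the host, the network, or the coordinator) rather than at one slow message handler.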

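Since we see no errors while processing messages, nor any sign that processing takes a long time, we cannot explain these sudden rebalances.

Does anyone have a hint as to where this might come from?

Our consumer configuration is mostly default, with enable.auto.commit=false and AckMode.RECORD.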

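【Comments】:

  • How does it work with the latest Spring for Apache Kafka, 2.2.3: github.com/spring-projects/spring-kafka/releases/tag/…? The point is that 2.1.7 may not be fully compatible with the latest Apache Kafka 2.1.0. We haven't tested that combination and make no guarantees.
  • We had the same problem with Apache Kafka 2.0.0 and Spring for Apache Kafka 2.1.7. However, we expect to upgrade to 2.2.3 within the next few days.

Tags: apache-kafka spring-kafka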


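【Answer 1】:

Reasons for a rebalance in Kafka:

  1. A new consumer joins the group
  2. A consumer leaves the group (clean shutdown)
  3. New partitions are added
  4. A consumer appears to be dead from Kafka's point of view

Reasons for the fourth case:

  1. The consumer fails to poll within max.poll.interval.ms (long-running processing)
  2. The consumer fails to send a heartbeat to Kafka within session.timeout.ms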

Note: the heartbeat thread normally runs once every heartbeat.interval.ms (3 seconds by default).
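
The two liveness checks behind case 4 can be modelled in a few lines. This is a simplified sketch, not how the Kafka client is implemented: `why_considered_dead` and `DEFAULTS` are hypothetical names, and the default values are assumed for the Kafka 2.x clients discussed here (check the documentation for your version).

```python
# Assumed consumer defaults for the Kafka version discussed here.
DEFAULTS = {
    "max.poll.interval.ms": 300_000,
    "session.timeout.ms": 10_000,
    "heartbeat.interval.ms": 3_000,
}


def why_considered_dead(ms_since_last_poll, ms_since_last_heartbeat, config=DEFAULTS):
    """Return which liveness check (if any) the consumer would fail."""
    if ms_since_last_heartbeat > config["session.timeout.ms"]:
        return "4.2: no heartbeat within session.timeout.ms"
    if ms_since_last_poll > config["max.poll.interval.ms"]:
        return "4.1: no poll within max.poll.interval.ms"
    return "alive"


# A consumer that polls regularly but whose heartbeat thread stalled for 12 s.
print(why_considered_dead(ms_since_last_poll=2_000, ms_since_last_heartbeat=12_000))
```

The point of the model is that a consumer can process messages quickly and still be expired if its heartbeats do not reach the coordinator in time.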

Your case appears to be 4.2.

There can be several causes for this. To work around the problem, you can increase session.timeout.ms (10 seconds by default).
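
When raising session.timeout.ms, it helps to check the new value against the other timeouts it interacts with. The sketch below assumes the rules from the Kafka consumer documentation: heartbeat.interval.ms must be lower than session.timeout.ms and typically no higher than one third of it, and the session timeout must fall within the broker's group.min.session.timeout.ms and group.max.session.timeout.ms. `check_timeouts` is a hypothetical helper, and the broker limits used as defaults are assumptions that should be read from your own broker configuration.

```python
def check_timeouts(session_timeout_ms, heartbeat_interval_ms,
                   broker_min_ms=6_000, broker_max_ms=300_000):
    """Return a list of problems with the chosen timeouts (empty if none)."""
    problems = []
    # The broker limits are assumed defaults; pass your broker's real values.
    if not broker_min_ms <= session_timeout_ms <= broker_max_ms:
        problems.append("session.timeout.ms is outside the broker's allowed range")
    if heartbeat_interval_ms >= session_timeout_ms:
        problems.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        problems.append("heartbeat.interval.ms is more than 1/3 of session.timeout.ms")
    return problems


# Raising the session timeout to 30 s while keeping the 3 s heartbeat interval.
print(check_timeouts(session_timeout_ms=30_000, heartbeat_interval_ms=3_000))
```

In spring-kafka the values themselves would be set as ordinary consumer properties; the check only illustrates how they relate to each other.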

Another solution is to tune your system so that the heartbeat thread runs as expected (avoid high IOWait, balance the load, and so on).

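【Comments】:

  • I'd venture that optimizing the system solves the problem in most cases.
  • If this answer and the others sound a bit jargon-heavy, the first 4 to 5 articles of this series will give you a good foundation.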
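【Answer 2】:

I'm pretty sure you are hitting KAFKA-7196. You should upgrade your server to 2.0.1 or later.

As a workaround, you can try configuring a random client.id on every startup, but this may have some unwanted side effects.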
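
A minimal sketch of the workaround, assuming a random UUID suffix is acceptable as a client.id. `random_client_id` is a hypothetical helper and `consumer_config` is a plain dictionary that is not tied to any particular client library; in spring-kafka the same value would be supplied through the client.id consumer property.

```python
import uuid


def random_client_id(prefix="consumer"):
    """Build a client.id that differs on every call, and so on every startup."""
    return f"{prefix}-{uuid.uuid4()}"


# Illustrative configuration; the group id is the one from the question.
consumer_config = {
    "group.id": "kafka-consumer-group-x",
    "enable.auto.commit": False,
    "client.id": random_client_id(),
}
print(consumer_config["client.id"])
```

One side effect to keep in mind is that anything keyed on client.id, such as per-client metrics or quotas, will see a new identity after every restart.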

【Comments】:

  • In the meantime we are running on Kafka 2.1.1, but the problem persists.