[Question Title]: Kafka Container Stopped After Some Time, Client Session Timed Out
[Posted]: 2021-12-03 18:10:33
[Question]:

I have one ZooKeeper and 2 Kafka brokers running in a Docker environment. I can get ZooKeeper and both Kafka brokers up and running successfully (producers/consumers are able to connect and send/receive data), but after a while (maybe a day), one of the brokers stops. Below are the last logs from the stopped Kafka server.

[2021-10-14 16:15:23,553] INFO [GroupCoordinator 2]: Preparing to rebalance group console-consumer-95901 in state PreparingRebalance with old generation 1 (__consumer_offsets-35) (reason: removing member consumer-console-consumer-95901-1-66524e7c-561d-49f8-882e-93e5ee9732fa on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2021-10-14 16:15:23,553] INFO [GroupCoordinator 2]: Group console-consumer-95901 with generation 2 is now empty (__consumer_offsets-35) (kafka.coordinator.group.GroupCoordinator)
[2021-10-14 16:23:09,577] INFO [GroupMetadataManager brokerId=2] Group console-consumer-95901 transitioned to Dead in generation 2 (kafka.coordinator.group.GroupMetadataManager)
[2021-10-15 02:04:23,654] WARN Client session timed out, have not heard from server in 15654ms for sessionid 0x10005a177990003 (org.apache.zookeeper.ClientCnxn)
[2021-10-15 02:05:35,005] INFO Client session timed out, have not heard from server in 15654ms for sessionid 0x10005a177990003, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)

These are the last logs from ZooKeeper:

[2021-10-15 02:04:54,812] INFO Expiring session 0x10005a177990003, timeout of 18000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2021-10-15 02:05:28,649] INFO Expiring session 0x10005a177990002, timeout of 18000ms exceeded (org.apache.zookeeper.server.ZooKeeperServer)
[2021-10-15 02:07:27,106] WARN CancelledKeyException causing close of session 0x10005a177990002 (org.apache.zookeeper.server.NIOServerCnxn)
[2021-10-15 02:14:44,252] INFO Invalid session 0x10005a177990002 for client /172.18.0.3:36926, probably expired (org.apache.zookeeper.server.ZooKeeperServer)

I can't fully work out what is happening, but it looks like the broker can no longer communicate with ZooKeeper for some reason.

Below is my docker-compose:

version: '3.0'
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181

  kafka1:
    image: confluentinc/cp-kafka:latest
    container_name: kafka1
    depends_on:
      - zookeeper-1

    ports:
      - 29092:29092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092,PLAINTEXT_HOST://my-hostname-here:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_HEAP_OPTS: -Xmx512M -Xms512M

  kafka2:
    image: confluentinc/cp-kafka:latest
    container_name: kafka2
    depends_on:
      - zookeeper-1

    ports:
      - 29093:29093
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9092,PLAINTEXT_HOST://my-hostname-here:29093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_HEAP_OPTS: -Xmx512M -Xms512M

Below is the status of the containers:

[Comments]:

  • I'd suggest not limiting the brokers' heap size to only 512m, but you need to provide the logs of the broker that stopped
  • Sure, I can try increasing the server memory and then remove the 512 limit. The first code snippet is the log of the stopped broker, the second is the ZooKeeper log.
  • You should expect more warnings and errors that actually indicate the broker is shutting down, not just ZooKeeper client communication
  • That is all I have in the ZooKeeper and broker logs around the shutdown
  • Those look like logs from kafka1, which is still running, rather than kafka2, then

Tags: docker apache-kafka apache-zookeeper


[Solution 1]:

In docker ps you will see exit code 137.

That is an OOMKilled code, which means the container needs more memory.

I suggest you remove KAFKA_HEAP_OPTS and let the JVM be limited by the container's full available memory space.
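As background (my own illustration, not part of the original answer): exit codes above 128 encode a fatal signal as 128 + the signal number, so 137 decodes to signal 9 (SIGKILL), which is what the kernel OOM killer sends:

```shell
# Exit codes >= 128 mean the process died from a signal: code = 128 + signo.
echo $((137 - 128))   # 9
kill -l 9             # KILL (signal 9 is SIGKILL, sent by the kernel OOM killer)
```

You can also run docker inspect on the stopped container and check the State.OOMKilled and State.ExitCode fields to confirm.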
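A minimal sketch of that change for one broker (my own illustration, not from the original answer): drop KAFKA_HEAP_OPTS and cap the container instead, so the JVM sizes its heap from the container limit. The 1g figure is an assumption; pick a limit your host can afford. Note that deploy.resources.limits is honored by recent Docker Compose versions; with classic docker-compose v1 you may need --compatibility or mem_limit instead.

```yaml
  kafka2:
    image: confluentinc/cp-kafka:latest
    container_name: kafka2
    depends_on:
      - zookeeper-1
    ports:
      - 29093:29093
    # KAFKA_HEAP_OPTS removed: let the JVM derive its heap from the container limit
    deploy:
      resources:
        limits:
          memory: 1g   # assumed value; size to your host
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka2:9092,PLAINTEXT_HOST://my-hostname-here:29093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```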

[Discussion]:
