【Question title】: Kafka readiness probes failing
【Posted】: 2018-11-27 18:37:57
【Question】:

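I have deployed Kafka and ZooKeeper in Kubernetes. If I configure a readiness probe for ZooKeeper, my Kafka readiness probe keeps failing. If I comment out or remove the ZooKeeper readiness probe and redeploy, the Kafka server starts without any problem (and the Kafka readiness probe does not fail).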

Here is the ZooKeeper readiness probe:

readinessProbe:
  tcpSocket:
    port: 2181
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
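
For comparison, a TCP probe like the one above only confirms that port 2181 accepts connections. A minimal sketch of a probe that asks ZooKeeper itself, using its `ruok` four-letter command, might look like the following. It assumes `nc` is installed in the container image and, on ZooKeeper 3.5 or later, that `ruok` is listed in `4lw.commands.whitelist`; neither is stated in the question, and the timing values are simply carried over from the probe above.

```yaml
readinessProbe:
  exec:
    command:
    - sh
    - -c
    # Assumption: nc is available in the image. ZooKeeper answers "imok"
    # to "ruok" when the server process is running.
    - '[ "$(echo ruok | nc -w 2 localhost 2181)" = "imok" ]'
  initialDelaySeconds: 20
  periodSeconds: 20
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3
```

Note that `imok` only shows that the server is responding; it does not show that the node has joined a quorum.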

My ZooKeeper logs are:

2018-06-18 11:27:24,863 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:27:24,864 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:57728
2018-06-18 11:27:24,864 [myid:0] - WARN  [RecvWorker:1586112601866174465:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 1586112601866174465, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: -66911279
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:27:24,865 [myid:0] - WARN  [RecvWorker:1586112601866174465:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:27:24,865 [myid:0] - WARN  [SendWorker:1586112601866174465:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2018-06-18 11:27:24,868 [myid:0] - WARN  [SendWorker:1586112601866174465:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:30:54,282 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:47944
2018-06-18 11:31:39,342 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 11:31:39,342 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:47946
2018-06-18 11:31:39,342 [myid:0] - WARN  [RecvWorker:5135603447292250196:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 5135603447292250196, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: 1414541105
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:31:39,343 [myid:0] - WARN  [RecvWorker:5135603447292250196:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:31:39,343 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2018-06-18 11:31:39,343 [myid:0] - WARN  [SendWorker:5135603447292250196:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:31:44,433 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51010
2018-06-18 11:31:44,437 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51012
2018-06-18 11:31:44,439 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:44,440 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51012 (no session established for client)
2018-06-18 11:31:44,452 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51014
2018-06-18 11:31:49,438 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:49,438 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51010 (no session established for client)
2018-06-18 11:31:49,452 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:31:49,453 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51014 (no session established for client)
2018-06-18 11:33:59,669 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51148
2018-06-18 11:33:59,700 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:33:59,700 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51148 (no session established for client)
2018-06-18 11:33:59,713 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51150
2018-06-18 11:33:59,730 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:33:59,730 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51150 (no session established for client)
2018-06-18 11:34:00,274 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:48860
2018-06-18 11:34:00,275 [myid:0] - WARN  [RecvWorker:4616370699239609664:QuorumCnxManager$RecvWorker@1025] - Connection broken for id 4616370699239609664, my id = 0, error = 
java.io.IOException: Received packet with invalid packet: -1200847881
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1012)
2018-06-18 11:34:00,275 [myid:0] - WARN  [RecvWorker:4616370699239609664:QuorumCnxManager$RecvWorker@1028] - Interrupting SendWorker
2018-06-18 11:34:00,275 [myid:0] - WARN  [SendWorker:4616370699239609664:QuorumCnxManager$SendWorker@941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2025)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2099)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:429)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2018-06-18 11:34:00,276 [myid:0] - WARN  [SendWorker:4616370699239609664:QuorumCnxManager$SendWorker@951] - Send worker leaving thread
2018-06-18 11:34:00,277 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:48862
2018-06-18 11:34:00,285 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 11:40:10,712 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@215] - Accepted socket connection from /172.30.99.87:51522
2018-06-18 11:40:10,713 [myid:0] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@376] - Unable to read additional data from client sessionid 0x0, likely client has closed socket
2018-06-18 11:40:10,713 [myid:0] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1040] - Closed socket connection for client /172.30.99.87:51522 (no session established for client)
2018-06-18 11:40:10,782 [myid:0] - INFO  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager$Listener@743] - Received connection request /10.186.58.164:49556
2018-06-18 11:40:10,782 [myid:0] - WARN  [kafka1-zookeeper-0.kafka1-zookeeper/172.30.99.87:3888:QuorumCnxManager@461] - Exception reading or writing challenge: java.net.SocketException: Connection reset
2018-06-18 16:07:03,456 [myid:0] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
2018-06-18 16:07:03,459 [myid:0] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed

【Discussion】:

    Tags: apache-kafka kubernetes apache-zookeeper apache-kafka-connect


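    【Solution 1】:

    You are tying your Kafka to ZooKeeper, which is not good practice.

    I use Kafka from the official Apache website, and I found that it ships a script that can be used for a readiness probe: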

    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
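
    The probe above relies on the Kubernetes defaults for timing. As a rough sketch, the same check with explicit timing might look like this. The numbers are assumptions to tune for your cluster, not values from the answer or from the Kafka documentation. The script starts a JVM on every run, so the default `timeoutSeconds` of 1 is likely too short.

    ```yaml
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - "/opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
      initialDelaySeconds: 30   # assumed: give the broker time to start
      periodSeconds: 15         # assumed: each run launches a JVM, so probe sparingly
      timeoutSeconds: 10        # assumed: Kubernetes defaults to 1 second
      failureThreshold: 3       # Kubernetes default, written out for clarity
    ```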
    

    【Comments】:

      【Solution 2】:

      I had a similar problem. The following changes helped me get past it.

      # readinessProbe & livenessProbe 
        readinessProbe:
          tcpSocket:
            port: 9092
          timeoutSeconds: 5
          periodSeconds: 5
          initialDelaySeconds: 45
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
          timeoutSeconds: 5
          periodSeconds: 5
          initialDelaySeconds: 60
      

      Depending on your requirements, you can adjust the following value.

      initialDelaySeconds
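
      To make the effect of these fields concrete, here is the same configuration annotated with what each value means. The `failureThreshold` lines are not in the answer; they spell out the Kubernetes default of 3 so that the timing is visible.

      ```yaml
      readinessProbe:
        tcpSocket:
          port: 9092
        initialDelaySeconds: 45   # first check about 45 s after the container starts
        periodSeconds: 5          # then one check every 5 s
        timeoutSeconds: 5         # a check with no answer within 5 s counts as failed
        failureThreshold: 3       # default: 3 failures in a row mark the pod not ready
      livenessProbe:
        exec:
          command:
          - sh
          - -c
          - "kafka-broker-api-versions.sh --bootstrap-server=localhost:9092"
        initialDelaySeconds: 60   # first check about 60 s after the container starts
        periodSeconds: 5
        timeoutSeconds: 5
        failureThreshold: 3       # default: 3 failures in a row restart the container
      ```

      With these numbers, a broker that is still starting up would be restarted roughly 70 seconds after the container starts (the 60 s delay plus two more 5 s periods), so the liveness `initialDelaySeconds` should comfortably exceed the broker's normal startup time.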

      【Comments】:
