【发布时间】:2018-05-11 16:44:01
【问题描述】:
这是this 的后续问题。 我设法做到了以下几点:
- 为我的 5 个代理 Kafka 集群创建一个无头服务,用于代理间通信
- 为每个代理设置一项服务
- 每个服务都有一个外部 ip
- 每个服务只选择一个 pod,例如服务“kafka-0-es”选择 pod“kafka-0”
- pod 正确地通告了它们各自的外部 IP。我通过访问 ZooKeeper CLI 上的数据验证了这一点。
我使用 zkCli 创建了一个主题 test-topic 并验证它已创建。之后,我开始了 Kafka 控制台生产者。
.\kafka-console-producer.bat --broker-list EXTERNAL_IP_1:9093,EXTERNAL_IP_2:9093,EXTERNAL_IP_3:9093,EXTERNAL_IP_4:9093,EXTERNAL_IP_5:9093 --topic test-topic --property parse.key=true --property key.
separator=:
>afkjdshasdkfjhsdkjsf:128379127893123
>[2018-05-09 17:35:51,622] WARN [Producer clientId=console-producer] Got error produce response with correlation id 9 on topic-partition test-topic-0, retrying (2 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,623] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,649] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 10 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,720] WARN [Producer clientId=console-producer] Got error produce response with correlation id 11 on topic-partition test-topic-0, retrying (1 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,720] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,773] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 12 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,823] WARN [Producer clientId=console-producer] Got error produce response with correlation id 13 on topic-partition test-topic-0, retrying (0 attempts left). Error: UNKNOWN_TOPIC_OR_PARTITION (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,823] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:51,913] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 14 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:51,936] ERROR Error when sending message to topic test-topic with key: 20 bytes, value: 15 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2018-05-09 17:35:51,945] WARN [Producer clientId=console-producer] Received unknown topic or partition error in produce request on partition test-topic-0. The topic/partition may not exist or the user may not have Describe access to it (org.apache.kafka.clients.producer.internals.Sender)
[2018-05-09 17:35:52,034] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 16 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:35:52,161] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 20 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2018-05-09 17:40:52,288] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 25 : {test-topic=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
根据 Zookeeper 的说法,我的 Kafka 代理“kafka-2”是该主题的领导者:
get /kafka/brokers/topics/test-topic/partitions/0/state
{"controller_epoch":5,"leader":2,"version":1,"leader_epoch":0,"isr":[2,1]}
但是 pod kafka-2 在日志中抛出错误
[2018-05-09 15:21:02,524] ERROR [ReplicaFetcherThread-0-2], Error for partition [test-topic,0] to broker 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
不太清楚为什么会发生这种情况,配置看起来不错。为了让我的 Kafka 集群在 Kubernetes 上运行,我还缺少什么?
请注意,我也尝试过彻底擦除我的集群(缩小 kafka 集群,删除 kafka 存储,缩小 zk 集群,删除 zk 存储,扩大 zk,扩大 kafka)但无济于事。
【问题讨论】: