【Posted】: 2021-10-27 09:50:32
【Question】:
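I'm running 3 KSQL servers on Google's Kubernetes (GKE), connected to Kafka (over SSL) and Zookeeper hosted on Google Cloud VMs. I can easily create 5 streams and they run fine, but anything beyond that gives me all kinds of timeouts.
KSQL Kubernetes config (the ksql env vars section):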
- name: KSQL_CONFIG_DIR
  value: "/etc/ksqldb"
- name: KSQL_LOG4J_OPTS
  value: "-Dlog4j.configuration=file:/etc/ksqldb/log4j.properties"
- name: KSQL_BOOTSTRAP_SERVERS
  value: ***:9092,***:9092,***:9092
- name: KSQL_KSQL_INTERNAL_TOPIC_REPLICAS
  value: "3"
- name: KSQL_KSQL_SCHEMA_REGISTRY_URL
  value: "http://***"
- name: KSQL_HOST_NAME
  value: prod-ksqldb-server
- name: KSQL_KSQL_SERVICE_ID
  value: "prod-ksqldb-server"
- name: KSQL_LISTENERS
  value: "http://0.0.0.0:8088"
- name: KSQL_CACHE_MAX_BYTES_BUFFERING
  value: "0"
- name: KSQL_SECURITY_PROTOCOL
  value: SSL
- name: KSQL_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM
  value: ""
- name: KSQL_SSL_TRUSTSTORE_LOCATION
  value: /truststore/kafka.truststore.jks
- name: KSQL_SSL_TRUSTSTORE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: kafkassl
      key: truststore_password
- name: KSQL_SSL_KEYSTORE_LOCATION
  value: /keystore/kafkaconnect.keystore.jks
- name: KSQL_SSL_KEYSTORE_PASSWORD
  valueFrom:
    secretKeyRef:
      name: kafkassl
      key: keystore_password
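For reference, my understanding is that the Confluent ksqlDB Docker images turn each `KSQL_`-prefixed environment variable into a server property by dropping the prefix, lowercasing the rest, and mapping `_` to `.` (with `__` to `_` and `___` to `-`). The sketch below only mimics that assumed convention; it is not the image's actual code. `env_to_property`, `env_to_properties` and `LAUNCHER_ONLY` are hypothetical names, and the set of variables the launcher keeps for itself is also an assumption. It shows, for example, that `KSQL_SECURITY_PROTOCOL` should end up as `security.protocol` and `KSQL_KSQL_INTERNAL_TOPIC_REPLICAS` as `ksql.internal.topic.replicas`.

```python
from typing import Dict, Optional

# ASSUMPTION: these variables are consumed by the image's launch scripts and
# are not written to the server properties file. Verify against the image.
LAUNCHER_ONLY = {
    "KSQL_CONFIG_DIR", "KSQL_LOG4J_OPTS", "KSQL_HEAP_OPTS",
    "KSQL_JMX_OPTS", "KSQL_OPTS",
}


def env_to_property(name: str, prefix: str = "KSQL_") -> Optional[str]:
    """Hypothetical helper: map e.g. KSQL_SECURITY_PROTOCOL -> security.protocol.

    Mimics the assumed convention: strip the prefix, lowercase, then
    ___ -> '-', __ -> '_', _ -> '.'. Returns None if the variable is
    not treated as a property.
    """
    if not name.startswith(prefix) or name in LAUNCHER_ONLY:
        return None
    key = name[len(prefix):].lower()
    # Replace longer underscore runs first; park '__' as NUL so the
    # single-underscore pass does not turn it into '..'.
    key = key.replace("___", "-").replace("__", "\0").replace("_", ".")
    return key.replace("\0", "_")


def env_to_properties(env: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: convert a whole env mapping, skipping non-properties."""
    props = {}
    for name, value in env.items():
        key = env_to_property(name)
        if key is not None:
            props[key] = value
    return props


if __name__ == "__main__":
    # A few variables from the config above (passwords omitted).
    sample_env = {
        "KSQL_CONFIG_DIR": "/etc/ksqldb",
        "KSQL_KSQL_INTERNAL_TOPIC_REPLICAS": "3",
        "KSQL_SECURITY_PROTOCOL": "SSL",
        "KSQL_SSL_ENDPOINT_IDENTIFICATION_ALGORITHM": "",
    }
    for key, value in env_to_properties(sample_env).items():
        print(f"{key}={value}")
```

For the sample variables this prints `ksql.internal.topic.replicas=3`, `security.protocol=SSL` and `ssl.endpoint.identification.algorithm=`. Comparing that with the ksqlDB configuration docs is one way to check that the SSL settings actually reach the clients making the failing calls listed in the errors below (`InitProducerId`, `createTopics`, `describeTopics`).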
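I get the following errors (some of them appear at random when I try to create a stream, drop a stream, or run DESCRIBE ... EXTENDED on a stream):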
Timeout while initializing transaction to the KSQL command topic.
If you're running a single Kafka broker, ensure that the following configs are set to 1 on the broker:
- transaction.state.log.replication.factor
- transaction.state.log.min.isr
- offsets.topic.replication.factor
Caused by: Timeout expired after 60000 milliseconds while awaiting InitProducerId
Failed to guarantee existence of topic ABC
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
Caused by: Timed out waiting for a node assignment. Call: createTopics
Failed to Describe Kafka Topic(s): [source_topic]
Caused by: Timed out waiting to send the call. Call: describeTopics
ksql> describe ABC extended;
[2021-08-27 11:04:05,458] ERROR Failed to list Kafka consumer groups offsets
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: listConsumerGroupOffsets
Caused by: Timed out waiting for a node assignment. Call: listConsumerGroupOffsets (io.confluent.ksql.cli.console.Console:344)
This is what I found in the KSQL logs:
2021-08-27 13:53:34.055 CEST
[2021-08-27 11:53:34,054] INFO Retrying request. Retry no: 0 Cause: 'org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: findCoordinator' (io.confluent.ksql.util.ExecutorUtil:95)
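We checked the Kafka logs and the Zookeeper logs (because of the getTopics problem I suspected something was wrong there) and restarted both. We still have no idea what is going wrong.
As I said, the Kafka brokers themselves work fine, and Kafka Connect is working too without any problems...
GitHub issue: https://github.com/confluentinc/ksql/issues/7953
Edit: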
- Checked the network; it seems fine:
  ping 10.164.0.30
  64 bytes from 10.164.0.30: seq=811 ttl=63 time=0.354 ms
  64 bytes from 10.164.0.30: seq=812 ttl=63 time=0.277 ms
  ^C
  --- 10.164.0.30 ping statistics ---
  813 packets transmitted, 813 packets received, 0% packet loss
  round-trip min/avg/max = 0.157/0.275/1.549 ms
- The Java heap (-Xmx) is 3 GB, and according to the GKE dashboard 1.5 GB of memory is in use.
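`ping` only proves ICMP reachability; it says nothing about whether the broker's TCP listener on port 9092 accepts connections from the pod. Below is a minimal sketch of a port check to run from inside a ksqlDB pod, assuming Python 3 is available there (it may not be in the image) and assuming `10.164.0.30` is one of the brokers from `KSQL_BOOTSTRAP_SERVERS` (the real addresses are redacted). `can_connect` and `check_brokers` are hypothetical helpers. It only opens a plain TCP socket and does not attempt the SSL handshake, so a success rules out a closed or filtered port, not a certificate problem.

```python
import socket
from typing import Dict, Iterable, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds in time.

    No TLS handshake is attempted, so this only checks that the port is open.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


def check_brokers(addresses: Iterable[Tuple[str, int]],
                  timeout: float = 3.0) -> Dict[str, bool]:
    """Hypothetical helper: map 'host:port' -> reachable for each address."""
    return {f"{host}:{port}": can_connect(host, port, timeout)
            for host, port in addresses}


if __name__ == "__main__":
    # ASSUMPTION: 10.164.0.30 is one broker; the real bootstrap hosts are
    # redacted in the question, so add them here.
    for addr, ok in check_brokers([("10.164.0.30", 9092)]).items():
        print(addr, "open" if ok else "NOT reachable")
```

Kafka clients only use the bootstrap list for the first metadata request. After that they connect to the addresses each broker advertises (`advertised.listeners`), so those hosts and ports are worth checking the same way.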
Also, an update from today:
I tried to create a stream today and got the response "Statement written to command topic". When I ran SHOW STREAMS, I got:
[2021-08-30 09:49:04,117] ERROR Timed out while waiting for a previous command to execute. command sequence number: 10 (io.confluent.ksql.cli.console.Console:344)
Error: command not executed since the server timed out while waiting for prior commands to finish executing.
If you wish to execute new commands without waiting for prior commands to finish, run the command 'request-pipelining ON'.
Timed out while waiting for a previous command to execute. command sequence number: 10
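This appears after every attempt. When I close the ksql-cli session and run SHOW STREAMS again, the error is gone, but the stream I created is not there (maybe it is being created in the background?).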
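As I understand the ksqlDB REST API, every statement written to the command topic gets a sequence number, and a request to the `/ksql` endpoint may carry an optional `commandSequenceNumber`. The server then waits until that command has been executed before running the new statement, which matches the "command sequence number: 10" timeout above. Presumably the CLI sends the last number it saw unless `request-pipelining ON` is set. The sketch below only builds such a request with the standard library and does not send it; `build_ksql_request` is a hypothetical helper, and the server URL is an assumption (the pods listen on port 8088 per `KSQL_LISTENERS`).

```python
import json
import urllib.request
from typing import Optional


def build_ksql_request(server: str, statement: str,
                       command_seq: Optional[int] = None) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST to ksqlDB's /ksql endpoint.

    If command_seq is given, it is sent as commandSequenceNumber so that the
    server waits for that command to be executed before running `statement`.
    """
    body = {"ksql": statement, "streamsProperties": {}}
    if command_seq is not None:
        body["commandSequenceNumber"] = command_seq
    return urllib.request.Request(
        url=server.rstrip("/") + "/ksql",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )
```

To send it, something like `urllib.request.urlopen(build_ksql_request("http://localhost:8088", "SHOW STREAMS;"), timeout=30)` against a port-forwarded pod should work (the host is hypothetical). Sending `SHOW STREAMS;` once with `command_seq=10` and once without may show whether the server is stuck executing command 10 or whether only the CLI's wait is timing out.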
【Discussion】:
- Try adding more heap to the container? Maybe open a GitHub issue?
- To the KSQL container? It has no limit. As for the GH issue, I created one a while ago and forgot to link it: github.com/confluentinc/ksql/issues/7953
- @OneCricketeer Java has a 3 GB -Xmx, and the memory used on the pod is 1.5 GB (according to the GKE dashboard).
```
free -h
              total        used        free      shared  buff/cache   available
Mem:           12Gi       3.9Gi       2.1Gi       2.0Mi       6.7Gi       8.6Gi
Swap:            0B          0B          0B
```
Tags: apache-kafka ksqldb