【Title】: Postgres Debezium CDC does not publish changes to Kafka
【Posted】: 2020-05-02 09:25:27
【Question】:

My current test configuration is as follows:

version: '3.7'
services:
  postgres:
    image: debezium/postgres
    restart: always
    ports:
      - "5432:5432"
  zookeeper:
    image: debezium/zookeeper
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
  kafka:
    image: debezium/kafka
    restart: always
    ports:
      - "9092:9092"
    links:
      - zookeeper
    depends_on:
      - zookeeper
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
      - KAFKA_GROUP_MIN_SESSION_TIMEOUT_MS=250
  connect:
    image: debezium/connect
    restart: always
    ports:
      - "8083:8083"
    links:
      - zookeeper
      - postgres
      - kafka
    depends_on:
      - zookeeper
      - postgres
      - kafka
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - STATUS_STORAGE_TOPIC=my_source_connect_statuses

I run it with docker-compose like this:

$ docker-compose up

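I do not see any error messages, and everything seems to be running fine. If I run docker ps, I see that all the services are up.

To check that Kafka is working, I created a Kafka producer and a Kafka consumer in Python: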

# producer. I run it in one console window
from kafka import KafkaProducer
from json import dumps
from time import sleep

producer = KafkaProducer(bootstrap_servers=['localhost:9092'], value_serializer=lambda x: dumps(x).encode('utf-8'))

for e in range(1000):
    data = {'number' : e}
    producer.send('numtest', value=data)
    sleep(5)

# consumer. I run it in another console window

from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'numtest',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    print(message)

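This works very well. I can see my producer publishing messages, and I can see them being consumed in the consumer window.

Now I want to get CDC working. First, inside the Postgres container, I set the password of the postgres role to postgres: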

$ su postgres
$ psql
psql> \password postgres
Enter new password: postgres

Then I created a new database, test:

psql> CREATE DATABASE test;

I created a table:

psql> \c test;
test=# create table mytable (id serial, name varchar(128), primary key(id));

Finally, I created a connector for my Debezium CDC stack:

$ curl -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{
    "name": "test-connector",
    "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres",
    "database.dbname" : "test",
    "database.server.name": "postgres",
    "database.whitelist": "public.mytable",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "public.some_topic"
    }
}'

{"name":"test-connector","config":{"connector.class":"io.debezium.connector.postgresql.PostgresConnector","tasks.max":"1","plugin.name":"pgoutput","database.hostname":"postgres","database.port":"5432","database.user":"postgres","database.password":"postgres","database.dbname":"test","database.server.name":"postgres","database.whitelist":"public.mytable","database.history.kafka.bootstrap.servers":"localhost:9092","database.history.kafka.topic":"public.some_topic","name":"test-connector"},"tasks":[],"type":"source"}

As you can see, my connector was created without any errors. Now I expect Debezium CDC to publish all changes to the Kafka topic public.some_topic. To check this, I created a new Kafka consumer:

from kafka import KafkaConsumer
from json import loads

consumer = KafkaConsumer(
    'public.some_topic',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    enable_auto_commit=True,
    group_id='my-group',
    value_deserializer=lambda x: loads(x.decode('utf-8')))

for message in consumer:
    print(message)

The only difference from the first example is that I am watching public.some_topic. Then I go to the database console and do an insert:

test=# insert into mytable (name) values ('Tom Cat');    
INSERT 0 1
test=#

So a new row was inserted, but nothing happens in the consumer window. In other words, Debezium does not publish events to the Kafka topic public.some_topic. What is wrong here, and how can I fix it?

【Comments】:

  • 1. If you query the connector's status, is it still running? 2. Is there anything in the Kafka Connect worker log showing that the connector failed? 3. I would use kafkacat to inspect the topics and to produce/consume data :)
  • @Robin Moffatt. If I run docker ps, I see that my connect service is running.
  • @Robin Moffatt. I just checked the connector logs and found one line that keeps repeating: INFO || WorkerSourceTask{id=test-connector2-0} flushing 0 outstanding messages for offset commit [org.apache.kafka.connect.runtime.WorkerSourceTask]
  • Did you solve this? I tried to run your docker-compose but I see some errors: connect_1 | 2020-04-16 06:06:36,922 ERROR || WorkerSourceTask{id=test-connector-0} Task threw an uncaught and unrecoverable exception [org.apache.kafka.connect.runtime.WorkerTask] connect_1 | io.debezium.jdbc.JdbcConnectionException: ERROR: syntax error connect_1 | at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.initPublication(PostgresReplicationConnection.java:145)

Tags: postgresql apache-kafka apache-kafka-connect debezium


【Answer 1】:

When I create the connector using your Docker Compose, I see this error in the Kafka Connect worker log:

Caused by: org.postgresql.util.PSQLException: ERROR: could not access file "pgoutput": No such file or directory
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2505)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2241)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
        at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)
        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:309)
        at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:295)
        at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:272)
        at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:267)
        at io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationSlot(PostgresReplicationConnection.java:288)
        at io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:126)
        ... 9 more

This is also reflected in the task's status if you query it with the Kafka Connect REST API:

curl -s "http://localhost:8083/connectors?expand=info&expand=status" | jq '."test-connector".status'
{
  "name": "test-connector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "192.168.16.5:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "FAILED",
      "worker_id": "192.168.16.5:8083",
      "trace": "org.apache.kafka.connect.errors.ConnectException: org.postgresql.util.PSQLException: ERROR: could not access file \"pgoutput\": No such file or directory\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:129)\n\tat io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:49)\n\tat org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:208)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\nCaused by: org.postgresql.util.PSQLException: ERROR: could not access file \"pgoutput\": No such file or directory\n\tat org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2505)\n\tat org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2241)\n\tat org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)\n\tat org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:447)\n\tat org.postgresql.jdbc.PgStatement.execute(PgStatement.java:368)\n\tat org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:309)\n\tat org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:295)\n\tat org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:272)\n\tat org.postgresql.jdbc.PgStatement.execute(PgStatement.java:267)\n\tat io.debezium.connector.postgresql.connection.PostgresReplicationConnection.createReplicationSlot(PostgresReplicationConnection.java:288)\n\tat io.debezium.connector.postgresql.PostgresConnectorTask.start(PostgresConnectorTask.java:126)\n\t... 9 more\n"
    }
  ],
  "type": "source"
}

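The same check can be scripted. The following is a minimal sketch, assuming the Kafka Connect REST API is reachable at http://localhost:8083 as in the compose file; the helper names failed_tasks and fetch_status are made up for this illustration. It reads the connector's status and reports every task that is not RUNNING, which is the situation shown above: the connector itself is RUNNING while its only task has FAILED.

```python
import json
from urllib.request import urlopen


def failed_tasks(status):
    """Return (task id, first line of the trace) for each task that is not RUNNING."""
    problems = []
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            trace = task.get("trace", "")
            first_line = trace.splitlines()[0] if trace else ""
            problems.append((task["id"], first_line))
    return problems


def fetch_status(name, base_url="http://localhost:8083"):
    """Fetch one connector's status document from the Kafka Connect REST API."""
    # base_url is an assumption taken from the port mapping in the compose file.
    with urlopen(f"{base_url}/connectors/{name}/status") as response:
        return json.load(response)
```

Calling failed_tasks(fetch_status("test-connector")) against the stack above should list task 0 together with the "could not access file" message. An empty result only means that no reported task has failed; a connector with an empty task list also returns an empty result.
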
The version of Postgres you are running is:

postgres=# SHOW server_version;
 server_version
----------------
 9.6.16

pgoutput is only available in Postgres version 10 and later.

I changed your Docker Compose to use version 10:
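As a rough illustration of that rule, the sketch below maps the string returned by SHOW server_version to plugin.name values worth trying. The function names are hypothetical, and the pre-10 alternatives are the ones named in the comments on this answer (decoderbufs and wal2json), which have to be installed on the database server; whether a particular image ships them is an assumption to verify.

```python
import re


def major_version(server_version):
    """Extract the leading major version from a string such as '9.6.16' or '10.11'."""
    match = re.match(r"\d+", server_version.strip())
    if match is None:
        raise ValueError(f"unrecognised server_version: {server_version!r}")
    return int(match.group())


def candidate_plugins(server_version):
    """Suggest plugin.name values for the Debezium Postgres connector."""
    if major_version(server_version) >= 10:
        # pgoutput is part of Postgres itself from version 10 onwards.
        return ["pgoutput"]
    # Older servers need a separately installed logical decoding plugin.
    return ["decoderbufs", "wal2json"]
```

For the server in the question, candidate_plugins("9.6.16") returns the two older plugins, which is why a connector configured with pgoutput fails there.
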

image: debezium/postgres:10

After restarting the stack and following your instructions, I get a running connector:

curl -s "http://localhost:8083/connectors?expand=info&expand=status" | \
           jq '. | to_entries[] | [ .value.info.type, .key, .value.status.connector.state,.value.status.tasks[].state,.value.info.config."connector.class"]|join(":|:")' | \
           column -s : -t| sed 's/\"//g'| sort
source  |  test-connector  |  RUNNING  |  RUNNING  |  io.debezium.connector.postgresql.PostgresConnector

and data in the Kafka topic:

$ docker exec kafkacat kafkacat -b kafka:9092 -t postgres.public.mytable -C
{"schema":{"type":"struct","fields":[{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"}],"optional":true,"name":"postgres.public.mytable.Value","field":"before"},{"type":"struct","fields":[{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"name"}],"optional":true,"name":"postgres.public.mytable.Value","field":"after"},{"type":"struct","fields":[{"type":"string","optional":false,"field":"version"},{"type":"string","optional":false,"field":"connector"},{"type":"string","optional":false,"field":"name"},{"type":"int64","optional":false,"field":"ts_ms"},{"type":"string","optional":true,"name":"io.debezium.data.Enum","version":1,"parameters":{"allowed":"true,last,false"},"default":"false","field":"snapshot"},{"type":"string","optional":false,"field":"db"},{"type":"string","optional":false,"field":"schema"},{"type":"string","optional":false,"field":"table"},{"type":"int64","optional":true,"field":"txId"},{"type":"int64","optional":true,"field":"lsn"},{"type":"int64","optional":true,"field":"xmin"}],"optional":false,"name":"io.debezium.connector.postgresql.Source","field":"source"},{"type":"string","optional":false,"field":"op"},{"type":"int64","optional":true,"field":"ts_ms"}],"optional":false,"name":"postgres.public.mytable.Envelope"},"payload":{"before":null,"after":{"id":1,"name":"Tom Cat"},"source":{"version":"1.0.0.Final","connector":"postgresql","name":"postgres","ts_ms":1579172192292,"snapshot":"false","db":"test","schema":"public","table":"mytable","txId":561,"lsn":24485520,"xmin":null},"op":"c","ts_ms":1579172192347}}% Reached end of topic postgres.public.mytable [0] at offset 1
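A consumer usually only needs the payload part of such a message. The sketch below is one way to pull out the interesting fields, assuming the JSON converter with schemas enabled as in the output above; summarize_change and the OPERATIONS table are names invented for this example. The op codes are Debezium's: c for create, u for update, d for delete, r for a snapshot read.

```python
import json

# Debezium operation codes and a readable label for each.
OPERATIONS = {"c": "create", "u": "update", "d": "delete", "r": "read (snapshot)"}


def summarize_change(raw_value):
    """Reduce a Debezium change event (str or bytes) to table, operation, before, after."""
    if raw_value is None:
        # A message with no value (for example a tombstone) carries no payload.
        return None
    payload = json.loads(raw_value)["payload"]
    source = payload["source"]
    return {
        "table": f'{source["schema"]}.{source["table"]}',
        "operation": OPERATIONS.get(payload["op"], payload["op"]),
        "before": payload["before"],
        "after": payload["after"],
    }
```

With kafka-python, as used in the question, this could serve as the value_deserializer of a KafkaConsumer subscribed to postgres.public.mytable in place of the plain loads call.
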

I added kafkacat to your Docker Compose:

  kafkacat:
    image: edenhill/kafkacat:1.5.0
    container_name: kafkacat
    entrypoint: 
      - /bin/sh 
      - -c 
      - |
        while [ 1 -eq 1 ];do sleep 60;done

Edit: I am leaving the previous answer in place, since it is still useful and relevant:

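Debezium writes messages to a topic named after the table. With the Postgres connector the name is built from database.server.name, the schema, and the table, so in your example that is postgres.public.mytable.

This is why kafkacat is useful, because you can run

kafkacat -b broker:9092 -L

to see a list of all topics and partitions. Once you have found the topic, run

kafkacat -b broker:9092 -t postgres.public.mytable -C

to read from it.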
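The naming rule can be written down in a few lines. This is a sketch only: it assumes the topic is database.server.name, schema, and table joined by dots, which matches the topic postgres.public.mytable read with kafkacat earlier in this answer, and the function names are hypothetical.

```python
def change_topic(server_name, schema, table):
    """Build the topic name for one table's change events."""
    return f"{server_name}.{schema}.{table}"


def matching_topics(all_topics, server_name):
    """Pick out the topics that belong to one logical server name."""
    prefix = server_name + "."
    return sorted(topic for topic in all_topics if topic.startswith(prefix))
```

With kafka-python, the set returned by KafkaConsumer(bootstrap_servers=['localhost:9092']).topics() can be passed to matching_topics. If nothing comes back for the server name, the connector has not produced anything yet, which points at the connector task rather than at the consumer.
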

See the kafkacat documentation for details, including how to run it with Docker.

There is also a demo showing all of this in action with Docker Compose.

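【Comments】:

  • kafkacat is a good tool. I tried it. It lists many topics in Kafka, but I do not see a postgres.test.mytable topic there. I also tried reading from the topic postgres.test.mytable. But as I said, the whole problem is with Debezium CDC: it does not publish messages to Kafka, and I do not know why.
  • Which topics does kafkacat list? Debezium may be writing to a topic with a different permutation of the name. Also try recreating the connector under a new name, in case the offsets have been modified.
  • kafkacat prints this list of topics: "test", "my_connect_configs", "some_topic", "public.some_topic", "my_connect_offsets", "__consumer_offsets", "numtest", "my_source_connect_statuses". None of them has anything to do with mytable.
  • I tried recreating the connector several times but saw no effect. Maybe I am missing something in the configuration, which I have attached to my question. So debugging the connector is really troublesome.
  • Great answer! If for some reason you cannot upgrade to PG 10+, you have to use the "decoderbufs" or "wal2json" logical decoding plugin instead of "pgoutput".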