【Title】: Cannot deserialize instance (Kafka Streams)
【Posted】: 2026-01-19 14:50:01
【Question】:

What am I doing wrong? My Kafka Streams program below fails while streaming data with: "Cannot deserialize instance of com.kafka.productiontest.models.TimeOff out of START_ARRAY token".

I have a topic timeOffs2 that contains time-off records keyed by timeOffID; the value is an object that includes an employeeId. I just want to regroup all the time-offs under the employee key and write them to a store.

For the store, the key will be the employeeId and the value will be the list of time-offs.

Program properties and stream logic:

public Properties getKafkaProperties() throws UnknownHostException {

    InetAddress myHost = InetAddress.getLocalHost();

    Properties kafkaStreamProperties = new Properties();
    kafkaStreamProperties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    kafkaStreamProperties.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    kafkaStreamProperties.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, TimeOffSerde.class);
    kafkaStreamProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    kafkaStreamProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "com.kafka.productiontest.models.TimeOffSerializer");
    kafkaStreamProperties.put(StreamsConfig.APPLICATION_ID_CONFIG, application_id );
    kafkaStreamProperties.put(StreamsConfig.APPLICATION_SERVER_CONFIG, myHost.getHostName() + ":" + port);
    return kafkaStreamProperties;
}



String topic = "timeOffs2";
StreamsBuilder builder = new StreamsBuilder();

KStream<String, TimeOff> source = builder.stream(topic);

KTable<String, ArrayList<TimeOff>> newStore = source.groupBy((k, v) -> v.getEmployeeId())
    .aggregate(ArrayList::new,
        (key, value, aggregate) -> {
          aggregate.add(value);
          return aggregate;
        }, Materialized.as("NewStore").withValueSerde(TimeOffListSerde(TimeOffSerde)));

final Topology topology = builder.build();
final KafkaStreams streams = new KafkaStreams(topology, getKafkaProperties());

TimeOffSerializer.java

package com.kafka.productiontest.models;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;

public class TimeOffSerializer implements Serializer  {

  @Override
  public void configure(Map configs, boolean isKey) {

  }

  @Override
  public byte[] serialize(String topic, Object data) {
    byte[] retVal = null;
    ObjectMapper objectMapper = new ObjectMapper();
    try {
      retVal = objectMapper.writeValueAsString(data).getBytes();
    } catch (Exception e) {
      e.printStackTrace();
    }
    return retVal;
  }

  @Override
  public void close() {
  }
}

TimeOffDeserializer.java

package com.kafka.productiontest.models;

import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.kafka.common.serialization.Deserializer;

import java.util.Map;

public class TimeOffDeserializer implements Deserializer {

  @Override
  public void configure(Map configs, boolean isKey) {

  }
  @Override
  public TimeOff deserialize(String arg0, byte[] arg1) {
    ObjectMapper mapper = new ObjectMapper();
    TimeOff timeOff = null;
    try {
      timeOff = mapper.readValue(arg1, TimeOff.class);
    } catch (Exception e) {
      e.printStackTrace();
    }
    return timeOff;
  }

  @Override
  public void close() {

  }

}

TimeOffSerde.java

package com.kafka.productiontest.models;

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;

public class TimeOffSerde implements Serde<Object> {

  private final Serde inner;

  public TimeOffSerde(){
    inner = Serdes.serdeFrom(new TimeOffSerializer(), new TimeOffDeserializer());
  }
  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    inner.serializer().configure(configs, isKey);
    inner.deserializer().configure(configs, isKey);
  }

  @Override
  public void close() {
    inner.serializer().close();
    inner.deserializer().close();
  }

  @Override
  public Serializer<Object> serializer() {
    return inner.serializer();
  }

  @Override
  public Deserializer<Object> deserializer() {
    return inner.deserializer();
  }
}

TimeOffListSerializer.java

package com.kafka.productiontest.models;
import org.apache.kafka.common.serialization.Serializer;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.sql.Time;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Map;

public class TimeOffListSerializer implements Serializer<ArrayList<TimeOff>> {

  private Serializer<TimeOff> inner;

  public TimeOffListSerializer(Serializer<TimeOff> inner) {
    this.inner = inner;
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {

  }

  @Override
  public byte[] serialize(String topic, ArrayList<TimeOff> data) {
    final int size = data.size();
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();
    final DataOutputStream dos = new DataOutputStream(baos);
    final Iterator<TimeOff> iterator = data.iterator();
    try {
      dos.writeInt(size);
      while (iterator.hasNext()) {
        final byte[] bytes = inner.serialize(topic, iterator.next());
        dos.writeInt(bytes.length);
        dos.write(bytes);
      }

    }catch (Exception ex) {

    }
    return baos.toByteArray();
  }

  @Override
  public void close() {
      inner.close();
  }
}

TimeOffListDeserializer.java

package com.kafka.productiontest.models;
import org.apache.kafka.common.serialization.Deserializer;

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;

public class TimeOffListDeserializer  implements Deserializer<ArrayList<TimeOff>> {

  private final Deserializer<TimeOff> valueDeserializer;

  public TimeOffListDeserializer(final Deserializer<TimeOff> valueDeserializer) {
    this.valueDeserializer = valueDeserializer;
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {

  }

  @Override
  public ArrayList<TimeOff> deserialize(String topic, byte[] data)  {
    if (data == null || data.length == 0) {
      return null;
    }

    final ArrayList<TimeOff> arrayList = new ArrayList<>();
    final DataInputStream dataInputStream = new DataInputStream(new ByteArrayInputStream(data));

    try {
      final int records = dataInputStream.readInt();
      for (int i = 0; i < records; i++) {
        final byte[] valueBytes = new byte[dataInputStream.readInt()];
        dataInputStream.read(valueBytes);
        arrayList.add(valueDeserializer.deserialize(topic, valueBytes));
      }
    } catch (IOException e) {
      throw new RuntimeException("Unable to deserialize ArrayList", e);
    }
    return arrayList;
  }

  @Override
  public void close() {

  }
}

TimeOffListSerde.java

package com.kafka.productiontest.models;

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

import java.util.ArrayList;
import java.util.Map;

public class TimeOffListSerde implements Serde<ArrayList<TimeOff>> {
  private Serde<ArrayList<TimeOff>> inner;

  public TimeOffListSerde() {
  }

  public TimeOffListSerde(Serde<TimeOff> serde){
    inner = Serdes.serdeFrom(new TimeOffListSerializer(serde.serializer()), new TimeOffListDeserializer(serde.deserializer()));
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    inner.serializer().configure(configs, isKey);
    inner.deserializer().configure(configs, isKey);
  }

  @Override
  public void close() {
    inner.serializer().close();
    inner.deserializer().close();
  }

  @Override
  public Serializer<ArrayList<TimeOff>> serializer() {
    return inner.serializer();
  }

  @Override
  public Deserializer<ArrayList<TimeOff>> deserializer() {
    return inner.deserializer();
  }
}

I think the problem is in this part, with withValueSerde. The code does not compile as written. But if I remove withValueSerde, I get the error "Cannot deserialize TimeOff object". Could you help me and point out what I am doing wrong?

KTable<String, ArrayList<TimeOff>> newStore = source.groupBy((k, v) -> v.getEmployeeId())
    .aggregate(ArrayList::new,
        (key, value, aggregate) -> {
          aggregate.add(value);
          return aggregate;
        }, Materialized.as("NewStore").withValueSerde(TimeOffListSerde(TimeOffSerde)));

【Comments】:

  • If you are using writeValueAsString anyway, why not just use the StringSerializer and map your object to a string? And why use a DataOutputStream rather than Jackson to parse the data back into objects? Tip: write unit tests for your serializers
  • I will write test cases, but what is wrong with this approach? If I were to use this code (even if it is written messily), what am I doing wrong?
  • Somewhere in the serialization pipeline it is trying to deserialize your list into a single TimeOff object
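Following the unit-test suggestion above, the length-prefixed list framing used by TimeOffListSerializer/TimeOffListDeserializer can be round-trip checked without any Kafka dependency. This is a minimal sketch with hypothetical names (`ListFramingDemo`, `serializeList`, `deserializeList`), using String payloads as stand-ins for the JSON-encoded TimeOff objects:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ListFramingDemo {

  // Mirrors TimeOffListSerializer's wire format: record count, then one
  // (length, payload) pair per element.
  static byte[] serializeList(List<String> items) {
    try {
      ByteArrayOutputStream baos = new ByteArrayOutputStream();
      DataOutputStream dos = new DataOutputStream(baos);
      dos.writeInt(items.size());
      for (String item : items) {
        byte[] bytes = item.getBytes(StandardCharsets.UTF_8);
        dos.writeInt(bytes.length);
        dos.write(bytes);
      }
      return baos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // cannot happen with in-memory streams
    }
  }

  // Mirrors TimeOffListDeserializer: read the count, then each length-prefixed payload.
  static List<String> deserializeList(byte[] data) {
    try {
      DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
      int records = dis.readInt();
      List<String> items = new ArrayList<>();
      for (int i = 0; i < records; i++) {
        byte[] bytes = new byte[dis.readInt()];
        dis.readFully(bytes); // unlike read(), readFully() never returns a short read
        items.add(new String(bytes, StandardCharsets.UTF_8));
      }
      return items;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    List<String> original = List.of("{\"employeeId\":\"e1\"}", "{\"employeeId\":\"e2\"}");
    List<String> roundTripped = deserializeList(serializeList(original));
    if (!roundTripped.equals(original)) throw new AssertionError("round trip failed");
    System.out.println("round trip ok");
  }
}
```

Note that the question's deserializer uses `dataInputStream.read(valueBytes)`, which may return fewer bytes than requested; `readFully` avoids that edge case.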

Tags: java json apache-kafka apache-kafka-streams


【Solution 1】:

Looking at your code, I can see a couple of issues:

  1. TimeOffSerde - it should implement Serde<TimeOff> rather than Serde<Object>
  2. You are not passing the Key and Value types to Materialized, so they are assumed to be Object
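Concretely, issue 1 means parameterizing the Serde with the value type. A sketch of the typed version, assuming TimeOffSerializer and TimeOffDeserializer are likewise declared as `Serializer<TimeOff>` and `Deserializer<TimeOff>` (the raw types in the question would need the same treatment for this to compile):

```java
package com.kafka.productiontest.models;

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;

import java.util.Map;

// Typed serde: Serde<TimeOff> instead of Serde<Object>, so Streams can
// infer the value type wherever this serde is used.
public class TimeOffSerde implements Serde<TimeOff> {

  private final Serde<TimeOff> inner;

  public TimeOffSerde() {
    inner = Serdes.serdeFrom(new TimeOffSerializer(), new TimeOffDeserializer());
  }

  @Override
  public void configure(Map<String, ?> configs, boolean isKey) {
    inner.serializer().configure(configs, isKey);
    inner.deserializer().configure(configs, isKey);
  }

  @Override
  public void close() {
    inner.serializer().close();
    inner.deserializer().close();
  }

  @Override
  public Serializer<TimeOff> serializer() {
    return inner.serializer();
  }

  @Override
  public Deserializer<TimeOff> deserializer() {
    return inner.deserializer();
  }
}
```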

So your streaming part should look like this:

KTable<String, ArrayList<TimeOff>> newStore = source.groupBy((k, v) -> v.getEmployeeId())
        .aggregate(ArrayList::new,
                (key, value, aggregate) -> {
                    aggregate.add(value);
                    return aggregate;
                }, Materialized.<String, ArrayList<TimeOff>, KeyValueStore<Bytes, byte[]>>as("NewStore").withValueSerde(new TimeOffListSerde(new TimeOffSerde())));

Note: remember to clear the state store directory after making these changes.

【Discussion】:

  • Wow, genius. One more question: is it necessary for the values written to the store to be byte[]? If so, is there a specific reason, and how do I convert the byte[] back into a list of objects?
  • Any help? I am new to Kafka Streams, which is why I am asking so many questions. Please just answer this one: how do I convert the byte[] back into a list of TimeOff objects?
  • @Parkashkumar, to convert the byte[] back into your POJOs (the list of time-offs), you need to use your Deserializer - TimeOffListDeserializer
  • @Parkashkumar, Kafka treats messages - keys and values - as byte arrays, which is why the state store holds byte[]
  • Would it still be the same code if I used a KTable (meaning a source table) instead of a KStream? When I change the KStream to a KTable it starts having problems; it complains about the aggregation.
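For the byte[] question above: once the store is materialized with the typed serdes, interactive queries hand back already-deserialized values, so the byte[] handling stays inside Kafka Streams. A sketch using the two-argument `streams.store()` overload (newer Kafka releases replace it with a `StoreQueryParameters`-based overload); `"someEmployeeId"` is a hypothetical key:

```java
// Assumes `streams` is the running KafkaStreams instance from the question.
ReadOnlyKeyValueStore<String, ArrayList<TimeOff>> store =
    streams.store("NewStore", QueryableStoreTypes.keyValueStore());

// The value comes back already deserialized by TimeOffListSerde.
ArrayList<TimeOff> timeOffsForEmployee = store.get("someEmployeeId");
```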