【Question Title】: Kafka Streams: creating a simple materialized view
【Posted】: 2019-07-02 05:09:01
【Question】:

I have events coming into Kafka with a bunch of non-unique string fields and an event timestamp. I want to create a materialized view of these events so that I can query them. For example:

  1. Show all events
  2. Show all events where field1 = some string
  3. Show all events matching multiple fields
  4. Show events between two dates

Every example I have seen aggregates, joins, or otherwise transforms the stream. I cannot find a single simple example of creating a view over a set of events. I don't want to perform any operations; I just want to be able to query the raw events coming into the stream.

I am using Spring Kafka, so an example with Spring Kafka would be ideal.

I am able to get messages into Kafka and consume them. However, I have not been able to create a materialized view.

I have the following code, which filters the events (not what I actually want, since I want all events, but I just wanted to see whether I could get a materialized view at all):

@StreamListener
public void process(@Input("input") KTable<String, MyMessage> myMessages) {
    keyValueStore = interactiveQueryService.getQueryableStore(ALL_MESSAGES,
            QueryableStoreTypes.keyValueStore());

    // filter(Predicate, Materialized) materializes the filtered table into a store
    myMessages.filter((key, value) -> (value.getKey() != null),
            Materialized.<String, MyMessage, KeyValueStore<Bytes, byte[]>>as(ALL_MESSAGES)
                    .withKeySerde(Serdes.String())
                    .withValueSerde(new MyMessageSerde()));
}

This throws the following exception:

java.lang.ClassCastException: [B cannot be cast to MyMessage
at org.apache.kafka.streams.kstream.internals.KTableFilter.computeValue(KTableFilter.java:57)
    at org.apache.kafka.streams.kstream.internals.KTableFilter.access$300(KTableFilter.java:25)
    at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:79)
    at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:63)
    at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
    at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:101)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.access$000(CachingKeyValueStore.java:38)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore$1.apply(CachingKeyValueStore.java:83)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:125)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:123)
    at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.flush(InnerMeteredKeyValueStore.java:284)
    at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.flush(MeteredKeyValueBytesStore.java:149)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
    ... 21 more

I don't understand why, since I set the store's valueSerde to MyMessageSerde, which knows how to serialize/deserialize MyMessage to and from a byte array.
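For context, a Serde is just a paired serializer and deserializer, and the ClassCastException indicates that the value reaching filter() was still the raw byte[] from the topic, i.e. the configured value serde was never applied when the source KTable was built. The sketch below shows the round-trip contract a custom serde such as MyMessageSerde has to satisfy; the pipe-delimited encoding and the two MyMessage fields are invented stand-ins, not the real MyMessage format:

```java
import java.nio.charset.StandardCharsets;

public class SerdeRoundTrip {
    // Toy stand-in for MyMessage: one string field plus an event timestamp.
    record MyMessage(String field1, long eventTime) {}

    // Serializer half: object -> byte[] (pipe-delimited, an invented format).
    static byte[] serialize(MyMessage m) {
        return (m.field1() + "|" + m.eventTime()).getBytes(StandardCharsets.UTF_8);
    }

    // Deserializer half: byte[] -> object. If this step is not wired into the
    // topology, downstream operators receive byte[] and the cast to MyMessage fails.
    static MyMessage deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|");
        return new MyMessage(parts[0], Long.parseLong(parts[1]));
    }

    public static void main(String[] args) {
        MyMessage original = new MyMessage("hello", 42L);
        MyMessage copy = deserialize(serialize(original));
        System.out.println(copy.equals(original)); // prints true
    }
}
```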

Update

I changed the code to the following:

myMessages.filter((key, value) -> (value.getKey() != null));

and added the following to my application.yml:

spring.cloud.stream.kafka.streams.bindings.input:
  consumer:
    materializedAs: all-messages
    key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    value-deserializer: MyMessageDeserializer

Now I get the following stack trace:

Exception in thread "raven-a43f181b-ccb6-4d9b-a8fd-9fe96542c210-StreamThread-1" org.apache.kafka.streams.errors.ProcessorStateException: task [0_3] Failed to flush state store all-messages
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:242)
at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:202)
at org.apache.kafka.streams.processor.internals.StreamTask.flushState(StreamTask.java:420)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:394)
at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:382)
at org.apache.kafka.streams.processor.internals.AssignedTasks$1.apply(AssignedTasks.java:67)
at org.apache.kafka.streams.processor.internals.AssignedTasks.applyToRunningTasks(AssignedTasks.java:362)
at org.apache.kafka.streams.processor.internals.AssignedTasks.commit(AssignedTasks.java:352)
at org.apache.kafka.streams.processor.internals.TaskManager.commitAll(TaskManager.java:401)
at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:1042)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:845)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: java.lang.ClassCastException: [B cannot be cast to MyMessage
at org.apache.kafka.streams.kstream.internals.KTableFilter.computeValue(KTableFilter.java:57)
at org.apache.kafka.streams.kstream.internals.KTableFilter.access$300(KTableFilter.java:25)
at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:79)
at org.apache.kafka.streams.kstream.internals.KTableFilter$KTableFilterProcessor.process(KTableFilter.java:63)
at org.apache.kafka.streams.processor.internals.ProcessorNode$1.run(ProcessorNode.java:50)
at org.apache.kafka.streams.processor.internals.ProcessorNode.runAndMeasureLatency(ProcessorNode.java:244)
at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:133)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:143)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:126)
at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:90)
at org.apache.kafka.streams.kstream.internals.ForwardingCacheFlushListener.apply(ForwardingCacheFlushListener.java:42)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putAndMaybeForward(CachingKeyValueStore.java:101)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.access$000(CachingKeyValueStore.java:38)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore$1.apply(CachingKeyValueStore.java:83)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:141)
at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:99)
at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:125)
at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:123)
at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.flush(InnerMeteredKeyValueStore.java:284)
at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.flush(MeteredKeyValueBytesStore.java:149)
at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:239)
... 12 more

【Comments】:

  • Please attach the full stack trace / exception log. My guess is it's an Object[] argument; in that case you may need to convert the byte[] to a String and use that string as the argument. (just a guess)
  • Please add the full log. Is there a nested exception, i.e. the "... 21 more"?
  • I added the full stack trace in the latest update.
  • The default key deserializer fails for type MyMessage: Caused by: java.lang.ClassCastException: [B cannot be cast to MyMessage. You either need to specify a different key serde/deserializer, or add an extra map() to convert the key type from byte array to MyMessage.
  • My key type is String and my value is MyMessage, so if anything it's the value serde that's wrong. In any case, where/how do I specify this different deserializer? I specified the serializers here:

Tags: java apache-kafka-streams spring-kafka


【Solution 1】:

The kind of queries you want are not easily supported. Note that there are no secondary indexes; only regular key-based lookups and range queries are supported.

If you know all the queries up front, you can re-group the data into derived KTables with the query attribute as the key. Note that keys must be unique, so if a query attribute contains non-unique data, you will need to use some Collection type as the value:

KTable<String, MyMessage> originalTable = builder.table(...);
KTable<String, List<MyMessage>> keyedByFieldATable = originalTable
        .groupBy(/* select field A as the new key */)
        .aggregate(/* the aggregation returns a list (or similar) of entries for the key */);

Note that the storage requirement is duplicated each time you re-key the original table.
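In plain Java terms, such a re-keyed table amounts to an index from the query attribute to the collection of events that share it. The sketch below builds one with Collectors.groupingBy; the field name field1 comes from the question, while the MyMessage shape and all other names are assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReKeyedIndex {
    // Toy stand-in for MyMessage.
    record MyMessage(String field1, long eventTime) {}

    // Build the "derived KTable": query attribute -> all events carrying it.
    // Because field1 is non-unique, the value must be a collection.
    static Map<String, List<MyMessage>> indexByField1(List<MyMessage> events) {
        return events.stream().collect(Collectors.groupingBy(MyMessage::field1));
    }

    public static void main(String[] args) {
        List<MyMessage> events = List.of(
                new MyMessage("foo", 1L),
                new MyMessage("bar", 2L),
                new MyMessage("foo", 3L));
        Map<String, List<MyMessage>> index = indexByField1(events);
        System.out.println(index.get("foo").size()); // prints 2
    }
}
```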

As an alternative, you can do a full table scan over the original table and evaluate your filter conditions while consuming the returned iterator.

It is a space vs. CPU trade-off. Maybe Kafka Streams is not the right tool for your problem.
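The full-scan alternative can be sketched the same way: iterate over every entry and evaluate the filter while consuming the iterator. A TreeMap stands in for the state store here; in a real application you would obtain a ReadOnlyKeyValueStore via interactive queries and iterate its all() iterator instead. The MyMessage shape is again an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class FullTableScan {
    // Toy stand-in for MyMessage.
    record MyMessage(String field1, long eventTime) {}

    // O(n) per query: no extra stores, but every query walks the whole table.
    static List<MyMessage> scan(Map<String, MyMessage> store, Predicate<MyMessage> p) {
        List<MyMessage> matches = new ArrayList<>();
        for (MyMessage m : store.values()) {   // with a real store: consume store.all()
            if (p.test(m)) matches.add(m);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeMap<String, MyMessage> store = new TreeMap<>();
        store.put("k1", new MyMessage("foo", 100L));
        store.put("k2", new MyMessage("bar", 200L));
        store.put("k3", new MyMessage("foo", 300L));

        // "show all events where field1 = some string"
        System.out.println(scan(store, m -> m.field1().equals("foo")).size());      // prints 2
        // "show events between two dates" (timestamps here)
        System.out.println(scan(store, m -> m.eventTime() >= 150 && m.eventTime() <= 250).size()); // prints 1
    }
}
```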

【Discussion】:

    【Solution 2】:

    I was able to create the materialized view as follows:

    Configuration in application.yml:

    spring.cloud.stream.kafka.streams.bindings.input:
      consumer:
        materializedAs: all-messages
        keySerde: org.apache.kafka.common.serialization.Serdes$StringSerde
        valueSerde: com.me.MyMessageSerde
      producer:
        keySerde: org.apache.kafka.common.serialization.Serdes$StringSerde
        valueSerde: com.me.MyMessageSerde
    

    This sets up the correct serdes and the materialized view.

    The following code, together with the configuration above, creates the KTable that materializes the view.

    @StreamListener
    public void process(@Input("input") KTable<String, MyMessage> myMessages) {
    }
    

    【Discussion】:
