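【Posted】: 2018-11-05 14:43:46
【Question】:
I have a KafkaSpout, two bolts that process the data, and two bolts that store the processed data in MongoDB.
I am using Apache Storm's Flux to create the topology, and I am reading data from Kafka into the spout. Everything runs fine, but every time I run the topology it processes all the messages in Kafka from the beginning. Once it has processed all the messages, it does not wait for more messages and crashes.
How can I make the Storm topology process only the latest messages?
Here is my topology file (.yaml):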
name: "kafka-topology"
components:
  # MongoDB mapper
  - id: "block-mapper"
    className: "org.apache.storm.mongodb.common.mapper.SimpleMongoMapper"
    configMethods:
      - name: "withFields"
        args: # The following are the tuple fields to map to a MongoDB document
          - ["block"]
  # MongoDB mapper
  - id: "transaction-mapper"
    className: "org.apache.storm.mongodb.common.mapper.SimpleMongoMapper"
    configMethods:
      - name: "withFields"
        args: # The following are the tuple fields to map to a MongoDB document
          - ["transaction"]
  - id: "stringScheme"
    className: "org.apache.storm.kafka.StringScheme"
  - id: "stringMultiScheme"
    className: "org.apache.storm.spout.SchemeAsMultiScheme"
    constructorArgs:
      - ref: "stringScheme"
  - id: "zkHosts"
    className: "org.apache.storm.kafka.ZkHosts"
    constructorArgs:
      - "172.25.33.191:2181"
  - id: "spoutConfig"
    className: "org.apache.storm.kafka.SpoutConfig"
    constructorArgs:
      # brokerHosts
      - ref: "zkHosts"
      # topic
      - "blockdata"
      # zkRoot
      - ""
      # id
      - "myId"
    properties:
      - name: "scheme"
        ref: "stringMultiScheme"
      - name: "ignoreZkOffsets"
        value: false
config:
  topology.workers: 1
  # ...
# spout definitions
spouts:
  - id: "kafka-spout"
    className: "org.apache.storm.kafka.KafkaSpout"
    constructorArgs:
      - ref: "spoutConfig"
    parallelism: 1
# bolt definitions
bolts:
  - id: "blockprocessing-bolt"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "process-bolt.py"]
      # output fields
      - ["block"]
    parallelism: 1
    # ...
  - id: "transprocessing-bolt"
    className: "org.apache.storm.flux.wrappers.bolts.FluxShellBolt"
    constructorArgs:
      # command line
      - ["python", "trans-bolt.py"]
      # output fields
      - ["transaction"]
    parallelism: 1
    # ...
  - id: "mongoBlock-bolt"
    className: "org.apache.storm.mongodb.bolt.MongoInsertBolt"
    constructorArgs:
      - "mongodb://172.25.33.205:27017/testdb"
      - "block"
      - ref: "block-mapper"
    parallelism: 1
    # ...
  - id: "mongoTrans-bolt"
    className: "org.apache.storm.mongodb.bolt.MongoInsertBolt"
    constructorArgs:
      - "mongodb://172.25.33.205:27017/testdb"
      - "transaction"
      - ref: "transaction-mapper"
    parallelism: 1
    # ...
  - id: "log"
    className: "org.apache.storm.flux.wrappers.bolts.LogInfoBolt"
    parallelism: 1
    # ...
# stream definitions
# stream definitions define connections between spouts and bolts.
# note that such connections can be cyclical
# custom stream groupings are also supported
streams:
  - name: "kafka --> block-Processing" # name isn't used (placeholder for logging, UI, etc.)
    from: "kafka-spout"
    to: "blockprocessing-bolt"
    grouping:
      type: SHUFFLE
  - name: "kafka --> transaction-processing" # name isn't used (placeholder for logging, UI, etc.)
    from: "kafka-spout"
    to: "transprocessing-bolt"
    grouping:
      type: SHUFFLE
  - name: "block --> mongo"
    from: "blockprocessing-bolt"
    to: "mongoBlock-bolt"
    grouping:
      type: SHUFFLE
  - name: "transaction --> mongo"
    from: "transprocessing-bolt"
    to: "mongoTrans-bolt"
    grouping:
      type: SHUFFLE
I have tried adding properties to the spoutConfig so that it fetches only the latest messages, like this:
  - id: "spoutConfig"
    className: "org.apache.storm.kafka.SpoutConfig"
    constructorArgs:
      - ref: "zkHosts"
      - "blockdata"
      - ""
      - "myId"
    properties:
      - name: "scheme"
        ref: "stringMultiScheme"
      - name: "startOffsetTime"
        ref: "EarliestTime"
      - name: "forceFromStart"
        value: false
But no matter what I put in the ref for startOffsetTime, it throws this error:
Exception in thread "main" java.lang.IllegalArgumentException: Can not set long field org.apache.storm.kafka.KafkaConfig.startOffsetTime to null value
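One possible reading of this error: in Flux, `ref` looks up another component by its `id`, and the topology defines no component with the id "EarliestTime", so there is nothing to assign to the primitive `long` field. A minimal sketch, assuming the storm-kafka `SpoutConfig` used above, passes the offset as a literal through `value` instead. The numbers are the constants returned by `kafka.api.OffsetRequest`: `-1` for `LatestTime()` and `-2` for `EarliestTime()`. Setting `ignoreZkOffsets` to `true` is an assumption here, based on the understanding that this spout only consults `startOffsetTime` when that flag is set; this has not been verified against the Storm version in use.

```yaml
  - id: "spoutConfig"
    className: "org.apache.storm.kafka.SpoutConfig"
    constructorArgs:
      - ref: "zkHosts"
      - "blockdata"
      - ""
      - "myId"
    properties:
      - name: "scheme"
        ref: "stringMultiScheme"
      # literal long instead of a ref: -1 = latest offset, -2 = earliest offset
      - name: "startOffsetTime"
        value: -1
      # assumption: startOffsetTime is only honored when this flag is true
      - name: "ignoreZkOffsets"
        value: true
```

With these settings, each newly submitted topology would start at the end of the topic, so any messages that arrived while the topology was down would be skipped.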
【Discussion】: