【Question Title】: Even after setting "auto.offset.reset" to "latest", getting error OffsetOutOfRangeException
【Posted】: 2020-02-27 10:27:01
【Question Description】:

I am using spark-sql-2.4.1 with Kafka 0.10.

When I try to consume data with the consumer, it throws the error below even though "auto.offset.reset" is set to "latest":
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {COMPANY_INBOUND-16=168}
    at org.apache.kafka.clients.consumer.internals.Fetcher.throwIfOffsetOutOfRange(Fetcher.java:348)
    at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:396)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:999)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.fetchData(KafkaDataConsumer.scala:470)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchRecord(KafkaDataConsumer.scala:361)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:251)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:234)
    at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:209)
    at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:234)

Where is the problem? Why doesn't the setting take effect? How should it be fixed?

Part 2:

.readStream()
    .format("kafka")
    .option("startingOffsets", "latest")
    .option("enable.auto.commit", false)
    .option("maxOffsetsPerTrigger", 1000)
    .option("auto.offset.reset", "latest")
    .option("failOnDataLoss", false)
    .load();

【Question Discussion】:

Tags: apache-spark apache-kafka apache-spark-sql spark-streaming kafka-consumer-api


【Solution 1】:

auto.offset.reset is ignored by Spark Structured Streaming. Use the startingOffsets option instead:

auto.offset.reset: Set the source option startingOffsets to specify where to start instead. Structured Streaming manages which offsets are consumed internally, rather than relying on the kafka Consumer to do it. This will ensure that no data is missed when new topics/partitions are dynamically subscribed. Note that startingOffsets only applies when a new streaming query is started, and that resuming will always pick up from where the query left off.

Source
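
For reference, a minimal sketch of what the corrected reader could look like with the Java API. This is an illustration, not the asker's actual job: the SparkSession setup, broker address, and topic subscription are assumed placeholders. The unsupported consumer options (auto.offset.reset, enable.auto.commit) are dropped, and startingOffsets does that job instead:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaStartingOffsetsExample {
    public static void main(String[] args) {
        // master/deploy settings assumed to come from spark-submit
        SparkSession spark = SparkSession.builder()
                .appName("kafka-starting-offsets-example") // hypothetical app name
                .getOrCreate();

        // "startingOffsets" replaces "auto.offset.reset". It only applies the
        // first time the query starts; restarts always resume from the
        // checkpoint, which is how a stale checkpointed offset such as
        // COMPANY_INBOUND-16=168 in the stack trace can end up out of range.
        Dataset<Row> df = spark
                .readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "broker:9092") // placeholder
                .option("subscribe", "COMPANY_INBOUND")
                .option("startingOffsets", "latest")
                .option("maxOffsetsPerTrigger", 1000)
                // skip (with a warning) rather than fail when checkpointed
                // offsets have been aged out by Kafka retention
                .option("failOnDataLoss", false)
                .load();

        df.printSchema();
    }
}

Note that if the query has already run once, changing startingOffsets alone will not help: a restart reads offsets from the checkpoint, so the query needs either failOnDataLoss set to false or a fresh checkpoint location before "latest" takes effect again.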

【Discussion】:
