【Question Title】: Spark Kafka Data Consuming Package
【Posted】: 2021-09-04 23:37:06
【Question Description】:

I tried to consume my Kafka topic using the following code, as mentioned in the documentation:

df = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092,") \
  .option("subscribe", "first_topic") \
  .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

I got this error:

AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".

So I tried:

./bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 ...

to install the Kafka package and its dependencies. But I got this error:

21/06/21 13:45:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/home/soheil/spark-3.1.2-bin-hadoop3.2/... does not exist'.  Please specify one with --class.
    at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:968)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:486)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

What should I do to install this package?

【Question Discussion】:

    Tags: apache-spark apache-kafka spark-structured-streaming spark-kafka-integration


    【Solution 1】:

    The error you're getting here is unrelated to Kafka:

    file:/home/soheil/spark-3.1.2-bin-hadoop3.2/... does not exist

    This is referring to your HADOOP_HOME and/or HADOOP_CONF_DIR variables and the PATH that Spark depends on. Check that these are configured correctly, and that you can run the Spark Structured Streaming WordCount example that uses Kafka before running your own script:

    $ bin/spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2 \
         structured_kafka_wordcount.py \
         host1:port1,host2:port2 subscribe topic1,topic2
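If you would rather not pass `--packages` on every run, the same Maven coordinate can be pinned in Spark's defaults file instead. A sketch, assuming the stock `conf/` layout of the Spark distribution:

```
# conf/spark-defaults.conf
# Resolved from Maven Central when the application starts
spark.jars.packages  org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2
```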
    

    The next part, `Please specify one with --class.`, says that the CLI parser failed; probably because you typed the spark-submit options incorrectly, or your file path contains spaces, for example an unquoted path with a space in it.
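To make the ordering and quoting rules concrete, here is a minimal sketch (the application path and topic names are hypothetical, not from the question): every `--option` must come before the application file, and any path containing spaces must be quoted so the shell treats it as one argument:

```python
import shlex

# Hypothetical application path containing a space
# (unquoted, this would be split into two arguments by the shell)
app_path = "/home/user/spark apps/structured_kafka_wordcount.py"

argv = [
    "bin/spark-submit",
    # All spark-submit options must precede the application file;
    # everything after the first non-option argument is passed to the app itself.
    "--packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2",
    app_path,
    "host1:port1", "subscribe", "topic1",
]

# shlex.join quotes any argument containing spaces,
# so the path stays a single token on the command line
print(shlex.join(argv))
```

Running this prints a command line in which the path is wrapped in single quotes and `--packages` precedes the script, which is the shape spark-submit expects.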

    【Discussion】:
