【Question Title】: pyspark kafka submit failed
【Posted】: 2018-07-06 14:51:53
【Description】:

I am using pyspark to consume data from Kafka. I typed the following on the console to submit:

spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py

comsumer.py is my Python program. Then, in the console, it raises:

    ________________________________________________________________________________________________

  Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 51, in <module>
    main()
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 45, in main
    main_main(ssc)
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 29, in main_main
    consumer = KafkaUtils.createStream(ssc, zookeeper, groupid, {kafkatopic: 1})
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 69, in createStream
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 195, in _get_helper

This seems to tell me that I have not pointed at the jar file's path, but I checked the log output and it contains:

    18/01/27 19:46:59 INFO SparkContext: Added JAR file:/Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar at spark://192.168.1.150:57342/jars/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar with timestamp 1517053619142
18/01/27 19:46:59 INFO SparkContext: Added file file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py at file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py with timestamp 1517053619150

I am sure the jar file is there, so why does this exception occur?

I don't know what the problem is. Could you help me?

【Comments】:

    Tags: python pyspark apache-kafka kafka-consumer-api


    【Solution 1】:

    The jar must correspond to your pyspark version; this is the first thing to confirm:

    spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py
    

    spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar
    

    The jar must use the same version as pyspark; in this case you are using pyspark=2.2.0.
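
    A quick way to confirm the version match (a sketch; it assumes pyspark is importable from the Python environment that spark-submit uses):

    ```shell
    # Print the pyspark version so it can be compared against the
    # version embedded in the jar name (here: ...-2.2.0.jar).
    python -c "import pyspark; print(pyspark.__version__)"
    ```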

    One more thing: I ran into this problem as well, and it went away when I tried the --packages option, so you might consider using

    --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:{version of pyspark}
    

    instead of the --jars option.
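
    Put together, the full submit command might look like this (a sketch; the Scala build 2.11 and version 2.2.0 are taken from the jar name in the question, and the script name matches the original spark-submit call):

    ```shell
    # Let spark-submit resolve the Kafka connector and its transitive
    # dependencies from Maven Central instead of passing a local jar.
    spark-submit \
      --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 \
      comsumer.py
    ```

    With --packages, dependency versions are resolved for you, which avoids the class-path mismatch that a hand-downloaded assembly jar can cause.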

    【Discussion】:
