【Posted】: 2018-07-06 14:51:53
【Question】:
I am using pyspark to consume data from Kafka, and I submit the job from the console with the following command:
spark-submit --jars /Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar comsumer.py
consumer.py is my Python program. The console then prints:
________________________________________________________________________________________________
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...
________________________________________________________________________________________________
Traceback (most recent call last):
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 51, in <module>
    main()
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 45, in main
    main_main(ssc)
  File "/Users/alexsun/PycharmProjects/untitled/spark_kafka/comsumer.py", line 29, in main_main
    consumer = KafkaUtils.createStream(ssc, zookeeper, groupid, {kafkatopic: 1})
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 69, in createStream
  File "/Users/alexsun/binSoftware/spark-2.2.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 195, in _get_helper
The message seems to say that the jar file is not on the class path, but when I check the log output, it contains:
18/01/27 19:46:59 INFO SparkContext: Added JAR file:/Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar at spark://192.168.1.150:57342/jars/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar with timestamp 1517053619142
18/01/27 19:46:59 INFO SparkContext: Added file file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py at file:/Users/alexsun/PycharmProjects/untitled/spark_kafka/consumer.py with timestamp 1517053619150
I am sure the jar file is there, so why is this exception raised?
I cannot work out what the problem is. Can you help me?
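For reference, the two options listed in the error message can be written out as a small sketch. This is only an illustration of how the two spark-submit invocations differ, not a confirmed fix. The helper name `build_submit_command` is hypothetical, the script name follows the one used in the text, and the `_2.11` Scala suffix in the Maven coordinate is an assumption taken from the jar file name (the error message prints the coordinate without it).

```python
import shlex


def build_submit_command(script, packages=None, jars=None):
    """Return the spark-submit argument list for a PySpark script.

    Hypothetical helper for illustration only; both --packages and --jars
    take a comma-separated list.
    """
    cmd = ["spark-submit"]
    if packages:
        cmd += ["--packages", ",".join(packages)]
    if jars:
        cmd += ["--jars", ",".join(jars)]
    cmd.append(script)
    return cmd


# Option 1 from the error message: let spark-submit resolve the library
# from Maven coordinates (the _2.11 suffix is an assumption, see above).
option_packages = build_submit_command(
    "consumer.py",
    packages=["org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0"],
)

# Option 2 from the error message: pass the downloaded assembly jar directly.
option_jars = build_submit_command(
    "consumer.py",
    jars=["/Users/alexsun/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar"],
)

# Print both commands in a form that can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in option_packages))
print(" ".join(shlex.quote(arg) for arg in option_jars))
```

The second command is the form already used above, so comparing it with the first may help to rule out a problem with the local jar itself.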
【Discussion】:
Tags: python pyspark apache-kafka kafka-consumer-api