【发布时间】:2016-06-04 07:12:00
【问题描述】:
我最近一直在尝试在 Pycharm 中调试 pyspark.streaming.kafka 类,以便与在 linux 机器上工作相比更容易排除故障。
这是我的示例代码:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils, TopicAndPartition
sc = SparkContext(appName="sample app")
ssc = StreamingContext(sc, 1)
kafkaParams = {"metadata.broker.list": "{broker list}",
"auto.offset.reset": "smallest"}
kafka_stream = KafkaUtils.createDirectStream(ssc, {topic list}, kafkaParams)
但是,我收到以下错误:
Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.3\helpers\pydev\pydevd.py", line 2411, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm 5.0.3\helpers\pydev\pydevd.py", line 1802, in run
launch(file, globals, locals) # execute the script
File "{script path}", line 30, in <module> {topic}], kafkaParams)
File "C:\spark-1.6.0-bin- hadoop2.6\python\lib\pyspark.zip\pyspark\streaming\kafka.py", line 152, in createDirectStream
py4j.protocol.Py4JJavaError: An error occurred while calling o20.loadClass.
: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
16/02/22 11:45:49 INFO SparkContext: Invoking stop() from shutdown hook
如果有人可以就如何在 PyCharm 中调试 PySpark Kafka 流模块提供一些指导,我将不胜感激
【问题讨论】:
-
如何提交您的应用程序?
-
我只是在使用 Pycharm 调试功能。
-
它是否适用于其他配置?
-
如果我使用一些简单的 spark rdd 功能,它就可以工作。 pyspark 库中的 kafka 流类正在扔掉它
标签: apache-spark pycharm apache-kafka pyspark spark-streaming