[Title]: pyspark 'SparkSession' object has no attribute '_jssc'
[Posted]: 2018-10-30 18:36:15
[Question]:

I am using: Hadoop 2.6.0-cdh5.14.2 with SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101

I get this error when starting a direct stream with KafkaUtils:

  File "/home/ale/amazon_fuse_ds/bin/hdp_amazon_fuse_aggreagation.py", line 91, in setupContexts
kafka_stream = KafkaUtils.createDirectStream( self.spark_streaming_context, [ self.kafka_topicin ], kafka_configuration )
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101/lib/spark2/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 145, in createDirectStream
 AttributeError: 'SparkSession' object has no attribute '_jssc'

I can see that SparkSession has a `_jsc` attribute, but not `_jssc`.

[Comments]:

    Tags: apache-spark apache-kafka


    [Solution 1]:

    You are passing a SparkSession, but `KafkaUtils.createDirectStream` expects a StreamingContext (that is the object with the `_jssc` attribute). Build one from the session's underlying SparkContext and pass that instead:

    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    # batchDuration is the batch interval in seconds, e.g. 10
    ssc = StreamingContext(self.spark_streaming_context.sparkContext, batchDuration)
    kafka_stream = KafkaUtils.createDirectStream(ssc, [self.kafka_topicin], kafka_configuration)
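
    A minimal end-to-end sketch of the corrected setup. The broker address, topic name, and batch interval below are assumptions for illustration; running it requires a live Kafka broker and the `spark-streaming-kafka-0-8` package on the classpath.

    ```python
    # Sketch only: assumes Spark 2.x with the spark-streaming-kafka-0-8
    # package available, and a Kafka broker at localhost:9092 (assumption).
    from pyspark.sql import SparkSession
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    spark = SparkSession.builder.appName("kafka-direct-stream").getOrCreate()

    # createDirectStream needs a StreamingContext (which carries _jssc),
    # not the SparkSession itself; build it from the underlying SparkContext.
    ssc = StreamingContext(spark.sparkContext, batchDuration=10)

    stream = KafkaUtils.createDirectStream(
        ssc,
        ["my_topic"],                                # hypothetical topic
        {"metadata.broker.list": "localhost:9092"},  # hypothetical broker
    )
    stream.pprint()  # print each micro-batch to stdout

    ssc.start()
    ssc.awaitTermination()
    ```

    Note that `StreamingContext` wraps a `SparkContext`, not a `SparkSession`, which is why the session object must be unwrapped via `.sparkContext` first.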
    

    [Comments]:
