Posted: 2020-10-02 12:15:32
Question:
When I add the following dependencies to my code, it fails with the error shown below:
'--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'
Here is the code:
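from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer  # kafka-python client (not used below)
import os

# Extra spark-submit options for the JVM that PySpark launches;
# this has to be set before the SparkContext is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# Batch read of the topic "Jim_Topic" from a local Kafka broker
df = spark \
    .read \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "localhost:9092") \
    .option("subscribe", "Jim_Topic") \
    .load()

# Kafka keys and values arrive as binary; cast them to strings.
# selectExpr returns a new DataFrame, which is not assigned or displayed here.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")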
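And this is the error:
Error: Missing application resource.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]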
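A commonly reported cause of this exact message, sketched here under the assumption of PySpark 2.x started from a plain Python interpreter: PySpark passes `PYSPARK_SUBMIT_ARGS` to spark-submit as-is, and its default value is `pyspark-shell`. Overriding the variable with only `--packages ...` drops that trailing primary resource, so spark-submit reports that the application resource is missing. The helper `build_pyspark_submit_args` below is hypothetical and only illustrates the shape of the string; the package coordinates are copied from the question.

```python
import os


def build_pyspark_submit_args(packages):
    """Hypothetical helper: build a value for PYSPARK_SUBMIT_ARGS.

    PySpark hands this string to spark-submit. The trailing
    'pyspark-shell' is the primary resource spark-submit expects;
    leaving it out yields "Missing application resource".
    """
    if not packages:
        return "pyspark-shell"
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Coordinates from the question. Assumption: their Spark and Scala versions
# (here 2.2.0 / 2.1.1 and _2.11) should match the locally installed Spark.
PACKAGES = [
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
    "org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1",
]

# Set this before the first SparkContext/SparkSession is created, since the
# variable is read only when PySpark launches the JVM gateway.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_pyspark_submit_args(PACKAGES)

if __name__ == "__main__":
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

After this, `SparkContext.getOrCreate()` can be called as in the question. Note that `spark.read.format("kafka")` is served by the `spark-sql-kafka-0-10` package; the `spark-streaming-kafka-0-8-assembly` package targets the older DStream API and appears not to be needed for the batch read shown above.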
Tags: apache-spark pyspark apache-kafka