【Question Title】: PySpark Kafka Error: Missing application resource
【Posted】: 2020-10-02 12:15:32
【Question】:

When I add the following packages option to my code, it triggers the error shown below:

'--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'

Here is the code:

from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer
import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'


sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

df = spark \
  .read \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092") \
  .option("subscribe", "Jim_Topic") \
  .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

Here is the error:

Error: Missing application resource.

Usage: spark-submit [options] [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

【Comments】:

    Tags: apache-spark pyspark apache-kafka


    【Solution 1】:

    You also need to provide the name of your Python file:

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1 your_python_file.py'
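    As a side note: when PYSPARK_SUBMIT_ARGS is set from inside a plain Python script (rather than via spark-submit on the command line), a commonly reported form of this fix ends the string with the literal token `pyspark-shell` instead of a file name. A minimal sketch of that variant (the package coordinates are the ones from the question):

    ```python
    import os

    # Package coordinates taken from the question; the key detail is the
    # trailing "pyspark-shell" token, which is the commonly reported form
    # when launching from a plain Python interpreter.
    packages = ",".join([
        "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
        "org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1",
    ])
    os.environ["PYSPARK_SUBMIT_ARGS"] = f"--packages {packages} pyspark-shell"
    print(os.environ["PYSPARK_SUBMIT_ARGS"])
    ```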
    

    Or, a better approach:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set("spark.jars", "/path/to/your/jar")
    sc = SparkContext(conf=conf)
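    Note that the error quoted in the discussion below comes from pointing `spark.jars` at a directory (the Spark `bin/` folder) rather than at a `.jar` file. A small stdlib-only guard can catch this before the SparkContext is created; `check_spark_jar` is a hypothetical helper name, not a Spark API:

    ```python
    import os

    def check_spark_jar(path: str) -> str:
        # Hypothetical helper: spark.jars expects comma-separated paths to
        # jar files; passing a directory fails with
        # "Directory ... is not allowed for addJar".
        if os.path.isdir(path):
            raise ValueError(f"{path!r} is a directory, not a jar file")
        if not path.endswith(".jar"):
            raise ValueError(f"{path!r} does not look like a jar file")
        return path

    # Usage (the path here is illustrative):
    # conf = SparkConf().set("spark.jars",
    #                        check_spark_jar("/path/to/spark-sql-kafka.jar"))
    ```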
    

    【Discussion】:

    • I used your alternative way by providing the path to the jar, and I'm now getting the error below: To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 20/06/12 22:32:17 ERROR SparkContext: Failed to add /C:/Hadoop/Spark/spark-3.0.0-preview2-bin-hadoop2.7/bin/ to Spark environment java.lang.IllegalArgumentException: Directory C:\Hadoop\Spark\spark-3.0.0-preview2-bin-hadoop2.7\bin is not allowed for addJar