【Posted on】: 2022-04-01 12:10:21
【Problem description】:
I am trying to write data to Redshift using PySpark. With the session I create, I can read from both Redshift and S3, which is exactly what I want. However, when I try to write back to Redshift, I get an error. Do you know what I should change in the script?
This is how I define the session:
import os
from pyspark.sql import SparkSession

# hadoop-aws provides the s3a filesystem; the Redshift JDBC jar is added via `jars`.
spark = SparkSession \
    .builder \
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:2.7.0") \
    .config("spark.driver.extraClassPath", ":".join(jars)) \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.awsAccessKeyId", os.environ['AWS_ACCESS_KEY_ID']) \
    .config("spark.hadoop.fs.s3a.awsSecretAccessKey", os.environ['AWS_SECRET_ACCESS_KEY']) \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .config("com.amazonaws.services.s3.enableV4", "true") \
    .config("spark.executor.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4=true") \
    .config("spark.driver.extraJavaOptions", "-Dcom.amazonaws.services.s3.enableV4=true") \
    .getOrCreate()
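(Note: `jars` is referenced above but never defined in the snippet; presumably it is a list of local paths to the driver jar(s). A purely illustrative sketch, where the path and driver version are placeholders:)

# Hypothetical: local path(s) to the Redshift JDBC driver jar; adjust to your setup.
jars = ["/opt/jars/RedshiftJDBC42-1.2.43.1067.jar"]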
Reading from Redshift works:
test = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:redshift://host.redshift.amazonaws.com:5439/db") \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .option("dbtable", "schema.table") \
    .option("user", DWH_DB_USER) \
    .option("password", DWH_DB_PASSWORD) \
    .load()
But this write attempt fails:
test.write \
    .format("jdbc") \
    .option("url", "jdbc:redshift://host.redshift.amazonaws.com:5439/db") \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .option("dbtable", "schema.table") \
    .option("user", DWH_DB_USER) \
    .option("password", DWH_DB_PASSWORD) \
    .mode("append") \
    .save()
The error I get is:
Py4JJavaError: An error occurred while calling o412.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.sql.SQLException: [Amazon](500310) Invalid operation: The session is read-only;
at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.handleMessage(Unknown Source)
at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(Unknown Source)
at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(Unknown Source)
at com.amazon.
......
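One thing I have not verified: the Amazon Redshift JDBC driver documents a ReadOnly connection option, so explicitly forcing a read-write session via the URL might be worth trying. An untested sketch of the same write with that parameter added:

# Untested: explicitly request a read-write session via the driver's ReadOnly URL parameter.
test.write \
    .format("jdbc") \
    .option("url", "jdbc:redshift://host.redshift.amazonaws.com:5439/db?ReadOnly=false") \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .option("dbtable", "schema.table") \
    .option("user", DWH_DB_USER) \
    .option("password", DWH_DB_PASSWORD) \
    .mode("append") \
    .save()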
Thanks a lot!
【Discussion】:
- Yes, my question is how to grant write access to Spark. I can write data from S3 to Redshift just fine in plain Python.
- What kind of permissions did you grant to the AWS secret keys?
- @hackwithharsha It has administrator permissions.
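For reference, the common pattern for bulk writes from Spark to Redshift is not plain JDBC but the spark-redshift connector, which stages rows in S3 and then issues a COPY. A minimal sketch, assuming the io.github.spark-redshift-community:spark-redshift package is on the classpath; the bucket name is a placeholder:

# Sketch: write through the community spark-redshift connector (S3 staging + COPY).
# "my-bucket" is a placeholder; tempdir must be an S3 location Spark can write to.
test.write \
    .format("io.github.spark_redshift_community.spark.redshift") \
    .option("url", f"jdbc:redshift://host.redshift.amazonaws.com:5439/db?user={DWH_DB_USER}&password={DWH_DB_PASSWORD}") \
    .option("dbtable", "schema.table") \
    .option("tempdir", "s3a://my-bucket/tmp/") \
    .option("forward_spark_s3_credentials", "true") \
    .mode("append") \
    .save()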
Tags: jdbc pyspark amazon-redshift write