【Posted】: 2018-11-20 19:47:48
【Question description】:
I'm following the spark-redshift tutorial to read from Redshift into Spark (Databricks). I have the following code:
val tempDir = "s3n://{my-s3-bucket-here}"
val jdbcUsername = "usernameExample"
val jdbcPassword = "samplePassword"
val jdbcHostname = "redshift.companyname.xyz"
val jdbcPort = 9293
val jdbcDatabase = "database"
val jdbcUrl = "sampleURL"
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "SAMPLEAWSKEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "SECRETKEYHERE")
val subs_dim = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "example.exampledb")
  .load()
Now, when I try to run this, I get:
java.lang.IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.
I'm a bit confused, because I did set awsAccessKeyId with sc.hadoopConfiguration.set. I'm new at the company, so I'm wondering whether the AWS keys are wrong or whether I'm missing something else?
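For what it's worth, my reading of the error is that one of those options has to be passed on the reader itself, not just on the Hadoop configuration. Below is a rough sketch of what I assume that would look like with forward_spark_s3_credentials (the option name is taken straight from the error message; everything else is reused from my code above), though I haven't confirmed it works:

// Hypothetical: same read as above, but explicitly asking the connector to forward
// the fs.s3n access/secret keys from sc.hadoopConfiguration to Redshift for its COPY/UNLOAD.
val subs_dim = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", jdbcUrl)
  .option("tempdir", tempDir)
  .option("dbtable", "example.exampledb")
  .option("forward_spark_s3_credentials", "true")
  .load()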
Thanks!
【Question discussion】:
-
Have you read the README? Did it shed any light on this?
-
Yes, I checked it, and it says to set the AWS credentials... which I did, didn't I?
Tags: scala apache-spark jdbc amazon-redshift