[Title]: Not able to read data from Redshift using Spark-Scala
[Posted]: 2017-10-03 16:16:21
[Question]:

I am trying to read data from Amazon Redshift and I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.
at scala.Predef$.require(Predef.scala:224)
at com.databricks.spark.redshift.Parameters$MergedParameters.<init>(Parameters.scala:91)
at com.databricks.spark.redshift.Parameters$.mergeParameters(Parameters.scala:83)
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:50)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)

I am reading the data with the following code:

import org.apache.spark.sql.SparkSession

val session = SparkSession.builder()
  .master("local")
  .appName("POC")
  .getOrCreate()

session.conf.set("fs.s3n.awsAccessKeyId", "<access_key>")
session.conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

val eventsDF = session.read
  .format("com.databricks.spark.redshift")
  .option("url", "<jdbc_url>")
  .option("dbtable", "test.account")
  .option("tempdir", "s3n://testBucket/data")
  .load()
eventsDF.show()
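For context, the require() in MergedParameters fires because none of the three authentication options named in the error message is set on the reader. Any one of them satisfies the check; as a minimal, hedged sketch (the IAM role ARN below is a placeholder, not from the original post):

// Either forward Spark's own S3 credentials on to Redshift...
val viaForwarding = session.read
  .format("com.databricks.spark.redshift")
  .option("url", "<jdbc_url>")
  .option("dbtable", "test.account")
  .option("tempdir", "s3n://testBucket/data")
  .option("forward_spark_s3_credentials", "true")
  .load()

// ...or let the cluster assume an IAM role for the S3 COPY/UNLOAD
// (placeholder ARN)
val viaIamRole = session.read
  .format("com.databricks.spark.redshift")
  .option("url", "<jdbc_url>")
  .option("dbtable", "test.account")
  .option("tempdir", "s3n://testBucket/data")
  .option("aws_iam_role", "arn:aws:iam::123456789012:role/my-redshift-role")
  .load()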

build.sbt:

name := "Redshift_read"

scalaVersion := "2.11.8"

version := "1.0"

val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"     % sparkVersion,
  "org.apache.spark" %% "spark-sql"      % sparkVersion,
  "com.databricks"   %% "spark-redshift" % "3.0.0-preview1",
  "com.amazonaws"    %  "aws-java-sdk"   % "1.11.0"
)

Can anyone tell me what I am missing? I have already provided the access key and secret key to Spark, yet it still throws this error.

[Comments]:

  • Aren't fs.s3n.awsAccessKeyId and friends part of the SparkContext configuration rather than the SparkSession?
  • You are right. It worked once I set them on the SparkContext configuration. Thanks, Grant.

Tags: scala authentication apache-spark amazon-redshift


[Solution 1]:

I got it working by setting the S3 keys on the SparkContext instead of on the SparkSession.

Replace:

session.conf.set("fs.s3n.awsAccessKeyId", "<access_key>")
session.conf.set("fs.s3n.awsSecretAccessKey", "<secret-key>")

with:

session.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId,"<access_key>")
session.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<secret_key>")

Also add the following resolver and dependency to build.sbt for the Redshift JDBC driver:

resolvers += "redshift" at "http://redshift-maven-repository.s3-website-us-east-1.amazonaws.com/release"

libraryDependencies += "com.amazon.redshift" % "redshift-jdbc42" % "1.2.1.1001"

[Discussion]:
