[Question Title]: S3A hadoop aws jar always returns AccessDeniedException
[Posted]: 2018-09-20 21:49:12
[Question Description]:

Can anyone help me figure out why I am getting the exception below? I am trying to read some data from a local file in my Spark program and write it out to S3. I have the correct secret key and access key specified like this -

Do you think this is related to a version mismatch between some of the libraries?

    SparkConf conf = new SparkConf();
    // add more spark related properties

    AWSCredentials credentials = DefaultAWSCredentialsProviderChain.getInstance().getCredentials();

    conf.set("spark.hadoop.fs.s3a.access.key", credentials.getAWSAccessKeyId());
    conf.set("spark.hadoop.fs.s3a.secret.key", credentials.getAWSSecretKey());

The Java code itself is plain vanilla -

protected void process() throws JobException {
    // Read the local input file into an RDD ...
    JavaRDD<String> linesRDD = _sparkContext.textFile(_jArgs.getFileLocation());

    // ... and write it back out to S3 through the s3a connector.
    linesRDD.saveAsTextFile("s3a://my.bucket/" + Math.random() + "final.txt");
}

Here are my code and the Gradle dependencies.

Gradle

ext.libs = [
    aws:    [
        lambda: 'com.amazonaws:aws-lambda-java-core:1.2.0',
        // The AWS SDK will dynamically import the X-Ray SDK to emit subsegments for downstream calls made by your
        // function
        //recorderCore: 'com.amazonaws:aws-xray-recorder-sdk-core:1.1.2',
        //recorderCoreAwsSdk: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk:1.1.2',
        //recorderCoreAwsSdkInstrumentor: 'com.amazonaws:aws-xray-recorder-sdk-aws-sdk-instrumentor:1.1.2',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk
        javaSDK: 'com.amazonaws:aws-java-sdk:1.11.311',

        recorderSDK: 'com.amazonaws:aws-java-sdk-dynamodb:1.11.311',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-lambda-java-events
        lambdaEvents: 'com.amazonaws:aws-lambda-java-events:2.0.2',
        snsSDK: 'com.amazonaws:aws-java-sdk-sns:1.11.311',
        // https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-emr
        emr :'com.amazonaws:aws-java-sdk-emr:1.11.311'

    ],
    //jodaTime: 'joda-time:joda-time:2.7',
    //guava   : 'com.google.guava:guava:18.0',
    jCommander : 'com.beust:jcommander:1.71',
    //jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8',

    jackson: 'com.fasterxml.jackson.core:jackson-databind:2.8.0',
    apacheCommons: [
            lang3: "org.apache.commons:commons-lang3:3.3.2",
    ],
    spark: [
            core: 'org.apache.spark:spark-core_2.11:2.3.0',
            hadoopAws: 'org.apache.hadoop:hadoop-aws:2.8.1',
            //hadoopClient:'org.apache.hadoop:hadoop-client:2.8.1',
            //hadoopCommon:'org.apache.hadoop:hadoop-common:2.8.1',
            jackson: 'com.fasterxml.jackson.module:jackson-module-scala_2.11:2.8.8'
    ],
]

Exception

2018-04-10 22:14:22.270 | ERROR |  |  |  |c.f.d.p.s.SparkJobEntry-46 
Exception found in job for file type : EMAIL
java.nio.file.AccessDeniedException: s3a://my.bucket/0.253592564392344final.txt: getFileStatus on 
s3a://my.bucket/0.253592564392344final.txt: 
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: 
Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 
62622F7F27793DBA; S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=), S3 Extended Request ID: BHCZT6BSUP39CdFOLz0uxkJGPH1tPsChYl40a32bYglLImC6PQo+LFtBClnWLWbtArV/z1SOt68=
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:158) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:101) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1568) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:117) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1436) ~[hadoop-common-2.8.1.jar:na]
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2040) ~[hadoop-aws-2.8.1.jar:na]
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131) ~[hadoop-mapreduce-client-core-2.6.5.jar:na]
at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.assertConf(SparkHadoopWriter.scala:283) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:71) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1096) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1094) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) ~[spark-core_2.11-2.3.0.jar:2.3.0]
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) ~[spark-core_2.11-2.3.0.jar:2.3.0]

[Question Discussion]:

  • I ran into a similar issue. In my case I did not have permission on the S3 folder; once the access issue was resolved I could read the files in S3 without any trouble.

Tags: amazon-web-services apache-spark hadoop amazon-s3


[Solution 1]:

Once you are working with the Hadoop Configuration class, you need to drop the spark.hadoop prefix, so just use fs.s3a.access.key and so on.
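To illustrate, a minimal sketch of the two equivalent ways of passing the same credentials (accessKey, secretKey and the sc variable are placeholders, not taken from the question):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // 1) Via SparkConf: Spark copies every "spark.hadoop.*" entry, minus the
    //    prefix, into the Hadoop Configuration that the S3A filesystem sees.
    SparkConf conf = new SparkConf()
            .set("spark.hadoop.fs.s3a.access.key", accessKey)
            .set("spark.hadoop.fs.s3a.secret.key", secretKey);

    // 2) Directly on the Hadoop Configuration: no "spark.hadoop." prefix here.
    JavaSparkContext sc = new JavaSparkContext(conf);
    sc.hadoopConfiguration().set("fs.s3a.access.key", accessKey);
    sc.hadoopConfiguration().set("fs.s3a.secret.key", secretKey);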

All of the option names are defined in the org.apache.hadoop.fs.s3a.Constants class: if you reference them instead of typing the strings by hand, you also avoid typos.
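For example (a hedged sketch, reusing the placeholder sc, accessKey and secretKey variables from above; in hadoop-aws these constants resolve to the "fs.s3a.access.key" and "fs.s3a.secret.key" strings):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.s3a.Constants;

    // Set the keys through the named constants instead of hand-typed strings.
    Configuration hadoopConf = sc.hadoopConfiguration();
    hadoopConf.set(Constants.ACCESS_KEY, accessKey);   // "fs.s3a.access.key"
    hadoopConf.set(Constants.SECRET_KEY, secretKey);   // "fs.s3a.secret.key"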

One thing to bear in mind is that all of the Spark and Hadoop source code is public: nothing stops you from taking that stack trace, setting a few breakpoints, and trying to run it in your IDE. That is what we usually do when things get really bad.

[Discussion]:

  • Thanks! I am not touching the Hadoop conf directly, which is why I have the spark.hadoop prefix. The credentials are actually being picked up, and it fails after that. I have already debugged the internals. It may well be a permissions issue, but I have not figured it out yet. I will post my findings later today. Thanks!
  • I have just edited my question to make it explicit that "conf" is a SparkConf object, not a Hadoop Configuration. Sorry for the confusion.