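【Posted】: 2018-06-07 12:36:56
【Question】:
I am trying to get a Spark cluster to read a data source from Amazon S3 cloud storage. The read fails with the following error, and I need some help diagnosing the problem: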
>>> sc.textFile("s3a://storage-bucket/s3test.txt").collect()
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 400, AWS Service: Amazon S3, AWS Request ID: D47397DA8BCB4669, AWS Error Code: null, AWS Error Message: Bad Request, S3 Extended Request ID: /aBi99tozgFEsdRGubDwhriMsNQvl1jLOf8AJquA8VXxzkpPL/LLCWDFQQvYn4snHx5gx66/pXo=
For comparison, copying the same file with the AWS CLI works fine:
$ aws s3 cp s3://storage-bucket/s3test.txt ./s3text.txt
download: s3://storage-bucket/s3test.txt to ./s3text.txt
$ cat s3text.txt
Hello S3
More details from the error message:
Caused by: org.jets3t.service.S3ServiceException: Service Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><Message>The request signature we calculated does not match the signature you provided. Check your key and signing method.</Message><AWSAccessKeyId>xxxxxxxxxxxxxxxxxx</AWSAccessKeyId><St
【Comments】:
-
@RameshMaharjan Collect the result? That doesn't change anything, and the error message points to a problem on the S3 side.
-
Can you post the full error log?
-
For easier debugging, you could try accessing the file like this:
aws s3 cp s3://storage-bucket/s3test.txt ./s3test.txt
-
@destroy-everything Good idea, but that works without any problem.
-
You could try sc.parallelize([1,2,3]).collect() to see whether the problem lies with S3 or with your Spark configuration.
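A minimal sketch of that isolation check, assuming a PySpark shell where `sc` is already defined. The helper name `diagnose` and the keys of the returned dictionary are hypothetical, chosen only for illustration. It first runs a job that never touches S3, then attempts the S3 read, so the report shows which of the two steps fails:

```python
def diagnose(sc, s3_path):
    """Report whether a local-only Spark job and an S3 read succeed.

    `sc` is any object exposing parallelize(...).collect() and
    textFile(...).collect(), for example a pyspark SparkContext.
    """
    report = {}

    # Step 1: a job that never touches S3.
    # If this fails, the Spark setup itself is the likely culprit.
    try:
        report["spark_ok"] = sc.parallelize([1, 2, 3]).collect() == [1, 2, 3]
    except Exception as exc:
        report["spark_ok"] = False
        report["spark_error"] = repr(exc)
        return report

    # Step 2: the S3 read from the question.
    # If only this fails, look at credentials, region/endpoint, or the S3 connector.
    try:
        report["s3_lines"] = sc.textFile(s3_path).collect()
        report["s3_ok"] = True
    except Exception as exc:
        report["s3_ok"] = False
        report["s3_error"] = repr(exc)
    return report
```

In the shell this would be called as `diagnose(sc, "s3a://storage-bucket/s3test.txt")`. A result with `spark_ok` true and `s3_ok` false would be consistent with the error above coming from the S3 side rather than from Spark itself.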
Tags: python apache-spark amazon-s3 pyspark cloud