【Question title】: Issue while uploading last part in a multipart upload to S3
【Posted】: 2018-10-26 20:57:00
【Question】:

I am having trouble uploading the last part of a file in a multipart upload to S3 (boto3, Python 3.6). My code is below:

mp_upload = s3_client.create_multipart_upload(Bucket=external_bucket, Key=audience_key)
mp_upload_id = mp_upload["UploadId"]
part_info = []
upload_content = []
byte_upload_size = 0
counter = 1
uploaded_once = False
FIVE_MEGABYTE = 5000000
for key in keys_to_aggregate:
    response = s3_client.get_object(Bucket=internal_bucket, Key=key)
    byte_file_size = response["ContentLength"]
    file_content = response["Body"].read().decode()

    byte_upload_size += byte_file_size
    upload_content.append(file_content)

    if byte_upload_size >= FIVE_MEGABYTE:
        # as soon as we reach the lower limit we upload
        logger.info(f"Uploading part {counter}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)

        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1
        # freeing up uploaded data
        byte_upload_size = 0
        upload_content = []
        uploaded_once = True

if uploaded_once:
    # the last part can be less than 5MB so we need to upload it
    if byte_upload_size > 0:
        logger.info(f"Uploading last part for {job_id}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)

        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1

    s3_client.complete_multipart_upload(Bucket=external_bucket,
                                        Key=audience_key,
                                        UploadId=mp_upload_id,
                                        MultipartUpload={
                                            "Parts": part_info})
    logger.info(f"Multipart upload for {job_id} completed")
else:
    # we didn't reach the 5MB threshold so no file was uploaded
    s3_client.abort_multipart_upload(Bucket=external_bucket,
                                     Key=audience_key,
                                     UploadId=mp_upload_id)

    # we proceed with a normal put
    body = "".join(upload_content)
    body_with_header = f"{header}\n{body}".encode()
    s3_client.put_object(Bucket=external_bucket, Key=audience_key,
                         Body=body_with_header)
    logger.info(f"Single file upload completed for {job_id}")

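Here keys_to_aggregate is a list of keys in S3.

The problem occurs in the if byte_upload_size > 0 branch, which checks for a last piece of data still to be uploaded. That piece is smaller than 5 MB, and I was under the impression that the last part of a multipart upload is allowed to be smaller than 5 MB.

For some reason, boto3 does not recognize the last part as the last part and throws: Error while aggregating data from S3: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size

I cannot find a way to flag the final upload as the last part. Has anyone run into this before?

Thanks! Alessio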

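【Comments】:

  • There is no need to flag the last part as the last part. It works the way you expect. In fact, it even works if there is only one part and that part (both first and last) is smaller than 5 MB. More logging may be needed, specifically the part number and byte size of each part before and at the point of failure.
  • It may also be that FIVE_MEGABYTE should be 5 * 1024 * 1024 (MiB), and the error is that the preceding parts are all too small.
  • @Michael-sqlbot You are right. I changed the FIVE_MEGABYTE value and it works now. Thanks!
  • Boto may have swallowed part of the exception; if so, that is inexcusably bad design. On the wire, S3 actually gives you the answer: <MinSizeAllowed>5242880</MinSizeAllowed>
  • That is the only part of the exception I can see. Thanks a lot for your help @Michael-sqlbot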
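
The logging suggested in the comments can be sketched as follows. This is a minimal sketch, not the asker's code: undersized_parts and its part_sizes argument are hypothetical names, and it assumes the caller records len(body_with_header) for every upload_part call. The 5242880 threshold is the MinSizeAllowed value quoted above.

```python
import logging

logger = logging.getLogger(__name__)

# Minimum size for every part except the last, as reported by S3
# in <MinSizeAllowed> (5 MiB).
MIN_PART_SIZE = 5242880


def undersized_parts(part_sizes, min_size=MIN_PART_SIZE):
    """Return the 1-based numbers of parts that are too small.

    part_sizes is a list of byte counts in upload order (hypothetical
    input; collect it alongside part_info). The last part is exempt
    from the minimum, so it is never reported.
    """
    too_small = []
    last_number = len(part_sizes)
    for number, size in enumerate(part_sizes, start=1):
        # Log every part so the failing one is visible before completion.
        logger.info("part %d: %d bytes", number, size)
        if number != last_number and size < min_size:
            too_small.append(number)
    return too_small
```

Calling this just before complete_multipart_upload would show that parts cut at 5,000,000 bytes fall below the minimum, while a short final part is accepted.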

Tags: python-3.x amazon-s3 boto3


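【Solution 1】:

EntityTooSmall

Your proposed upload is smaller than the minimum allowed object size. Each part must be at least 5 MB in size, except the last part.

https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadComplete.html

Reading between the lines, this error is not about your last part; it is about one or more of the preceding parts.

It follows that the minimum part size is not actually 5 MB (5 × 1000 × 1000) but 5 MiB (5 × 1024 × 1024).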
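
To make the arithmetic concrete, here is a minimal sketch of the corrected threshold together with one possible way to group data into parts. The batch_into_parts helper is hypothetical and not part of boto3; it assumes the chunks are already bytes and measures the bytes that will actually be sent rather than the ContentLength of the source objects.

```python
# 5 MiB, not 5 MB: 5 * 1024 * 1024 = 5,242,880 bytes, which is
# 242,880 bytes more than the 5,000,000 used in the question.
MIN_PART_SIZE = 5 * 1024 * 1024


def batch_into_parts(chunks, min_size=MIN_PART_SIZE):
    """Yield bytes objects of at least min_size, except possibly the last.

    chunks is any iterable of bytes (hypothetical input, for example the
    encoded content of each source object).
    """
    buffer = []
    buffered = 0
    for chunk in chunks:
        buffer.append(chunk)
        buffered += len(chunk)
        if buffered >= min_size:
            # Enough data for a valid non-final part.
            yield b"".join(buffer)
            buffer = []
            buffered = 0
    if buffered:
        # Whatever is left becomes the final part, which may be smaller.
        yield b"".join(buffer)
```

Each yielded value could then be passed as Body to upload_part with an increasing PartNumber. If only one part comes out, the comments above suggest that completing the multipart upload still works, so falling back to put_object is optional.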

【Discussion】:
