[Posted]: 2018-10-26 20:57:00
[Question]:
I'm having trouble uploading the last part of a file in a multipart upload to S3 (boto3, python3.6). My code is below:
mp_upload = s3_client.create_multipart_upload(Bucket=external_bucket,
                                              Key=audience_key)
mp_upload_id = mp_upload["UploadId"]
part_info = []
upload_content = []
byte_upload_size = 0
counter = 1
uploaded_once = False
FIVE_MEGABYTE = 5000000

for key in keys_to_aggregate:
    response = s3_client.get_object(Bucket=internal_bucket, Key=key)
    byte_file_size = response["ContentLength"]
    file_content = response["Body"].read().decode()
    byte_upload_size += byte_file_size
    upload_content.append(file_content)
    if byte_upload_size >= FIVE_MEGABYTE:
        # as soon as we reach the lower limit we upload
        logger.info(f"Uploading part {counter}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)
        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1
        # freeing up uploaded data
        byte_upload_size = 0
        upload_content = []
        uploaded_once = True

if uploaded_once:
    # the last part can be less than 5MB so we need to upload it
    if byte_upload_size > 0:
        logger.info(f"Uploading last part for {job_id}")
        body = "".join(upload_content)
        body_with_header = f"{header}\n{body}".encode()
        part = s3_client.upload_part(Bucket=external_bucket,
                                     Key=audience_key,
                                     PartNumber=counter,
                                     UploadId=mp_upload_id,
                                     Body=body_with_header)
        part_info.append({"PartNumber": counter, "ETag": part["ETag"]})
        counter += 1
    s3_client.complete_multipart_upload(Bucket=external_bucket,
                                        Key=audience_key,
                                        UploadId=mp_upload_id,
                                        MultipartUpload={"Parts": part_info})
    logger.info(f"Multipart upload for {job_id} completed")
else:
    # we didn't reach the 5MB threshold so no file was uploaded
    s3_client.abort_multipart_upload(Bucket=external_bucket,
                                     Key=audience_key,
                                     UploadId=mp_upload_id)
    # we proceed with a normal put
    body = "".join(upload_content)
    body_with_header = f"{header}\n{body}".encode()
    s3_client.put_object(Bucket=external_bucket, Key=audience_key,
                         Body=body_with_header)
    logger.info(f"Single file upload completed for {job_id}")
where keys_to_aggregate is a list of keys in S3.
The problem shows up in the "if byte_upload_size > 0" branch, which handles the last chunk of data to upload. That chunk is under 5MB, and I was under the impression that the last part of a multipart upload is allowed to be smaller than 5MB.
For some reason boto3 doesn't recognise the last part as the last one and throws: Error while aggregating data from S3: An error occurred (EntityTooSmall) when calling the CompleteMultipartUpload operation: Your proposed upload is smaller than the minimum allowed size.
I can't figure out a way to mark the last upload as the last part. Has anyone run into this before?
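For reference, the buffering decision inside the loop above can be factored into a small pure helper so the part-size bookkeeping is testable without touching S3. This is an illustrative sketch, not part of the question's code: the name chunk_parts and its interface are invented here, and it uses S3's documented 5 MiB minimum part size as the flush threshold.

```python
S3_MIN_PART_SIZE = 5 * 1024 * 1024  # S3's minimum non-final part size: 5 MiB

def chunk_parts(sizes, min_part=S3_MIN_PART_SIZE):
    """Group object sizes into multipart-upload parts.

    Yields lists of indices into `sizes`; every yielded part except
    possibly the final one reaches at least `min_part` bytes.
    """
    part, total = [], 0
    for i, size in enumerate(sizes):
        part.append(i)
        total += size
        if total >= min_part:
            # flush as soon as the buffered sizes reach the minimum
            yield part
            part, total = [], 0
    if part:
        # S3 allows the final part to be under the minimum
        yield part

# Three objects: the first two together exceed 5 MiB and form one part,
# the third becomes a small final part.
print(list(chunk_parts([3_000_000, 3_000_000, 1_000_000])))
# [[0, 1], [2]]
```

This mirrors the byte_upload_size / upload_content bookkeeping in the posted loop; only the threshold constant differs from the question's code.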
Thanks! Alessio
[Discussion]:
-
There's no need to mark the last part as the last part; it works the way you expect. In fact, it even works when there is only one part and that single part (both first and last) is under 5M. You probably need more logging, in particular the part number and byte size of every part leading up to, and at, the point of failure.
-
It could also be that FIVE_MEGABYTE should be 5 * 1024 * 1024 (MiB), and the error means the earlier parts were all too small.
-
@Michael-sqlbot you were right. I changed the FIVE_MEGABYTE value and it works now. Thank you!
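To put numbers on that fix: 5,000,000 bytes (decimal "5 MB") is below S3's minimum part size of 5 MiB (5,242,880 bytes), so any part whose buffered size lands in that gap passes the question's local check yet is rejected when the upload completes:

```python
S3_MIN_PART_SIZE = 5 * 1024 * 1024  # S3's minimum non-final part size: 5 MiB
DECIMAL_FIVE_MB = 5_000_000         # the threshold used in the question

# A part flushed with a size in this gap satisfies the local
# `byte_upload_size >= FIVE_MEGABYTE` check but is still smaller than
# S3's minimum, producing EntityTooSmall on CompleteMultipartUpload.
print(S3_MIN_PART_SIZE - DECIMAL_FIVE_MB)  # 242880
```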
-
Boto may have swallowed part of the exception; if it did, that's inexcusably bad design. On the wire, S3 actually gives you the answer: <MinSizeAllowed>5242880</MinSizeAllowed>.
-
That's the only part of the exception I could see. Thanks a lot for your help @Michael-sqlbot
Tags: python-3.x amazon-s3 boto3