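【Posted】:2021-10-01 08:14:26
【Question】:
How can I copy large files from one AWS S3 bucket to another S3 bucket using the boto3 Python API? If we use client.copy(), it fails with the error "An error occurred (InvalidArgument) when calling the UploadPartCopy operation: Range specified is not valid for source object of size:"
【Question comments】:
Tags: amazon-web-services amazon-s3 boto3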
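According to the AWS S3 boto3 API documentation, we should use multipart upload. I googled it but could not find a clear, precise answer to my question. Finally, after reading the boto3 API documentation thoroughly, I found the answer. Here it is. This code also works with multiple threads.
If you use multiple threads, create the s3_client inside each thread. I tested this approach, and it copied many terabytes of data from one S3 bucket to a different S3 bucket without problems.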
import logging
import threading

import boto3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


# The post does not show these three helpers; minimal stand-ins are assumed here.
def get_current_thread_name():
    return threading.current_thread().name


def info_log(message):
    logger.info(message)


def error_log(message):
    logger.error(message)


def get_session_client():
    # session = boto3.session.Session(profile_name="default")
    session = boto3.session.Session()
    client = session.client("s3")
    return session, client


def copy_with_multipart(local_s3_client, src_bucket, target_bucket, key, object_size):
    current_thread_name = get_current_thread_name()
    upload_id = None
    try:
        initiate_multipart = local_s3_client.create_multipart_upload(
            Bucket=target_bucket,
            Key=key
        )
        upload_id = initiate_multipart['UploadId']
        # 5 MiB part size. S3 allows at most 10,000 parts per upload, so this
        # covers objects up to about 48.8 GiB; raise part_size for larger objects.
        part_size = 5 * 1024 * 1024
        byte_position = 0
        part_num = 1
        parts_etags = []
        while byte_position < object_size:
            # The last part might be smaller than part_size, so make sure
            # that last_byte isn't beyond the end of the object.
            last_byte = min(byte_position + part_size - 1, object_size - 1)
            copy_source_range = f"bytes={byte_position}-{last_byte}"
            # Copy this part
            try:
                info_log(f"{current_thread_name} Creating upload_part_copy source_range: {copy_source_range}")
                response = local_s3_client.upload_part_copy(
                    Bucket=target_bucket,
                    CopySource={'Bucket': src_bucket, 'Key': key},
                    CopySourceRange=copy_source_range,
                    Key=key,
                    PartNumber=part_num,
                    UploadId=upload_id
                )
            except Exception:
                error_log(f"{current_thread_name} Error while CREATING UPLOAD_PART_COPY for key {key}")
                raise
            parts_etags.append({"ETag": response["CopyPartResult"]["ETag"], "PartNumber": part_num})
            part_num += 1
            byte_position += part_size
        try:
            response = local_s3_client.complete_multipart_upload(
                Bucket=target_bucket,
                Key=key,
                MultipartUpload={
                    'Parts': parts_etags
                },
                UploadId=upload_id
            )
            info_log(f"{current_thread_name} {key} COMPLETE_MULTIPART_UPLOAD COMPLETED SUCCESSFULLY, response={response} !!!!")
        except Exception:
            error_log(f"{current_thread_name} Error while CREATING COMPLETE_MULTIPART_UPLOAD for key {key}")
            raise
    except Exception:
        error_log(f"{current_thread_name} Error during multipart copy for key {key}")
        # Abort a started upload so that already-copied parts are not left behind.
        if upload_id is not None:
            local_s3_client.abort_multipart_upload(Bucket=target_bucket, Key=key, UploadId=upload_id)
        raise
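The loop above walks the source object in fixed 5 MB steps. Because S3 accepts at most 10,000 parts per multipart upload and requires every part except the last to be at least 5 MiB, a fixed 5 MiB part size stops working for objects larger than roughly 48.8 GiB. The following is a minimal sketch of one way to pick a part size and list the byte ranges up front; the names `choose_part_size` and `plan_copy_ranges` are hypothetical and do not appear in the original answer.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MiB: S3 minimum for every part except the last
MAX_PARTS = 10_000               # S3 maximum number of parts per multipart upload


def choose_part_size(object_size, min_part_size=MIN_PART_SIZE, max_parts=MAX_PARTS):
    """Return the smallest part size that keeps the copy within max_parts parts."""
    # Integer ceiling division avoids float rounding on very large sizes.
    needed = (object_size + max_parts - 1) // max_parts
    return max(min_part_size, needed)


def plan_copy_ranges(object_size, part_size):
    """Return (part_number, CopySourceRange) pairs covering the whole object."""
    ranges = []
    byte_position = 0
    part_num = 1
    while byte_position < object_size:
        # The last part may be shorter than part_size.
        last_byte = min(byte_position + part_size - 1, object_size - 1)
        ranges.append((part_num, f"bytes={byte_position}-{last_byte}"))
        part_num += 1
        byte_position += part_size
    return ranges
```

Each returned range string can be passed as CopySourceRange and each part number as PartNumber. Note that an empty object yields no ranges, so it would need a plain copy instead of a multipart one.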
Calling the multipart method:
_, local_s3_client = get_session_client()
copy_with_multipart(local_s3_client, src_bucket_name, target_bucket_name, key, src_object_size)
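The call above needs the source object's size, and the answer recommends one client per thread when copying in parallel. Here is a rough sketch of how the two could be combined, assuming the size is read from the ContentLength field of a head_object response. The names `get_thread_client` and `copy_keys_in_threads`, and the `client_factory` and `copy_func` parameters, are hypothetical and only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_thread_state = threading.local()


def get_thread_client(client_factory):
    """Return the calling thread's client, creating it on first use."""
    client = getattr(_thread_state, "client", None)
    if client is None:
        client = client_factory()
        _thread_state.client = client
    return client


def copy_keys_in_threads(keys, src_bucket, target_bucket, client_factory,
                         copy_func, max_workers=4):
    """Copy each key on a thread pool and return the keys in input order."""
    def task(key):
        client = get_thread_client(client_factory)
        # HeadObject reports the object size in bytes as ContentLength.
        size = client.head_object(Bucket=src_bucket, Key=key)["ContentLength"]
        copy_func(client, src_bucket, target_bucket, key, size)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all tasks to finish and re-raises the first failure.
        return list(pool.map(task, keys))
```

With the functions from the answer, this might be called as copy_keys_in_threads(keys, src_bucket_name, target_bucket_name, lambda: get_session_client()[1], copy_with_multipart), where keys is a list of object keys to copy.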
【Comments】: