【Question Title】: How to copy large files from one AWS S3 bucket to another S3 bucket using the boto3 Python API?
【Posted】: 2021-10-01 08:14:26
【Question Description】:

How can I copy large files from one AWS S3 bucket to another S3 bucket using the boto3 Python API? If we use client.copy(), it fails with the error: "An error occurred (InvalidArgument) when calling the UploadPartCopy operation: Range specified is not valid for source object of size:"

【Question Comments】:

    Tags: amazon-web-services amazon-s3 boto3


【Solution 1】:

According to the AWS S3 boto3 API documentation, we should use multipart upload. I googled it but could not find a clear, accurate answer to my problem. Finally, after reading the boto3 API thoroughly, I found the answer. Here it is. This code also works with multiple threads.

If you use multithreading, create the s3_client in each thread. I have tested this approach and it copies many terabytes of data from one S3 bucket to a different S3 bucket perfectly.
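The per-thread client advice above can be sketched with `threading.local`. Here the client factory is injected and stubbed out so the pattern is visible without touching AWS; in real use the factory would call `get_session_client` from the answer below:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()

def get_thread_client(factory):
    # Cache one client per thread in thread-local storage, so worker
    # threads never share a single client instance.
    if not hasattr(_tls, "client"):
        _tls.client = factory()
    return _tls.client

# Demo with a dummy factory; with boto3 this would be e.g.
# lambda: boto3.session.Session().client("s3")
created = []

def dummy_factory():
    created.append(threading.get_ident())
    return object()

def worker(_):
    c1 = get_thread_client(dummy_factory)
    c2 = get_thread_client(dummy_factory)
    assert c1 is c2  # the same client is reused within a thread

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(worker, range(16)))
```

After the pool finishes, `created` holds one entry per distinct worker thread (at most 4 here), confirming each thread built its client exactly once.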

Code to get the s3_client:

    import boto3

    def get_session_client():
        # session = boto3.session.Session(profile_name="default")
        session = boto3.session.Session()
        client = session.client("s3")
        return session, client
    
    
    
    # get_current_thread_name, info_log and error_log are the author's own
    # logging helpers.
    def copy_with_multipart(local_s3_client, src_bucket, target_bucket, key, object_size):
        current_thread_name = get_current_thread_name()
        try:
            initiate_multipart = local_s3_client.create_multipart_upload(
                Bucket=target_bucket,
                Key=key
            )
            upload_id = initiate_multipart['UploadId']
            # 5 MB part size (the S3 minimum for every part except the last)
            part_size = 5 * 1024 * 1024
            byte_position = 0
            part_num = 1

            parts_etags = []

            while byte_position < object_size:
                # The last part might be smaller than part_size, so make sure
                # last_byte isn't beyond the end of the object.
                last_byte = min(byte_position + part_size - 1, object_size - 1)
                copy_source_range = f"bytes={byte_position}-{last_byte}"
                # Copy this part
                try:
                    info_log(f"{current_thread_name} Creating upload_part_copy source_range: {copy_source_range}")
                    response = local_s3_client.upload_part_copy(
                        Bucket=target_bucket,
                        CopySource={'Bucket': src_bucket, 'Key': key},
                        CopySourceRange=copy_source_range,
                        Key=key,
                        PartNumber=part_num,
                        UploadId=upload_id
                    )
                except Exception as ex:
                    error_log(f"{current_thread_name} Error while CREATING UPLOAD_PART_COPY for key {key}")
                    raise ex
                parts_etags.append({"ETag": response["CopyPartResult"]["ETag"], "PartNumber": part_num})
                part_num += 1
                byte_position += part_size
            try:
                response = local_s3_client.complete_multipart_upload(
                    Bucket=target_bucket,
                    Key=key,
                    MultipartUpload={
                        'Parts': parts_etags
                    },
                    UploadId=upload_id
                )
                info_log(f"{current_thread_name} {key} COMPLETE_MULTIPART_UPLOAD COMPLETED SUCCESSFULLY, response={response} !!!!")
            except Exception as ex:
                error_log(f"{current_thread_name} Error while CREATING COMPLETE_MULTIPART_UPLOAD for key {key}")
                raise ex
        except Exception as ex:
            error_log(f"{current_thread_name} Error while CREATING CREATE_MULTIPART_UPLOAD for key {key}")
            raise ex

Calling the multipart method:

    _, local_s3_client = get_session_client()
    copy_with_multipart(local_s3_client, src_bucket_name, target_bucket_name, key, src_object_size)

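The byte-range arithmetic in the copy loop can be checked in isolation. A small helper (hypothetical name, extracted from the loop above) produces the same inclusive `CopySourceRange` strings, with the last part allowed to be shorter than the rest:

```python
def part_ranges(object_size, part_size=5 * 1024 * 1024):
    # Mirrors the loop in copy_with_multipart: inclusive byte ranges,
    # where only the final part may be smaller than part_size.
    ranges = []
    byte_position = 0
    while byte_position < object_size:
        last_byte = min(byte_position + part_size - 1, object_size - 1)
        ranges.append(f"bytes={byte_position}-{last_byte}")
        byte_position += part_size
    return ranges

# e.g. a 12 MB object with 5 MB parts yields three ranges:
# 0-5242879, 5242880-10485759, 10485760-12582911
ranges = part_ranges(12 * 1024 * 1024)
```

Checking the ranges this way makes it easy to see why a range that extends past `object_size - 1` would trigger the InvalidArgument error from the question.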
【Discussion】:
