【Question title】:How to upload large archives to Amazon Glacier using PHP and aws-sdk v3?
【Posted】:2015-10-16 08:31:20
【Question】:

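This is my first time using anything from Amazon. I'm trying to upload multiple files to Amazon Glacier using the PHP SDK V3. Amazon then needs to merge these files into one.

The files are stored in the home directory on cPanel and must be uploaded to Amazon Glacier by a cron job.

I know I have to use the multipart upload method, but I'm not sure which other functions it needs in order to work. I'm also not sure whether the way I calculate and pass the variables is correct.

Here is the code I have so far: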

<?php
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\TreeHash;

//############################################
//DEFAULT VARIABLES
//############################################
$key = 'XXXXXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';   
$accountId = '123456789123';
$vaultName = 'VaultName';
$partSize = '4194304';
$fileLocation = 'path/to/files/';

//############################################
//DECLARE THE AMAZON CLIENT
//############################################
$client = new GlacierClient([
    'region' => 'us-west-2',
    'version' => '2012-06-01',
    'credentials' => array(
        'key'    => $key,
        'secret' => $secret,
  )
]);

//############################################
//GET THE UPLOAD ID
//############################################
$result = $client->initiateMultipartUpload([
    'partSize' => $partSize,
    'vaultName' => $vaultName
]);
$uploadId = $result['uploadId'];

//############################################
//GET ALL FILES INTO AN ARRAY
//############################################
$files = scandir($fileLocation);
unset($files[0]);
unset($files[1]);
sort($files);

//############################################
//GET SHA256 TREE HASH (CHECKSUM)
//############################################
$th = new TreeHash();
//GET TOTAL FILE SIZE
foreach($files as $part){
    $filesize = filesize($fileLocation.$part);
    $total = $filesize;
    $th = $th->update(file_get_contents($fileLocation.$part));
}
$totalchecksum = $th->complete();

//############################################
//UPLOAD FILES
//############################################
foreach ($files as $key => $part) {
    //HASH CONTENT
    $filesize = filesize($fileLocation.$part);
    $rangeSize = $filesize-1;
    $range = 'bytes 0-'.$rangeSize.'/*';
    $sourcefile = $fileLocation.$part;

    $result = $client->uploadMultipartPart([
        'accountId' => $accountId,
        'checksum' => '',
        'range' => $range,
        'sourceFile' => $sourcefile,
        'uploadId' => $uploadId,
        'vaultName' => $vaultName
    ]);
}

//############################################
//COMPLETE MULTIPART UPLOAD
//############################################
$result = $client->completeMultipartUpload([
    'accountId' => $accountId,
    'archiveSize' => $total,
    'checksum' => $totalchecksum,
    'uploadId' => $uploadId,
    'vaultName' => $vaultName,
]);
?>

Creating the new Glacier client seems to work, and I do receive an uploadId, but I'm not 100% sure the rest is right. The Amazon Glacier vault that the files should be uploaded to (and then merged in) is still empty, and I'm not sure whether the files only show up once completeMultipartUpload has executed successfully.

I also get the following error when I run the code:

Fatal error: Uncaught exception 'Aws\Glacier\Exception\GlacierException' with message 'Error executing "CompleteMultipartUpload" on "https://glacier.us-west-2.amazonaws.com/XXXXXXXXXXXX/vaults/XXXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M"; AWS HTTP error: Client error: 403 InvalidSignatureException (client): The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details. The Canonical String for this request should have been 'POST /XXXXXXXXXXX/vaults/XXXXXXXXX/multipart-uploads/cTI0Yfk6xBYIQ0V-rhq6AcdHqd3iivRJfyYzK6-NV1yn9GQvJyYCoSrXrrrx4kfyGm6m9PUEAq4M0x6duXm5MD8abn-M host:glacier.us-west-2.amazonaws.com x-amz-archive-size:1501297 x-amz-date:20151016T081455Z x-amz-glacier-version:2012-06-01 x-amz-sha256-tree-hash:?[ qiuã°²åÁ¹ý+¤Üª¤ [;K×T host;x-amz-archive-size;x-amz-date;x-amz-glacier-version;x-am in /home/XXXXXXXXXXXX/public_html/XXXXXXXXXXX/Aws/WrappedHttpHandler.php on line 152

Is there an easier way to do this? I also have full SSH access, if that helps.

【Question discussion】:

    Tags: php amazon-web-services cpanel aws-sdk amazon-glacier


    【Solution 1】:

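    I got this working with the PHP SDK V3 (version 3), and I kept running into this question during my research, so I thought I'd post my solution too. Use at your own risk: there is little to no error checking or handling.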

    <?php
    require 'vendor/autoload.php';
    
    use Aws\Glacier\GlacierClient;
    use Aws\Glacier\TreeHash;
    
    
    // Create the glacier client to connect with
    $glacier = new GlacierClient(array(
          'profile' => 'default',
          'region' => 'us-east-1',
          'version' => '2012-06-01'
          ));
    
    $fileName = '17mb_test_file';         // this is the file to upload
    $chunkSize = 1024 * 1024 * pow(2,2);  // 1 MB times a power of 2
    $fileSize = filesize($fileName);      // we will need the file size (in bytes)
    
    // initiate the multipart upload
    // it is dangerous to send the filename without escaping it first
    $result = $glacier->initiateMultipartUpload(array(
          'archiveDescription' => 'A multipart-upload for file: '.$fileName,
          'partSize' => $chunkSize,
          'vaultName' => 'MyVault'
          ));
    
    // we need the upload ID when uploading the parts
    $uploadId = $result['uploadId'];
    
    // we need to generate the SHA256 tree hash
    // open the file so we can get a hash from its contents
    $fp = fopen($fileName, 'r');
    // This class can generate the hash
    $th = new TreeHash();
    // feed in all of the data
    $th->update(fread($fp, $fileSize));
    // generate the hash (this comes out as binary data)...
    $hash = $th->complete();
    // but the API needs hex (thanks). PHP to the rescue!
    $hash = bin2hex($hash);
    
    // reset the file position indicator
    fseek($fp, 0);
    
    // the part counter
    $partNumber = 0;
    
    print("Uploading: '".$fileName
        ."' (".$fileSize." bytes) in "
        .(ceil($fileSize/$chunkSize))." parts...\n");
    while ($partNumber * $chunkSize < $fileSize)
    {
      // while we haven't written everything out yet
      // figure out the offset for the first and last byte of this chunk
      $firstByte = $partNumber * $chunkSize;
      // the last byte for this piece is either the last byte in this chunk, or
      // the end of the file, whichever is less
      // (watch for those Obi-Wan errors)
      $lastByte = min((($partNumber + 1) * $chunkSize) - 1, $fileSize - 1);
    
      // upload the next piece
      $result = $glacier->uploadMultipartPart(array(
            'body' => fread($fp, $chunkSize),  // read the next chunk
            'uploadId' => $uploadId,          // the multipart upload this is for
            'vaultName' => 'MyVault',
            'range' => 'bytes '.$firstByte.'-'.$lastByte.'/*' // weird string
            ));
    
      // this is where one would check the results for error.
      // This is left as an exercise for the reader ;)
    
      // onto the next piece
      $partNumber++;
      print("\tpart ".$partNumber." uploaded...\n");
    }
    print("...done\n");
    
    // and now we can close off this upload
    $result = $glacier->completeMultipartUpload(array(
      'archiveSize' => $fileSize,         // the total file size
      'uploadId' => $uploadId,            // the upload id
      'vaultName' => 'MyVault',
      'checksum' => $hash                 // here is where we need the tree hash
    ));
    
    // this is where one would check the results for error.
    // This is left as an exercise for the reader ;)
    
    
    // get the archive id.
    // You will need this to refer to this upload in the future.
    $archiveId = $result->get('archiveId');
    
    print("The archive Id is: ".$archiveId."\n");
    
    
    ?>
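The code above relies on TreeHash to build the checksum and on bin2hex() to hex-encode it, because the API expects the hex form (the garbled x-amz-sha256-tree-hash value in the question's error message looks like the raw binary digest). For readers who want to see what that value is, here is a minimal pure-PHP sketch of the SHA-256 tree hash as Glacier documents it: hash the data in 1 MB leaves, then hash adjacent pairs level by level, carrying an unpaired last node up unchanged. The names `reduceTreeLevel`, `glacierTreeHash`, and `TREE_HASH_LEAF_SIZE` are hypothetical, and like the answer above, the sketch holds the whole input in memory.

```php
<?php
// Hypothetical sketch of Glacier's SHA-256 tree hash (not an SDK function).
// Assumption: the whole input fits in memory.

const TREE_HASH_LEAF_SIZE = 1048576; // Glacier hashes data in 1 MB leaves

/**
 * Reduce binary SHA-256 digests pairwise until one remains.
 * An unpaired last digest is carried up to the next level unchanged.
 */
function reduceTreeLevel(array $digests): string
{
    while (count($digests) > 1) {
        $next = [];
        for ($i = 0, $n = count($digests); $i < $n; $i += 2) {
            $next[] = isset($digests[$i + 1])
                ? hash('sha256', $digests[$i] . $digests[$i + 1], true)
                : $digests[$i];
        }
        $digests = $next;
    }
    return $digests[0];
}

/** Return the hex-encoded tree hash of $data, the form the API expects. */
function glacierTreeHash(string $data): string
{
    $leaves = [];
    foreach (str_split($data, TREE_HASH_LEAF_SIZE) as $chunk) {
        $leaves[] = hash('sha256', $chunk, true); // binary digest per 1 MB leaf
    }
    if ($leaves === []) {
        // str_split('') returns [] on PHP 8.2+; treat it as one empty leaf.
        $leaves[] = hash('sha256', '', true);
    }
    return bin2hex(reduceTreeLevel($leaves));
}
```

For data under 1 MB the tree hash is simply the plain SHA-256 of the data, which is a quick way to sanity-check the output of TreeHash::complete() plus bin2hex().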
    

    【Discussion】:

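    • This works for me, except for the TreeHash part. Since we are uploading large files, reading the whole file at once defeats the purpose of a multipart upload. Instead, I call the addChecksum method, passing in the checksum returned by uploadMultipartPart for each part. That way, memory usage stays minimal.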
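The comment's approach works because, when the part size is 1 MB times a power of 2, each part's tree hash is the root of an aligned subtree of the archive's tree, so the archive hash can be built from the part hashes alone. The sketch below shows that reduction in plain PHP rather than through TreeHash::addChecksum; `combinePartTreeHashes` and `partTreeHash` are hypothetical names, and the part hashes are assumed to be hex strings like the `checksum` value uploadMultipartPart returns.

```php
<?php
// Hypothetical helpers illustrating what combining per-part checksums computes.
// Assumption: every part except the last is exactly the part size, and the
// part size is 1 MB times a power of 2.

/** Reduce hex tree hashes pairwise (left to right) into one hex root hash. */
function combinePartTreeHashes(array $hexHashes): string
{
    $level = array_map('hex2bin', $hexHashes);
    while (count($level) > 1) {
        $next = [];
        for ($i = 0, $n = count($level); $i < $n; $i += 2) {
            // Hash each adjacent pair; an unpaired last node moves up as-is.
            $next[] = isset($level[$i + 1])
                ? hash('sha256', $level[$i] . $level[$i + 1], true)
                : $level[$i];
        }
        $level = $next;
    }
    return bin2hex($level[0]);
}

/** Tree hash (hex) of one in-memory part: 1 MB leaves, then the same reduction. */
function partTreeHash(string $part): string
{
    $leafHashes = array_map(function ($chunk) {
        return hash('sha256', $chunk); // hex digest of one 1 MB leaf
    }, str_split($part, 1048576));
    return combinePartTreeHashes($leafHashes);
}
```

In a real upload only `combinePartTreeHashes()` is needed: collect the checksum of each part as it is uploaded, then combine them once for completeMultipartUpload, so no part has to be read twice.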
    【Solution 2】:

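    I think you have misunderstood uploadMultipartPart. With uploadMultipartPart you upload one large file in multiple parts. You then call completeMultipartUpload to mark that you have finished uploading that one file.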

    From your code, it looks like you are uploading multiple files.
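In the question's loop every file is sent with a range that starts at byte 0 (`'bytes 0-'.$rangeSize.'/*'`). In a multipart upload of a single file, each part must instead cover the next slice of that one file. A minimal sketch of the range strings, using a hypothetical helper `multipartRanges` (not an SDK function):

```php
<?php
// Hypothetical helper: the range strings for uploading ONE file of
// $fileSize bytes in parts of $partSize bytes. Each part starts where the
// previous one ended; only the last part may be shorter than $partSize.

function multipartRanges(int $fileSize, int $partSize): array
{
    $ranges = [];
    for ($first = 0; $first < $fileSize; $first += $partSize) {
        // Inclusive last byte of this part, clamped to the end of the file.
        $last = min($first + $partSize, $fileSize) - 1;
        $ranges[] = "bytes {$first}-{$last}/*";
    }
    return $ranges;
}
```

For a 10 MB file with 4 MB parts this yields three ranges, the last one being `bytes 8388608-10485759/*`.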

    You probably don't actually need uploadMultipartPart.

    Maybe you can use the regular uploadArchive instead?

    Reference:

    https://blogs.aws.amazon.com/php/post/Tx7PFHT4OJRJ42/Uploading-Archives-to-Amazon-Glacier-from-PHP

    【Discussion】:

    【Solution 3】:

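    Note: this is a multipart-upload solution for aws-sdk-php v2. I think it will work on v3 with only minor changes to how the TreeHash class is used.

    Thanks to Neil Vandermeiden's snippet, I accomplished the same task with a small improvement.

    Neil only verifies the checksum of the whole file. That has two potential problems:

    • It can consume a lot of memory: remember that we are uploading a large file, and hashing it to get the checksum means opening it and reading all of its contents.
    • We are uploading several file parts: some of them may fail during upload, leaving corrupted parts on AWS. If we compute and verify the checksum of every part, we can catch that.

    In the following code, we compute the checksum of each file part sent to AWS and send it to the AWS API together with that part.

    Once AWS has received an uploaded part, it computes its own checksum. If that checksum does not match ours, an exception is thrown. If the upload succeeds, we can be sure the part arrived intact.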

    <?php
    use Aws\Common\Hash\TreeHash;
    use Aws\Glacier\GlacierClient;
    
    /**
     * upload a file and store it into aws glacier
     */
    class UploadMultipartFileToGlacier
    {
        // aws glacier
        private $description;
        private $glacierClient;
        private $glacierConfig;
        /*
         * the part size must be 1 MB (1024 * 1024 bytes) multiplied by a power of 2 (1 MB, 2 MB, 4 MB, 8 MB, and so on)
         * reference: https://docs.aws.amazon.com/aws-sdk-php/v2/api/class-Aws.Glacier.GlacierClient.html#_initiateMultipartUpload
         **/
        private $partSize;
    
        // file location
        private $filePath;
    
        private $errorMessage;
        private $executionDate;
    
        public function __construct($filePath)
        {
            $this->executionDate = date('Y-m-d H:i:s');
            $this->filePath = $filePath;
        
            // AWS Glacier
            $this->glacierConfig = (object) [
                'vaultId' => 'VAULT_NAME',
                'region' => 'REGION',
                'accessKeyId' => 'ACCESS_KEY',
                'secretAccessKey' => 'SECRET_KEY',
            ];
    
            $this->glacierClient = GlacierClient::factory(array(
                'credentials' => array(
                    'key'    => $this->glacierConfig->accessKeyId,
                    'secret' => $this->glacierConfig->secretAccessKey,
                ),
                'region' => $this->glacierConfig->region
            ));
    
            $this->description = sprintf('Upload file %s at %s', $this->filePath, $this->executionDate);
    
            $this->partSize = 1024 * 1024 * pow(2, 2); // 4 MB
        }
    
        public function upload()
        {
            list($success, $data) = $this->uploadFileToGlacier();
    
            if ($success) {
                // todo: tasks to run once the file has been uploaded successfully to AWS Glacier
            } else {
                // todo: handle error
                // $this->errorMessage contains the exception message
            }
        }
    
        private function completeMultipartUpload($uploadId, $fileSize, $checksumParts)
        {
            // With the checksums of all uploaded file parts, we can compute the checksum of the whole file. It's important to send it as a parameter to
            // GlacierClient::completeMultipartUpload: AWS computes the checksum of the assembled archive on its side, and if
            // it doesn't match ours, the API throws an exception.
            $checksum = $this->getChecksumFile($checksumParts);
    
            return $this->glacierClient->completeMultipartUpload([
                'archiveSize' => $fileSize,
                'uploadId' => $uploadId,
                'vaultName' => $this->glacierConfig->vaultId,
                'checksum' => $checksum
            ]);
        }
    
        private function getChecksumPart($content)
        {
            $treeHash = new TreeHash();
            $mb = 1024 * 1024 * pow(2, 0); // 1 MB (the TreeHash class only accepts chunks of at most 1 MB)
            $buffer = $content;
    
            while (strlen($buffer) >= $mb) {
                $data = substr($buffer, 0, $mb);
                $buffer = substr($buffer, $mb) ?: '';
                $treeHash->addData($data);
            }
            
            if (strlen($buffer)) {
                $treeHash->addData($buffer);
            }
    
            return $treeHash->getHash();
        }
    
        private function getChecksumFile($checksumParts)
        {
            $treeHash = TreeHash::fromChecksums($checksumParts);
    
            return $treeHash->getHash();
        }
    
        private function initiateMultipartUpload()
        {
            $result = $this->glacierClient->initiateMultipartUpload([
                'accountId' => '-',
                'vaultName' => $this->glacierConfig->vaultId,
                'archiveDescription' => $this->description,
                'partSize' => $this->partSize,
            ]);
    
            return $result->get('uploadId');
        }
    
        private function uploadFileToGlacier()
        {
            $success = true;
            $data = false;
    
            try {
                $fileSize = filesize($this->filePath);
    
                $uploadId = $this->initiateMultipartUpload();
                $checksums = $this->uploadMultipartFile($uploadId, $fileSize);
                $model = $this->completeMultipartUpload($uploadId, $fileSize, $checksums);
    
                $data = (object) [
                    'archiveId' => $model->get('archiveId'),
                    'executionDate' => $this->executionDate,
                    'location' => $model->get('location'),
                ];
            } catch (\Exception $e) {
                $this->errorMessage = $e->getMessage();
                $success = false;
            }
    
            return [$success, $data];
        }
        
        private function uploadMultipartFile($uploadId, $fileSize)
        {
            $numParts = ceil($fileSize / $this->partSize);
            $fp = fopen($this->filePath, 'r');
            $partIdx = 0;
            $checksumParts = [];
    
            error_log("Uploading: {$this->filePath} ({$fileSize} bytes) in {$numParts} parts...");
    
            while ($partIdx * $this->partSize < $fileSize) {
                $firstByte = $partIdx * $this->partSize;
                $lastByte = min((($partIdx + 1) * $this->partSize) - 1, $fileSize - 1);
                $content = fread($fp, $this->partSize);
                
                // We compute the checksum of the part being processed. It's important to send it as a parameter to
                // GlacierClient::uploadMultipartPart: AWS computes the checksum of the uploaded part on its side, and if
                // it doesn't match ours, the API throws an exception.
                $checksumPart = $this->getChecksumPart($content);
    
                $result = $this->glacierClient->uploadMultipartPart([
                    'body' => $content,
                    'uploadId' => $uploadId,
                    'vaultName' => $this->glacierConfig->vaultId,
                    'checksum' => $checksumPart,
                    'range' => "bytes {$firstByte}-{$lastByte}/*"
                ]);
    
                $checksumParts[] = $result->get('checksum'); // same value as $checksumPart; the API throws an exception if they don't match
                
                $partIdx++;
                error_log("Part {$partIdx} uploaded...");
            }
    
            return $checksumParts;
        }
    }
    
    $uploadMultipartFileToGlacier = new UploadMultipartFileToGlacier('<FILE_PATH>');
    
    $uploadMultipartFileToGlacier->upload();
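The part-size rule in the class comment above can be checked in code. A minimal sketch, assuming the Glacier limits of 1 MB to 4 GB per part and at most 10,000 parts per multipart upload; `isValidGlacierPartSize` and `smallestPartSizeFor` are hypothetical helpers, not SDK functions:

```php
<?php
// Hypothetical helpers (not part of the AWS SDK) for Glacier part sizes.
// Assumed limits: part size is 1 MB times a power of two, from 1 MB to 4 GB,
// and one multipart upload may contain at most 10,000 parts.

const GLACIER_MB = 1048576;
const GLACIER_MAX_PART_SIZE = 4096 * GLACIER_MB; // 4 GB (needs 64-bit PHP)
const GLACIER_MAX_PARTS = 10000;

function isValidGlacierPartSize(int $bytes): bool
{
    if ($bytes < GLACIER_MB || $bytes > GLACIER_MAX_PART_SIZE || $bytes % GLACIER_MB !== 0) {
        return false;
    }
    $megabytes = intdiv($bytes, GLACIER_MB);
    // A power of two has exactly one bit set, so n & (n - 1) is zero.
    return ($megabytes & ($megabytes - 1)) === 0;
}

/** Smallest valid part size that keeps $fileSize within the part-count limit. */
function smallestPartSizeFor(int $fileSize): int
{
    $size = GLACIER_MB;
    // intdiv($fileSize + $size - 1, $size) is the part count, rounded up.
    while ($size < GLACIER_MAX_PART_SIZE
        && intdiv($fileSize + $size - 1, $size) > GLACIER_MAX_PARTS) {
        $size *= 2;
    }
    return $size;
}
```

With the fixed 4 MB part size used above, one upload tops out at 10,000 × 4 MB (about 39 GiB); `smallestPartSizeFor()` would pick a larger power-of-two size for bigger files.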
    

    【Discussion】:
