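How to gzip files in the tmp folder: answers

【Question title】: How to gzip files in the tmp folder
【Posted】: 2021-12-06 07:54:15
【Question】:

In an AWS Lambda function, I download a zip file from S3 and unzip it.

Right now I use extractall for this. After unzipping, all the files are saved in the tmp/ folder.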

s3.download_file('test','10000838.zip','/tmp/10000838.zip')

with zipfile.ZipFile('/tmp/10000838.zip', 'r') as zip_ref:
    lstNEW = list(filter(lambda x: not x.startswith("__MACOSX/"), zip_ref.namelist()))
    zip_ref.extractall('/tmp/', members=lstNEW)

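After unzipping, I want to gzip the files and put them in another S3 bucket.

How can I now read back all the files in the tmp folder and gzip each one, naming each result $item.csv.gz?

I have seen this (https://docs.python.org/3/library/gzip.html), but I am not sure which function to use.

If it is the compress function, how exactly do I use it? I read in the answer "gzip a file in Python" that I can use gzip.open('', 'wb') to gzip a file, but I could not work out how to apply it to my case. In the open function, do I specify the target location or the source location? And where do I save the gzipped files so that I can later upload them to S3?

Alternative option:

Instead of loading everything into the tmp folder, I read that I can also open an output stream, wrap the output stream in a gzip wrapper, and then copy from one stream to the other.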

with zipfile.ZipFile('/tmp/10000838.zip', 'r') as zip_ref:
    testList = []
    for i in zip_ref.namelist():
        if (i.startswith("__MACOSX/") == False):
            testList.append(i)
    for i in testList:
        zip_ref.open(i, 'r')

But then again, I am not sure how to continue inside the for loop: how to open the streams and convert each file there.
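As a sketch of how that loop might continue (not part of the original question): each zip member is opened as a stream and copied into a gzip stream backed by an in-memory buffer, so nothing is extracted to /tmp. The function name zip_members_to_gzip is hypothetical, and the sketch assumes each compressed file is small enough to hold in memory.

```python
import gzip
import io
import shutil
import zipfile


def zip_members_to_gzip(zip_path):
    """Yield (gz_name, gz_bytes) for every real file inside the zip archive."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for name in zip_ref.namelist():
            # Skip macOS metadata and directory entries.
            if name.startswith("__MACOSX/") or name.endswith("/"):
                continue
            buffer = io.BytesIO()
            # Source stream: the zip member. Destination stream: gzip wrapper
            # around the buffer. Closing the GzipFile leaves the buffer open.
            with zip_ref.open(name, "r") as member, \
                    gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
                shutil.copyfileobj(member, gz)
            yield name + ".gz", buffer.getvalue()
```

Each yielded pair could then be handed to something like boto3's put_object(Bucket=..., Key=gz_name, Body=gz_bytes); the target bucket is whatever applies in your setup.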

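【Question comments】:

Tags: python python-3.x zip gzip tmp

【Solution 1】:

Depending on the size of the files, I would skip writing the .gz file(s) to disk altogether. Perhaps something based on s3fs | boto and gzip.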

import contextlib
import gzip

import s3fs

AWS_S3 = s3fs.S3FileSystem(anon=False) # AWS env must be set up correctly

source_file_path = "/tmp/your_file.txt"
s3_file_path = "my-bucket/your_file.txt.gz"

with contextlib.ExitStack() as stack:
    source_file = stack.enter_context(open(source_file_path, mode="rb"))
    destination_file = stack.enter_context(AWS_S3.open(s3_file_path, mode="wb"))
    # Pass the write mode explicitly rather than relying on it being inferred from fileobj
    destination_file_gz = stack.enter_context(gzip.GzipFile(fileobj=destination_file, mode="wb"))
    while True:
        chunk = source_file.read(1024)
        if not chunk:
            break
        destination_file_gz.write(chunk)

Note: I have not tested this, so please let me know if it does not work.
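If writing the .gz files to /tmp is acceptable, the question about gzip.open can be answered directly: the path passed to gzip.open(..., 'wb') is the destination, and the source is opened with the plain open(). Below is a minimal sketch under that assumption; the helper names gzip_file and gzip_directory and the "*.csv" pattern are illustrative, not from the original posts.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source_path, target_path=None):
    """Compress one file and return the path of the resulting .gz file."""
    source_path = Path(source_path)
    if target_path is None:
        # item.csv -> item.csv.gz, next to the source file
        target_path = source_path.with_name(source_path.name + ".gz")
    # open() reads the source; gzip.open() names the destination.
    with open(source_path, "rb") as src, gzip.open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(target_path)


def gzip_directory(directory, pattern="*.csv"):
    """Gzip every file directly inside `directory` that matches `pattern`."""
    matches = sorted(Path(directory).glob(pattern))
    return [gzip_file(path) for path in matches if path.is_file()]
```

Calling gzip_directory("/tmp") returns the list of .gz paths, each of which could then be uploaded with the same boto3 client used for the download, for example s3.upload_file(str(path), "target-bucket", path.name), where "target-bucket" is a placeholder name.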

【Comments】:

Could you take a look here? stackoverflow.com/questions/69706267/…