Python 3: Extract a tar.gz archive without writing to disk (answers)

【Question title】: Python 3: Extract tar.gz archive without writing to disk
【Posted】: 2023-03-08 08:12:02
【Question】:

As the title says, is there a way to extract a tar.gz archive (downloaded from the Internet) without writing the file to disk? In bash or any other shell, I can pipe the output of curl or wget into tar:

curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -

Can I do something like this in Python too?

Edit: I'm using urllib to download the data. Currently I'm doing something like this to download it and write it to a file:

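from urllib.request import urlopen

filename = "/home/bob/file.tar.gz"
url      = "https://website.com/file.tar.gz"

# download the archive and write it to disk; "with" closes both the
# response and the file (the original file.close was missing its parentheses)
with urlopen(url) as response, open(filename, "wb") as file:
    file.write(response.read())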

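【Question comments】:

Running that command line with os.system would be the simplest option. There is also the tarfile module in the standard library: docs.python.org/3/library/tarfile.html
I've already looked at the tarfile module, but I can only see how to extract an archive that has already been downloaded or exists on the filesystem. So I guess I'll have to use a shell command then.
There's no code here for me to correct, but note that the fileobj argument can be any Python object that implements read().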
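To illustrate the fileobj point: tarfile's stream mode `"r|gz"` reads the archive strictly sequentially, so a non-seekable object such as the HTTP response returned by `urlopen()` can be passed in directly, without first buffering the whole download. The sketch below assumes that approach; `extract_tar_gz_stream` and `extract_from_url` are hypothetical helper names, and the `filter="data"` branch is only used on Python versions that ship tarfile extraction filters.

```python
import tarfile
import urllib.request


def extract_tar_gz_stream(fileobj, dest):
    """Extract a gzip-compressed tar archive read sequentially from fileobj.

    Mode "r|gz" treats the input as a non-seekable stream, so fileobj only
    needs a read() method (an HTTP response object qualifies).
    """
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: refuse members that
            # would escape dest or are otherwise unsafe
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def extract_from_url(url, dest):
    # Hypothetical helper: the response is handed straight to tarfile,
    # so the .tar.gz itself is never saved (the extracted files are).
    with urllib.request.urlopen(url) as response:
        extract_tar_gz_stream(response, dest)
```

Usage would look like `extract_from_url("https://somewebsite.com/file.tar.gz", "/somedirectory/")`. The trade-off of stream mode is that members can only be visited in order, which is all that a plain extraction needs.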

Tags: python tar

【Solution 1】:

With the help of kenny's comment, I got what I wanted: I read the data returned by urlopen, wrapped it in a BytesIO, and passed that as the fileobj argument of tarfile.open:

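from urllib.request import urlopen
import tarfile
from io import BytesIO

# read the whole download into memory and wrap it in a seekable file-like object
with urlopen("https://url/file.tar.gz") as r:
    data = BytesIO(r.read())

# "r:gz" opens the in-memory buffer as a gzip-compressed tar archive
with tarfile.open(fileobj=data, mode="r:gz") as t:
    t.extractall("/somedirectory/")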
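If even the extracted files should stay off disk, the same in-memory buffer can be read member by member with `TarFile.extractfile()`, which returns a file-like object for regular files. This is a minimal sketch under that assumption; `read_tar_gz_members` and `fetch_tar_gz_members` are hypothetical names, and the whole archive plus all member contents are held in memory, so it only suits archives that fit comfortably in RAM.

```python
import io
import tarfile
import urllib.request


def read_tar_gz_members(data):
    """Return {member name: file contents} for every regular file in a tar.gz blob.

    Nothing is written to disk: the archive lives in a BytesIO buffer and each
    member is read with TarFile.extractfile().
    """
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # extractfile() returns a readable file-like object for regular files
                contents[member.name] = tar.extractfile(member).read()
    return contents


def fetch_tar_gz_members(url):
    # Hypothetical helper: download the archive into memory, then parse it
    with urllib.request.urlopen(url) as response:
        return read_tar_gz_members(response.read())
```

Directories, symlinks and other non-regular members are skipped here, since `extractfile()` has no contents to return for them.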

【Discussion】:

【Solution 2】:

To avoid writing the TAR file to disk, you can have Python's subprocess module run the shell command for you:

import subprocess

# some params
shell_cmd = 'curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -'
i_trust_this_string_cmd = True   # shell=True: only use with trusted command strings
throw_error_on_fail = True       # run() raises CalledProcessError on a non-zero exit status
timeout_after_seconds = 10       # or None
convert_output_from_bytes_to_string = True
#

try:
    # run the shell pipeline as a subprocess of this one; with check=True and a
    # timeout, run() itself raises, so the call must be inside the try block
    cp = subprocess.run(
        shell_cmd,                # a single string when shell=True
        shell=i_trust_this_string_cmd,
        check=throw_error_on_fail,
        timeout=timeout_after_seconds,
        text=convert_output_from_bytes_to_string,
        capture_output=True,      # without this, cp.stdout is None
    )
    #status_code = cp.returncode  # note: this is tar's status, a failed curl may go unnoticed
    print(cp.stdout)  # if you want to see the output (text in this case)
except subprocess.CalledProcessError as cpe:
    print(cpe)
except subprocess.TimeoutExpired as te:
    print(te)

If you want more control, you can redirect STDOUT and STDERR yourself, for example to files:

with open('/tmp/stdout.txt', 'w+') as stdout:
    with open('/tmp/stderr.txt', 'w+') as stderr:
        cp = subprocess.run([...], stdout=stdout, stderr=stderr)
        ...
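A variation on this answer, sketched under the assumption that a `tar` binary is on the PATH: let urllib do the download and hand the bytes to `tar xzf -` through its stdin with `input=`. The command is then an argument list, so no shell and no `shell=True` are involved, and HTTP errors surface as Python exceptions instead of disappearing inside the pipeline. `extract_with_tar` and `download_and_extract_with_tar` are hypothetical names, and the 60-second timeout is an arbitrary choice.

```python
import subprocess
import urllib.request


def extract_with_tar(data, dest, timeout=60):
    """Feed gzip-compressed tar bytes to the system `tar` through its stdin.

    Same effect as `... | tar xzf -`, but the command is an argument list,
    so no shell is involved and shell=True is not needed.
    """
    subprocess.run(
        ["tar", "-xzf", "-", "-C", dest],
        input=data,           # bytes written to tar's stdin
        check=True,           # raise CalledProcessError if tar exits non-zero
        capture_output=True,  # keep tar's stdout/stderr on the result/exception
        timeout=timeout,      # seconds; 60 is an arbitrary default
    )


def download_and_extract_with_tar(url, dest):
    # Hypothetical helper: urllib replaces curl, so HTTP errors raise
    # urllib.error.HTTPError here instead of being lost in a shell pipeline.
    with urllib.request.urlopen(url) as response:
        extract_with_tar(response.read(), dest)
```

Like Solution 1, this buffers the whole archive in memory before tar sees it; only the extracted files are written, into `dest`.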

【Discussion】: