How to download .gz files with requests in Python without decoding them?

【Question title】: How to download .gz files with requests in Python without decoding it?
【Posted】: 2014-09-09 16:07:58
【Question】:

I am downloading a file with requests:

import requests

req = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in req.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
            f.flush()

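The problem with gzip files is that requests decodes them automatically, so I end up with the decompressed file on disk, whereas I need the original file.

Is there a way to tell requests not to do this?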

【Comments】:

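This is what I found when googling "python requests gzip": "Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well." I then searched the docs for "raw response" and found requests.Response.raw; maybe that is what you need?
This may help: stackoverflow.com/questions/18364193/…
The code you posted downloads the .gz file correctly for me. What server are you using? What is the value of req.headers? Is the URL you are downloading publicly available, and can you share it with us?
Just to be sure, can you tell us how you determined that you are getting a decompressed file on disk?
Can you try your code with url='https://wiki.mozilla.org/images/f/ff/Example.json.gz' and local_filename='Example.json.gz'? Is it still decompressed automatically?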

Tags: python download request

【Solution 1】:

import requests

r = requests.get(url, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in r.raw.stream(1024, decode_content=False):
        if chunk:
            f.write(chunk)

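This way you avoid the automatic decompression of the gzip-encoded response and save it to the file exactly as it was received from the web server, chunk by chunk.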
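To confirm that the file on disk is still compressed, one option is to look at its first two bytes: a gzip stream starts with the magic bytes 0x1f 0x8b. The helper below is only a minimal sketch of that check; the name is_gzip_file is hypothetical and is not part of requests.

```python
# Hypothetical helper (not part of requests): checks the gzip magic number.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

Calling is_gzip_file(local_filename) after the download should return True if the raw, still-compressed bytes were written, and False if the body was decoded on the way to disk.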

【Discussion】:

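【Solution 2】:

As discussed in the comments above, this seems to have resolved the issue:

From the docs for the requests module:

Requests automatically decompresses gzip-encoded responses ... You can get direct access to the raw response (and even the socket), if needed as well.

Searching the docs for "raw response" leads to requests.Response.raw, which provides a file-like representation of the raw response stream.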
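Because Response.raw is file-like, its bytes can be copied straight to disk. The following is a sketch under assumptions rather than the answerer's own code: the function names save_raw_body and download_raw are hypothetical, and it relies on the request being made with stream=True and on raw reads leaving the body undecoded unless decode_content is turned on.

```python
import shutil


def save_raw_body(response, path):
    """Copy the undecoded body of a streamed response to `path`.

    `response` is assumed to be a requests.Response obtained with stream=True.
    """
    with open(path, "wb") as f:
        # response.raw is a file-like object; copying from it bypasses the
        # content decoding that iter_content() and .content apply.
        shutil.copyfileobj(response.raw, f)


def download_raw(url, path):
    """Hypothetical wrapper: fetch `url` and store the bytes as received."""
    import requests  # imported here so save_raw_body works without it

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        save_raw_body(r, path)
```

With the names from the question, this would be called as download_raw(url, local_filename).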

【Discussion】: