【Question title】: zlib.error: Error -3 while decompressing: incorrect header check
【Posted】: 2011-03-08 12:29:07
【Question】:

I have a gzip file and I am trying to read it in Python, as follows:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

【Comments】:

Tags: python gzip zlib

【Answer 1】:

You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

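This is most likely because you are trying to check a header that is not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).

Choosing windowBits

But zlib can decompress all of these formats:

To (de)compress the deflate format, use wbits = -zlib.MAX_WBITS
To (de)compress the zlib format, use wbits = zlib.MAX_WBITS
To (de)compress the gzip format, use wbits = zlib.MAX_WBITS | 16

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).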
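The three wbits values above can be kept in one lookup table. This is only a minimal sketch: the names WBITS, compress_as and decompress_as are made up for illustration and are not part of zlib.

```python
import zlib

# Hypothetical lookup table: container format name -> wbits value.
WBITS = {
    "deflate": -zlib.MAX_WBITS,    # raw deflate stream, no header (RFC 1951)
    "zlib": zlib.MAX_WBITS,        # zlib header and Adler-32 trailer (RFC 1950)
    "gzip": zlib.MAX_WBITS | 16,   # gzip header and CRC-32 trailer (RFC 1952)
}


def compress_as(data, fmt):
    """Compress bytes into the named container format."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, WBITS[fmt])
    return compressor.compress(data) + compressor.flush()


def decompress_as(data, fmt):
    """Decompress bytes that are known to be in the named container format."""
    return zlib.decompress(data, WBITS[fmt])


# Round-trip check for every format in the table.
for fmt in WBITS:
    assert decompress_as(compress_as(b"test", fmt), fmt) == b"test"
```

Passing the wrong format name to decompress_as should reproduce the error from the question, which makes the table a quick way to find out which format a blob actually uses.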

Examples

Test data:

>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = b'test'  # bytes literal: required on Python 3, where the results print as b'test'
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

The obvious test for zlib:

>>> zlib.decompress(zlib_data)
'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
'test'

Test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|16)
'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)  # io.BytesIO works on Python 2.6+ and Python 3
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits triggers automatic header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
'test'
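Automatic detection covers zlib and gzip but not headerless deflate data, so one possible approach is to fall back to the negative wbits value when detection fails. The helper name decompress_any is hypothetical, and this is a sketch rather than a robust detector: a raw deflate stream could in rare cases start with bytes that pass the zlib header check.

```python
import zlib


def decompress_any(data):
    """Try zlib/gzip auto-detection first, then fall back to raw deflate."""
    try:
        return zlib.decompress(data, zlib.MAX_WBITS | 32)
    except zlib.error:
        # No recognizable zlib or gzip header: assume a raw deflate stream.
        return zlib.decompress(data, -zlib.MAX_WBITS)


def _compress(data, wbits):
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


samples = {
    "deflate": _compress(b"test", -zlib.MAX_WBITS),
    "zlib": _compress(b"test", zlib.MAX_WBITS),
    "gzip": _compress(b"test", zlib.MAX_WBITS | 16),
}
results = {name: decompress_any(blob) for name, blob in samples.items()}
```

If the input is not compressed data at all, the fallback call still raises zlib.error, so callers should be prepared to handle it.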

Using `gzip` instead

Or you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()

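【Comments】:

This one: zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
@dnozay, I tried the zlib.decompress(zlib_data, zlib.MAX_WBITS|32) adjustment mentioned above, but it did not work. I still get the incorrect header check error. If I try the other options mentioned above, I still get various errors. Is there anything else that could trigger this error?
@Mnu, sure: any data that is not valid deflate, zlib or gzip content will fail the header check.
zlib.MAX_WBITS | 16 worked for me, thanks. Being able to infer this from the documentation is very important. Also, it is annoying that aiohttp does not transparently decode Content-Encoding: gzip.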
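When none of the wbits values work, as in the comment above, it can help to look at the first two bytes before decompressing anything. The sketch below uses a hypothetical helper, sniff_format, that checks for the gzip magic number (1f 8b) and for the zlib header rule from RFC 1950 (compression method 8, and the first two bytes read as a big-endian number divisible by 31). Raw deflate has no header, so it cannot be told apart from arbitrary data this way.

```python
import zlib


def sniff_format(data):
    """Guess the container format from the first two bytes (heuristic only)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if len(data) >= 2:
        cmf, flg = bytearray(data[:2])
        is_deflate_method = (cmf & 0x0F) == 8     # CM field: 8 means deflate
        window_ok = (cmf >> 4) <= 7               # CINFO field: window size up to 32K
        check_ok = (cmf * 256 + flg) % 31 == 0    # FCHECK rule from RFC 1950
        if is_deflate_method and window_ok and check_ok:
            return "zlib"
    return "unknown"  # raw deflate, or not compressed data at all


gz = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
gzip_blob = gz.compress(b"test") + gz.flush()

print(sniff_format(gzip_blob))
print(sniff_format(zlib.compress(b"test")))
print(sniff_format(b"<html>not compressed</html>"))
```

An "unknown" result for data that was expected to be gzip usually means the bytes were altered before they reached zlib, for example by text decoding or by an extra encoding layer such as Base64.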

【Answer 2】:

Update: dnozay's answer explains the problem and should be the accepted answer.

Try the gzip module; the code below is taken directly from the Python docs.

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()

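【Comments】:

Getting the same error: Traceback (most recent call last): File "", line 1, in File "/usr/lib/python2.6/gzip.py", line 212, in read self._read(readsize) File "/usr/lib/python2.6/gzip.py", line 271, in _read uncompress = self.decompress.decompress(buf) zlib.error: Error -3 while decompressing: invalid code lengths set
@VarunVyas, sorry, I cannot reproduce your error. It must have something to do with your input data. Was your input file produced by gzip? Does gunzip on the command line decompress it correctly?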

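【Answer 3】:

I just solved the "incorrect header check" problem when decompressing gzip data.

You need to set -WindowBits => WANT_GZIP when calling inflateInit2 (use the 2 version).

Yes, this can be very frustrating. A typical shallow reading of the documentation presents Zlib as an API for Gzip compression, but by default (when not using the gz* methods) it neither creates nor decompresses the Gzip format. You have to pass this not very prominent flag.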

【Comments】:

【Answer 4】:

To decompress incomplete gzip-compressed bytes held in memory, the answer by dnozay is useful, but it misses the zlib.decompressobj call, which I found necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background on wbits, see zlib.decompress.
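The difference between the one-shot function and the decompression object can be seen with a deliberately truncated gzip stream. This is a sketch with made-up test data: the one-shot zlib.decompress call rejects the truncated input, while zlib.decompressobj returns whatever it could decode so far.

```python
import zlib

# Made-up sample data, varied enough that the compressed stream is a few KB.
original = b"".join(str(i).encode("ascii") + b"\n" for i in range(5000))

compressor = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
gzipped = compressor.compress(original) + compressor.flush()
truncated = gzipped[: len(gzipped) // 2]   # simulate an incomplete download

# One-shot decompression insists on a complete stream.
try:
    zlib.decompress(truncated, zlib.MAX_WBITS | 16)
    one_shot_ok = True
except zlib.error:
    one_shot_ok = False

# A decompression object returns the part it was able to decode.
decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
partial = decompressor.decompress(truncated)

print(one_shot_ok, len(partial), decompressor.eof)
```

The eof attribute stays False here because the gzip trailer was never seen, which is one way to tell a partial result from a complete one.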

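Credit: the answer by Yann Vernier, for pointing out the zlib.decompressobj call.

【Comments】:

【Answer 5】:

This does not answer the original question, but it may help others who end up here.

zlib.error: Error -3 while decompressing: incorrect header check also occurs in the following example: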

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause: yields "b'...'"
zlib.decompress(base64.b64decode(encoded_bytes_representation))

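This example is a minimal reproduction of something I ran into in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (instead of a BinaryField).

When the CharField value is read from the database, str() is called on the value without an explicit encoding, as shown in the Django source.

The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

So, in the example, we are inadvertently base64-decoding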

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example succeeds if an explicit encoding is used, e.g. str(b64_encoded_bytes, 'utf-8').
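The two conversions can be compared side by side. This sketch reuses the b64_encoded_bytes name from the example above; wrong and right are illustrative names, and bytes.decode('ascii') is used as an equivalent of passing an explicit encoding to str().

```python
import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))

# str() without an encoding returns the repr, including the b prefix and quotes.
wrong = str(b64_encoded_bytes)

# Decoding explicitly returns only the Base64 characters.
right = b64_encoded_bytes.decode('ascii')

print(wrong)
print(right)

# Only the explicitly decoded text round-trips back to the original bytes.
restored = zlib.decompress(base64.b64decode(right))
```

Base64 output is plain ASCII, so 'ascii' and 'utf-8' give the same result here.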

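Django-specific note:

What makes this especially tricky is that the problem only appears when the value is retrieved from the database. For example, see the test below, which passes (in Django 3.0.3):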

class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is

class MyModel(models.Model):
    data = models.CharField(max_length=100)

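【Comments】:

【Answer 6】:

Funnily enough, I ran into this error when trying to use the Stack Overflow API from Python.

I managed to get it working with the GzipFile object from the gzip module, roughly like this: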

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))

file_contents = gzip_file.read()

【Comments】:

【Answer 7】:

My case was decompressing email messages stored in a Bullhorn database. The snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')

for msg in cursor.fetchall():
    # The magic is in the second parameter: a negative wbits value selects the raw deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

【Comments】:

【Answer 8】:

Just add the header 'Accept-Encoding': 'identity'

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849

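【Comments】:

So your answer to a decompression problem is: don't compress it in the first place??
Servers do not always honor the requested header, so this is not reliable.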
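Since the server may compress the response anyway, an alternative is to decode the body according to the Content-Encoding the server actually returned. The sketch below works on bytes already in memory and makes no network request; decode_body is a hypothetical helper, not part of requests or zlib. For "deflate" it tries the zlib format first and then raw deflate, because some servers send the latter.

```python
import zlib


def decode_body(body, content_encoding):
    """Decode an HTTP body given its Content-Encoding value (hypothetical helper)."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return zlib.decompress(body, zlib.MAX_WBITS | 16)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)                    # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)   # raw deflate
    return body  # identity or an encoding this sketch does not handle


# Made-up payloads standing in for response bodies.
gz = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
gzip_body = gz.compress(b"hello") + gz.flush()
raw = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_body = raw.compress(b"hello") + raw.flush()

print(decode_body(gzip_body, "gzip"))
print(decode_body(raw_body, "deflate"))
print(decode_body(b"hello", None))
```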
