【Question title】: How do I gzip compress a string in Python?
【Posted】: 2012-01-20 08:53:03
【Question】:

How do I gzip compress a string in Python?

gzip.GzipFile exists, but that is for file objects. What about plain strings?

【Comments】:

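  • @KevinDTimm, that document only mentions StringIO, but doesn't really explain how to do it. So asking this question here is perfectly valid, IMHO. A bit more experimentation before asking would have been nice, though.
  • @Alfe - The question was closed 4 years ago for much the same reason as my comment: the OP made no effort to search first.
  • How is this off-topic?
  • This question is now the top Google hit for "gzip string in python", and it is a very reasonable one IMO. It should be reopened.
  • Same as above: this question is the top result in a Google search, and one of the answers is correct. It really doesn't look like it should have been closed.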

Tags: python compression gzip


【Solution 1】:

The easiest way is the zlib encoding (Python 2.x only, where s is a str):

compressed_value = s.encode("zlib")

Then you decompress it with:

plain_string_again = compressed_value.decode("zlib")

【Comments】:

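  • @Daniel: Yes, s is a Python 2.x object of type str.
  • See Standard Encodings to find out where he got that from (scroll down to "codecs"). Also available: s.encode('rot13') and s.encode('base64').
  • Note that this method is not compatible with the gzip command-line utility, because gzip includes a header and a checksum, while this mechanism simply compresses the content.
  • I know this is old, but your decompression line should be: plain_string_again = compressed_value.decode("zlib")
  • @BenjaminToueg: Python 3 is stricter about the distinction between Unicode strings (type str in Python 3) and byte strings (type bytes). str objects have an encode() method that returns a bytes object, and bytes objects have a decode() method that returns a str. The zlib codec is special in that it converts from bytes to bytes, so it doesn't fit into this structure. For a bytes object b, you can use codecs.encode(b, "zlib") and codecs.decode(b, "zlib") instead.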
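
For Python 3, the bytes-to-bytes codec described in the last comment might look like the sketch below. The variable names and the sample text are illustrative only. The final lines highlight the incompatibility with gzip noted in the comments: the two formats begin with different header bytes.

```python
import codecs
import gzip

text = "This is an example string. " * 20  # illustrative input

# The "zlib" codec maps bytes to bytes, so encode the text to bytes first.
raw = text.encode("utf-8")
compressed_value = codecs.encode(raw, "zlib")
plain_string_again = codecs.decode(compressed_value, "zlib").decode("utf-8")

assert plain_string_again == text

# A zlib stream and a gzip stream use different containers, which is why
# the gzip command-line tool does not accept the zlib codec's output.
zlib_magic = compressed_value[:1]    # first byte of the zlib header
gzip_magic = gzip.compress(raw)[:2]  # gzip magic number
print(zlib_magic.hex(), gzip_magic.hex())
```

If gzip compatibility matters, the gzip module is the safer choice; the zlib codec is only a shortcut for in-process compression.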
【Solution 2】:

If you want to produce a complete gzip-compatible binary string, with the header and so on, you can use gzip.GzipFile together with StringIO:

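try:
    from StringIO import StringIO  # Python 2.7
except ImportError:
    # Python 3.x: gzip writes bytes, so the buffer must be a BytesIO
    from io import BytesIO as StringIO
import gzip

out = StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    # A bytes literal works on both Python 2.7 and 3.x
    f.write(b"This is mike number one, isn't this a lot of fun?")
# Read the buffer only after the with block, once the gzip trailer is written
out.getvalue()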

# returns '\x1f\x8b\x08\x00\xbd\xbe\xe8N\x02\xff\x0b\xc9\xc8,V\x00\xa2\xdc\xcc\xecT\x85\xbc\xd2\xdc\xa4\xd4"\x85\xfc\xbcT\x1d\xa0X\x9ez\x89B\tH:Q!\'\xbfD!?M!\xad4\xcf\x1e\x00w\xd4\xea\xf41\x00\x00\x00'

【Comments】:

  • The inverse of this is: `def gunzip_text(text): infile = StringIO.StringIO() infile.write(text) infile.seek(0) with gzip.GzipFile(fileobj=infile, mode="r") as f: return f.read()`
  • @fastmultiplication: Or shorter: f = gzip.GzipFile(fileobj=StringIO.StringIO(text)); result = f.read(); f.close(); return result
  • Unfortunately the question has been closed, so I can't post a new answer, but here is how to do it in Python 3.
  • Possibly unrelated: is it faster to compress in memory first (compared with using the local disk)?
  • In Python 3: import zlib; my_string = "hello world"; my_bytes = zlib.compress(my_string.encode('utf-8')); my_hex = my_bytes.hex(); my_bytes2 = bytes.fromhex(my_hex); my_string2 = zlib.decompress(my_bytes2).decode('utf-8'); assert my_string == my_string2;
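
The round trip discussed in these comments could be written for Python 3 roughly as follows. This is a sketch only: gunzip_text borrows its name from the comment above, while gzip_text and the encoding parameter are assumptions added for illustration.

```python
import gzip
from io import BytesIO


def gzip_text(text, encoding="utf-8"):
    """Compress text into a complete gzip byte string, entirely in memory."""
    out = BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb") as f:
        f.write(text.encode(encoding))
    # The gzip trailer is written when the GzipFile is closed,
    # so the buffer is read only after the with block.
    return out.getvalue()


def gunzip_text(data, encoding="utf-8"):
    """Inverse of gzip_text: decompress gzip bytes back into text."""
    with gzip.GzipFile(fileobj=BytesIO(data), mode="rb") as f:
        return f.read().decode(encoding)


blob = gzip_text("This is mike number one, isn't this a lot of fun?")
print(gunzip_text(blob))
```

Passing the buffer through the fileobj keyword matters here, because the first positional parameter of gzip.GzipFile is a filename.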
【Solution 3】:
import gzip

s = "a long string of characters"

# gzip.open(filename, mode, compresslevel)
g = gzip.open('gzipfilename.gz', 'wb', 5)
g.write(s.encode('utf-8'))  # gzip files hold bytes, so encode the text first
g.close()

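【Comments】:

  • I guess the question was about compressing a string in memory, without having to write it to disk in the process. Otherwise your answer is perfectly correct.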
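
For the in-memory case raised in this comment, one possible Python 3 sketch uses gzip.compress with the same compression level and no file on disk. The sample string is taken from the answer; the other variable names are illustrative.

```python
import gzip

s = "a long string of characters"

# compresslevel ranges from 0 (no compression) to 9 (smallest output).
compressed = gzip.compress(s.encode("utf-8"), compresslevel=5)

# The result is a complete gzip stream, so it round-trips in memory.
restored = gzip.decompress(compressed).decode("utf-8")
assert restored == s
```

Writing compressed to a file opened in binary mode should give a file that the gzip command-line tool can read.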
【Solution 4】:

For those who want to compress a Pandas DataFrame in JSON format:

Tested with Python 3.6 and Pandas 0.23

import sys
import zlib, lzma, bz2
import math
import pandas as pd

def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

dataframe = pd.read_csv('...') # your CSV file
dataframe_json = dataframe.to_json(orient='split')
data = dataframe_json.encode()
compressed_data = bz2.compress(data)
decompressed_data = bz2.decompress(compressed_data).decode()
dataframe_aux = pd.read_json(decompressed_data, orient='split')

#Original data size:  10982455 10.47 MB
#Encoded data size:  10982439 10.47 MB
#Compressed data size:  1276457 1.22 MB (lzma, slow), 2087131 1.99 MB (zlib, fast), 1410908 1.35 MB (bz2, fast)
#Decompressed data size:  10982455 10.47 MB
print('Original data size: ', sys.getsizeof(dataframe_json), convert_size(sys.getsizeof(dataframe_json)))
print('Encoded data size: ', sys.getsizeof(data), convert_size(sys.getsizeof(data)))
print('Compressed data size: ', sys.getsizeof(compressed_data), convert_size(sys.getsizeof(compressed_data)))
print('Decompressed data size: ', sys.getsizeof(decompressed_data), convert_size(sys.getsizeof(decompressed_data)))

print(dataframe.head())
print(dataframe_aux.head())
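
As a pandas-free sketch of the same comparison, the snippet below runs zlib, bz2 and lzma over one JSON payload. The records are hypothetical stand-ins for the output of dataframe.to_json, and sizes are measured with len() rather than sys.getsizeof, since getsizeof also counts the interpreter's per-object overhead.

```python
import bz2
import json
import lzma
import zlib

# Hypothetical records standing in for dataframe.to_json(orient='split').
records = [{"id": i, "name": "item-%d" % i, "value": i * 0.5} for i in range(1000)]
data = json.dumps(records).encode("utf-8")

codecs_to_try = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

sizes = {}
for name, (compress, decompress) in codecs_to_try.items():
    blob = compress(data)
    assert decompress(blob) == data  # every codec must round-trip exactly
    sizes[name] = len(blob)          # len() counts payload bytes only
    ratio = 100.0 * len(blob) / len(data)
    print("%-5s %7d bytes (%.1f%% of original)" % (name, len(blob), ratio))
```

The ranking between the three codecs depends heavily on the data, so the figures quoted in the answer should not be expected to carry over to other inputs.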

【Comments】:

【Solution 5】:

A Python 3 version of Sven Marnach's 2011 answer:

import gzip

# A long, highly repetitive example string
exampleString = 'abcdefghijklmnopqrstuv' * 100 + '123'
compressed_value = gzip.compress(bytes(exampleString, 'utf-8'))
# gzip.decompress returns bytes, so decode to get the original str back
plain_string_again = gzip.decompress(compressed_value).decode('utf-8')
    

【Comments】:
