【问题标题】:gzipped jsonlines file read and write in pythongzip压缩的jsonlines文件在python中读写
【发布时间】:2021-08-07 11:29:10
【问题描述】:

这段代码读取和写入一个 jsonlines 文件。如何压缩它?我尝试直接使用gzip.open,但出现各种错误。

import json
    
def dump_jsonl(data, output_path, append=False):
    """
    Write list of objects to a JSON lines file.
    """
    mode = 'a+' if append else 'w'
    with open(output_path, mode, encoding='utf-8') as f:
        for line in data:
            json_record = json.dumps(line, ensure_ascii=False)
            f.write(json_record + '\n')
    print('Wrote {} records to {}'.format(len(data), output_path))

def load_jsonl(input_path) -> list:
    """
    Read list of objects from a JSON lines file.
    """
    data = []
    with open(input_path, 'r', encoding='utf-8') as f:
        for line in f:
            data.append(json.loads(line.rstrip('\n|\r')))
    print('Loaded {} records from {}'.format(len(data), input_path))
    return data

这是我正在做的压缩,但我无法阅读它。

def dump_jsonl(data, output_path, append=False):
    with gzip.open(output_path, "a+") as f:
        for line in data:
            json_record = json.dumps(line, ensure_ascii = False)
            encoded = json_record.encode("utf-8") + ("\n").encode("utf-8")
            compressed = gzip.compress(encoded)
            f.write(compressed)

【问题讨论】:

标签: python jsonlines


【解决方案1】:

使用gzip module的压缩功能。

import gzip
with open('file.jsonl') as f_in:
    with gzip.open('file.jsonl.gz', 'wb') as f_out:
        f_out.writelines(f_in)

gzip.open() 用于打开 gzip 文件,而不是 jsonl。

阅读:

gzip a file in Python

Python support for Gzip

【讨论】:

  • 仅供参考:我强烈建议使用 with 语句打开文件。然后,您无需考虑再次关闭文件。反正我是 1。
猜你喜欢
  • 2017-01-19
  • 2010-10-12
  • 1970-01-01
  • 2013-01-16
  • 1970-01-01
  • 2014-07-04
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多