【Question title】: Python - Write JSON to BytesIO from a dictionary object and store as compressed tar.gz
【Posted】: 2019-12-08 18:41:26
【Question】:

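My code is mostly working, but I'm having some trouble writing a tar file to a remote file system. The code below is supposed to serialize a large dictionary to JSON and write it into a compressed file object. The named temporary file is optional, since I could also write to a permanent file on the file system. fs is a gcsfs.GCSFileSystem object; it supports a put method for copying files to Google Cloud Storage.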

def write_main(fs, remote_fp, data):
    """
    input -
        fs filesystem object
        fp filepath or path object
        data object
    output - bool
    ref: https://stackoverflow.com/questions/39109180/dumping-json-directly-into-a-tarfile
    """
    tmp_file = NamedTemporaryFile()
    filename = tmp_file.name
    with io.BytesIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
        out_stream.write(json.dumps(data).encode())
        tar_file.size = out_stream.tell()
        out_stream.seek(0)
        tar_file.addfile(tar_file, out_stream)

    fs.put(filename, remote_fp)

I get the following error when I try to test the function's code:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-020281a8b588> in <module>
      3     tar_file.size = out_stream.tell()
      4     out_stream.seek(0)
----> 5     tar_file.addfile(tar_file, out_stream)
      6
      7 fs.put(filename, remote_fp)

~/anaconda3/lib/python3.7/tarfile.py in addfile(self, tarinfo, fileobj)
   1964         tarinfo = copy.copy(tarinfo)
   1965
-> 1966         buf = tarinfo.tobuf(self.format, self.encoding, self.errors)
   1967         self.fileobj.write(buf)
   1968         self.offset += len(buf)

AttributeError: 'TarFile' object has no attribute 'tobuf'

【Comments】:

    Tags: json python-3.x stream tarfile bytesio


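    【Solution 1】:

    I think you're passing the wrong argument to addfile. Its first argument should be a TarInfo object, but you passed tar_file, which is the TarFile itself. That is why the "no attribute 'tobuf'" error is raised. The docstring of addfile in tarfile.py reads: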

    def addfile(self, tarinfo, fileobj=None):
        """Add the TarInfo object `tarinfo' to the archive. If `fileobj' is
           given, tarinfo.size bytes are read from it and added to the archive.
           You can create TarInfo objects using gettarinfo().
           On Windows platforms, `fileobj' should always be opened with mode
           'rb' to avoid irritation about the file size.
        """
    

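As a rough illustration of the point above, here is a minimal sketch that builds the TarInfo by hand and passes it to addfile together with a separate BytesIO holding the JSON bytes. The function names and the member name "data.json" are made up for this example; the archive is written to memory here only so the result can be read back and checked.

```python
import io
import json
import tarfile


def dict_to_targz_bytes(data, member_name="data.json"):
    """Serialize `data` to JSON and return the bytes of a .tar.gz holding it.

    `member_name` is a hypothetical name chosen for this sketch.
    """
    payload = json.dumps(data).encode("utf-8")

    # addfile() wants a TarInfo header; its size must match the payload length.
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)

    archive = io.BytesIO()
    # fileobj=archive makes tarfile write the compressed archive into memory.
    with tarfile.open(fileobj=archive, mode="w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return archive.getvalue()


def targz_bytes_to_dict(blob, member_name="data.json"):
    """Read the JSON member back out of the archive bytes."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        member = tar.extractfile(member_name)
        return json.loads(member.read().decode("utf-8"))
```

Note that the archive buffer and the payload buffer are two different objects: one is the destination of the tar stream, the other is the source of the member's contents.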
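    【Comments】:

      【Solution 2】:

      @marian You're right, but I had made another mistake as well. Passing out_stream to tarfile.open also caused the write to fail for some reason. The new code looks like this: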

      with io.BytesIO() as out_stream, tarfile.open(filename, 'w|gz') as tar_file:
          out_stream.write(json.dumps(data).encode())
          out_stream.seek(0)
          info = tarfile.TarInfo("data")
          info.size = len(out_stream.getbuffer())
          tar_file.addfile(info, out_stream)
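A likely explanation for the failure, as far as I can tell: the third positional parameter of tarfile.open is fileobj, and when it is given the archive is written to that object instead of to the named file, so out_stream was serving as both the archive destination and the source of the member data. Putting the pieces together, the whole function might look roughly like the sketch below. It assumes fs.put(local_path, remote_path) behaves as described in the question; the member name "data.json" and the local file name "payload.tar.gz" are placeholders invented for the example.

```python
import io
import json
import os
import tarfile
import tempfile


def write_main(fs, remote_fp, data, member_name="data.json"):
    """Write `data` as JSON inside a tar.gz and upload it with fs.put().

    Assumes `fs` exposes put(local_path, remote_path), as the question
    says gcsfs.GCSFileSystem does. `member_name` is a hypothetical default.
    """
    payload = json.dumps(data).encode("utf-8")
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)

    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "payload.tar.gz")
        # Only a file name is passed, so the archive goes to disk; the
        # BytesIO is used purely as the source of the member's bytes.
        with tarfile.open(local_path, "w:gz") as tar:
            tar.addfile(info, io.BytesIO(payload))
        # The archive is closed, and therefore fully written, before upload.
        fs.put(local_path, remote_fp)
    return True
```

Uploading only after the tarfile context manager has exited matters because the gzip trailer and the end-of-archive blocks are written on close.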
      

      【Comments】: