【Question title】: Running into an input/output error when saving a TensorFlow session checkpoint (on Google Colab)
【Posted】: 2019-08-19 17:56:14
【Question】:

First, here is the error:

    UnknownError (see above for traceback): drive/Colab Notebooks/saved_model/model1.ckpt-313.data-00000-of-00001.tempstate7585627669875003657; Input/output error
         [[node save_2/SaveV2 (defined at <ipython-input-48-1ef727974922>:66) ]]
         [[node save_2/SaveV2 (defined at <ipython-input-48-1ef727974922>:66) ]]

I thought this might be a disk space issue, but I have 332.38 GB of free space, so that can't be it.
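The free-space figure only matters for the filesystem the checkpoint is actually written to. A minimal sketch for checking that with the standard library's `shutil.disk_usage`; the helper name `free_space_gb` is hypothetical, and the `/content/drive/...` path is an assumption about where the relative `drive/Colab Notebooks/saved_model` path in the error resolves (Colab's default working directory is `/content`). A Drive-side quota or API failure would not show up in this number.

```python
import os
import shutil


def free_space_gb(path: str) -> float:
    """Return the free space, in GB, of the filesystem that contains `path`."""
    # Walk up to the nearest existing ancestor so a directory that has not
    # been created yet can still be checked.
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 1e9


if __name__ == "__main__":
    # Hypothetical path: the relative "drive/Colab Notebooks/saved_model"
    # from the error, resolved against Colab's default /content directory.
    print(f"{free_space_gb('/content/drive/Colab Notebooks/saved_model'):.2f} GB free")
```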

I'm not sure what else could be causing this, but if you need any other details, please ask and I'll update my post.

[Full error]

  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-48-1ef727974922>", line 66, in <module>
    saver = tf.train.Saver(tf.global_variables())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 832, in __init__
    self.build()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 844, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 881, in _build
    build_save=build_save, build_restore=build_restore)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 510, in _build_internal
    save_tensor = self._AddSaveOps(filename_tensor, saveables)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 210, in _AddSaveOps
    save = self.save_op(filename_tensor, saveables)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py", line 124, in save_op
    tensors)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 1807, in save_v2
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

【Comments】:

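  • Can you post the traceback from above the error you posted here?
  • Done, as best I could!
  • What hardware accelerator is selected under Runtime type? Based on the google-colab tag, I assume you're running this on Google Colab.
  • You're right. It's GPU, Python 3.
  • The same error also occurs in the latest stable release, TF 2.0, when using the ModelCheckpoint callback.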

Tags: python tensorflow google-colaboratory


【Solution 1】:

This is most likely the Google Drive API call quota issue mentioned here.
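For the TF1 `tf.train.Saver` checkpoint in the question, one way that may cut down on writes to the Drive mount (not guaranteed to avoid a quota limit) is to save to the VM's local disk and then copy the finished checkpoint files to Drive. `Saver.save()` returns the checkpoint prefix (for example `.../model1.ckpt-313`), and that step's files share it (`.index`, `.data-00000-of-00001`, `.meta`). A minimal sketch under those assumptions; `copy_checkpoint` and the `/content/ckpt` directory are hypothetical. It deliberately skips the separate `checkpoint` state file, because the Saver records absolute paths there by default, so restoring uses the explicit Drive prefix instead.

```python
import glob
import os
import shutil


def copy_checkpoint(local_prefix: str, dest_dir: str) -> list:
    """Copy every file of one checkpoint (local_prefix.*) into dest_dir.

    Returns the list of destination paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    # glob.escape guards against wildcard characters inside the prefix itself;
    # the trailing ".*" matches model1.ckpt-313.index but not model1.ckpt-3130.index.
    for src in sorted(glob.glob(glob.escape(local_prefix) + ".*")):
        dst = os.path.join(dest_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # one sequential copy per finished file
        copied.append(dst)
    return copied


# Hypothetical usage inside a TF1-style training loop:
#   prefix = saver.save(sess, "/content/ckpt/model1.ckpt", global_step=step)
#   copy_checkpoint(prefix, "/content/drive/Colab Notebooks/saved_model")
# Restore later with the explicit Drive prefix, e.g.
#   saver.restore(sess, "/content/drive/Colab Notebooks/saved_model/model1.ckpt-313")
```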

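I ran into this problem too and couldn't save my model. I worked around it by saving the model to a local directory, zipping it, then downloading the zip and uploading it manually.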

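import shutil

def zip_model(keras_model, filename):
  # Save to the Colab VM's local disk rather than the Drive mount.
  modelpath = f'/content/{filename}'
  # With no file extension, tf.keras in TF 2.x saves a SavedModel directory.
  keras_model.save(modelpath)
  # Zip the directory's contents into /content/{filename}.zip and return that path.
  return shutil.make_archive(modelpath, 'zip', modelpath)

# usage:
# zip_model(keras_model, 'model')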
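To load the model again after uploading the zip into a fresh Colab session, the archive can be unpacked with `shutil.unpack_archive` and the resulting directory passed to `tf.keras.models.load_model`. A minimal sketch; the helper name `unzip_model` and the `/content` paths are assumptions that mirror `zip_model`:

```python
import os
import shutil


def unzip_model(zip_path: str, extract_dir: str) -> str:
    """Unpack a model archive created by shutil.make_archive; return its directory."""
    os.makedirs(extract_dir, exist_ok=True)
    # make_archive(modelpath, 'zip', modelpath) stores the directory's contents
    # at the archive root, so they land directly inside extract_dir.
    shutil.unpack_archive(zip_path, extract_dir, "zip")
    return extract_dir


# Hypothetical usage, after uploading model.zip to /content:
#   model_dir = unzip_model("/content/model.zip", "/content/model")
#   model = tf.keras.models.load_model(model_dir)
```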
