【Question title】: Restoring TensorFlow model
【Posted】: 2016-05-01 03:58:37
【Question】:

I am trying to restore a TensorFlow model. I followed this example: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/

At the end of the example code, I added the following lines:

saver = tf.train.Saver()
save_path = saver.save(sess, "model.ckpt")
print("Model saved in file: %s" % save_path)

Two files were created: checkpoint and model.ckpt.

In a new Python file (tomas_bees_predict.py), I have this code:

import tensorflow as tf

saver = tf.train.Saver()

with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "model.ckpt")
  print("Model restored.")

But when I run the code, I get this error:

Traceback (most recent call last):
  File "tomas_bees_predict.py", line 3, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 705, in __init__
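    raise ValueError("No variables to save")
ValueError: No variables to save

Is there a way to read the model.ckpt file and see which variables were saved? Or maybe someone can help with saving the model and restoring it, based on the example above?

Edit 1:

I think I tried running the same code to recreate the model structure, and I got the error. I think it may be related to the fact that the code described here does not use named variables: http://nasdag.github.io/blog/2016/01/19/classifying-bees-with-google-tensorflow/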

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

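So I did this experiment. I wrote two versions of the code that saves the model (with and without named variables) and the code that restores the model.

tensor_save_named_vars.py: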

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1, name="v1")
v2 = tf.Variable(2, name="v2")

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_save_not_named_vars.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(1)
v2 = tf.Variable(2)

# Add an op to initialize the variables.
init_op = tf.initialize_all_variables()

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, initialize the variables, do some work, save the
# variables to disk.
with tf.Session() as sess:
  sess.run(init_op)
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()
  # Save the variables to disk.
  save_path = saver.save(sess, "/tmp/model.ckpt")
  print "Model saved in file: ", save_path

tensor_restore.py:

import tensorflow as tf

# Create some variables.
v1 = tf.Variable(0, name="v1")
v2 = tf.Variable(0, name="v2")

# Add ops to save and restore all the variables.
saver = tf.train.Saver()

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  saver.restore(sess, "/tmp/model.ckpt")
  print "Model restored."
  print "v1 = ", v1.eval()
  print "v2 = ", v2.eval()

Here is what I get when I run this code:

$ python tensor_save_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
Model restored.
v1 =  1
v2 =  2

$ python tensor_save_not_named_vars.py 

I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
v1 =  1
v2 =  2
Model saved in file:  /tmp/model.ckpt

$ python tensor_restore.py 
I tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 4
I tensorflow/core/common_runtime/direct_session.cc:58] Direct session inter op parallelism threads: 4
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
W tensorflow/core/common_runtime/executor.cc:1076] 0x7ff953881e40 Compute status: Not found: Tensor name "v1" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice/tensor_name, save/restore_slice/shape_and_slice)]]
Traceback (most recent call last):
  File "tensor_restore.py", line 14, in <module>
    saver.restore(sess, "/tmp/model.ckpt")
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 891, in restore
    sess.run([self._restore_op_name], {self._filename_tensor_name: save_path})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 368, in run
    results = self._do_run(target_list, unique_fetch_targets, feed_dict_string)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 444, in _do_run
    e.code)
tensorflow.python.framework.errors.NotFoundError: Tensor name "v2" not found in checkpoint files /tmp/model.ckpt
     [[Node: save/restore_slice_1 = RestoreSlice[dt=DT_INT32, preferred_shard=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save/Const_0, save/restore_slice_1/tensor_name, save/restore_slice_1/shape_and_slice)]]
Caused by op u'save/restore_slice_1', defined at:
  File "tensor_restore.py", line 8, in <module>
    saver = tf.train.Saver()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 713, in __init__
    restore_sequentially=restore_sequentially)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 432, in build
    filename_tensor, vars_to_save, restore_sequentially, reshape)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 191, in _AddRestoreOps
    values = self.restore_op(filename_tensor, vs, preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 106, in restore_op
    preferred_shard=preferred_shard)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 189, in _restore_slice
    preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 271, in _restore_slice
    preferred_shard=preferred_shard, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 664, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1834, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1043, in __init__
    self._traceback = _extract_stack()

So maybe the original code (see the external link above) could be modified to something like this:

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  weight_var = tf.Variable(initial, name="weight_var")
  return weight_var

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  bias_var = tf.Variable(initial, name="bias_var")
  return bias_var

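But I have one question: is restoring the weight_var and bias_var variables enough to run prediction? I trained on a powerful machine with a GPU, and I would like to copy the model to a less powerful computer without a GPU to run predictions.

【Comments】:

Tags: python tensorflow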


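【Answer 1】:

If you run into this kind of problem, try restarting the kernel. Variables created in the current run conflict with the ones created earlier in the same kernel, which leads to NotFoundError and similar problems.

I ran into the same kind of issue, and restarting the kernel worked for me. (Note: avoid re-running the model code several times in one kernel, because recreating variables on top of the existing ones can overwrite the saved model file and end up changing the original values.)

【Comments】: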

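【Answer 2】:

This problem is most likely caused by the name scopes changing when the same network is created twice.

Run the command:

tf.reset_default_graph()

before creating the network.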
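
A minimal sketch of the effect, assuming a TensorFlow release that ships the tf.compat.v1 module (newer than the one used in the question, where the same calls were spelled tf.Variable and tf.reset_default_graph). Building the same variables twice in one default graph makes TensorFlow add a numeric suffix to the second set of names, and a Saver built afterwards will look for names that are not in the checkpoint. The helper build_vars is hypothetical and exists only for this illustration.

```python
import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph mode, as in the question


def build_vars():
    """Hypothetical helper: create the two variables and return their names."""
    v1 = tf1.Variable(0, name="v1")
    v2 = tf1.Variable(0, name="v2")
    return [v1.op.name, v2.op.name]


tf1.reset_default_graph()
first = build_vars()    # fresh graph: the requested names are used
second = build_vars()   # same graph again: names are made unique with a suffix

tf1.reset_default_graph()
third = build_vars()    # graph cleared: the original names are free again

print(first, second, third)
```

Comparing the three lists shows which build would match a checkpoint that was saved with the names v1 and v2.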

【Comments】:

【Answer 3】:

Make sure the tf.train.Saver() declaration is inside the with tf.Session() as sess: block.

【Comments】:


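【Answer 5】:

There is a similar question here: Tensorflow: how to save/restore a model? TL;DR: before restoring the weights with a Saver object, you need to recreate the model structure using the same sequence of TensorFlow API commands.

This is not ideal; follow GitHub issue #696 for progress on making this easier.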
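
As a rough sketch of what recreating the structure means, the snippet below saves two unnamed variables, lists what the checkpoint contains, and then restores them into a second graph built by the same function. It assumes a TensorFlow release that provides tf.compat.v1 and tf.train.list_variables, both newer than the version in the question. The function build_model, the variable values, and the temporary checkpoint path are hypothetical stand-ins, not the code from the linked blog post.

```python
import os
import tempfile

import tensorflow as tf

tf1 = tf.compat.v1
tf1.disable_eager_execution()  # graph mode, as in the question


def build_model():
    """Hypothetical stand-in for the real network: two unnamed variables."""
    w = tf1.Variable([1.0, 2.0])   # receives an automatic name
    b = tf1.Variable([0.5])        # receives the next automatic name
    return w, b


ckpt_path = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Training side: build the graph, initialize, change a value, save.
with tf.Graph().as_default():
    w, b = build_model()
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        sess.run(tf1.global_variables_initializer())
        sess.run(w.assign([3.0, 4.0]))   # pretend this is training
        saver.save(sess, ckpt_path)

# Inspect which variable names and shapes the checkpoint holds.
saved = tf.train.list_variables(ckpt_path)
print(saved)

# Prediction side: rebuild with the same calls in the same order, then restore.
with tf.Graph().as_default():
    w, b = build_model()
    saver = tf1.train.Saver()
    with tf1.Session() as sess:
        saver.restore(sess, ckpt_path)   # restore also initializes the variables
        restored_w = sess.run(w).tolist()
        print(restored_w)
```

Because both graphs are built by the same calls in the same order, the automatically generated names match and the restore succeeds even though no variable was named explicitly. The printed list is also one way to see which variables a checkpoint contains.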

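【Comments】:

• For the record: the issue was closed more than a year ago, and storing the model structure now appears to be supported.
• "before restoring the weights with a Saver object, you need to recreate the model structure using the same sequence of TensorFlow API commands" Could you explain this further?
• @Chaine there are better options: use MetaGraph or SavedModel
• @YaroslavBulatov I am having a hard time porting my code to Android. I already have an existing Android project and an existing trained model. I obtained the training code separately and added a few lines to generate a frozen model to be used on Android. I posted a question about this. Anyone interested can reply at stackoverflow.com/questions/44208691/… :)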