【Question title】: Tensorflow & Keras can't load .ckpt save
【Posted】: 2019-12-05 20:08:50
【Question】:

I am using the ModelCheckpoint callback to save the best epoch of a model I am training. It saves without error, but when I try to load it, I get this error:

2019-07-27 22:58:04.713951: W tensorflow/core/util/tensor_slice_reader.cc:95] Could not open C:\Users\Riley\PycharmProjects\myNN\cp.ckpt: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

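I have tried using the absolute/full path, with no luck. I'm sure I could use EarlyStopping instead, but I would still like to understand why the error occurs. Here is my code: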

from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import datetime
import statistics

(train_images, train_labels), (test_images, test_labels) = np.load("dataset.npy", allow_pickle=True)

train_images = train_images / 255
test_images = test_images / 255

train_labels = list(map(float, train_labels))
test_labels = list(map(float, test_labels))
train_labels = [i/10 for i in train_labels]
test_labels = [i/10 for i in test_labels]

'''
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(128, 128)),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(1)
  ])

'''

start_time = datetime.datetime.now()

model = keras.Sequential([
    keras.layers.Conv2D(32, kernel_size=(5, 5), strides=(1, 1), activation='relu', input_shape=(128, 128, 1)),
    keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (5, 5), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='relu'),
    keras.layers.Dense(1)

])

model.compile(loss='mean_absolute_error',
    optimizer=keras.optimizers.SGD(lr=0.01),
    metrics=['mean_absolute_error', 'mean_squared_error'])

train_images = train_images.reshape(328, 128, 128, 1)
test_images = test_images.reshape(82, 128, 128, 1)

model.fit(train_images, train_labels, epochs=100, callbacks=[keras.callbacks.ModelCheckpoint("cp.ckpt", monitor='mean_absolute_error', save_best_only=True, verbose=1)])

model.load_weights("cp.ckpt")

predictions = model.predict(test_images)

totalDifference = 0
for i in range(82):
    print("%s: %s" % (test_labels[i] * 10, predictions[i] * 10))
    totalDifference += abs(test_labels[i] - predictions[i])

avgDifference = totalDifference / 8.2

print("\n%s\n" % avgDifference)
print("Time Elapsed:")
print(datetime.datetime.now() - start_time)

【Comments】:

    Tags: python tensorflow machine-learning keras computer-vision


    【Solution 1】:
    import tensorflow as tf
    
    # Create some variables.
    v1 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v1")
    v2 = tf.Variable(tf.random_normal([784, 200], stddev=0.35), name="v2")
    
    # Add an op to initialize the variables.
    init_op = tf.global_variables_initializer()
    
    # Add ops to save and restore all the variables.
    saver = tf.train.Saver()
    
    # Later, launch the model, initialize the variables, do some work, save the
    # variables to disk.
    with tf.Session() as sess:
      sess.run(init_op)
      # Do some work with the model.
    
      # Save the variables to disk.
      save_path = saver.save(sess, "/tmp/model.ckpt")
      print("Model saved in file: %s" % save_path)
    
    # Later, launch the model, use the saver to restore variables from disk, and
    # do some work with the model.
    with tf.Session() as sess:
      # Restore variables from disk.
      saver.restore(sess, "/tmp/model.ckpt")
      print("Model restored.")
      # Do some work with the model
    

    Source

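    【Comments】:

    • Thanks! But do I really need to change my entire code to fix this? Unless I'm missing something, the docs say mine should work.
    • Keep this code and put your model where it says # Do some work with the model.
    【Solution 2】: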

    TL;DR: You are saving the whole model but trying to load only the weights, and that is not how it works.

    Explanation

    Your model's fit call:

    model.fit(
        train_images,
        train_labels,
        epochs=100,
        callbacks=[
            keras.callbacks.ModelCheckpoint(
                "cp.ckpt", monitor="mean_absolute_error", save_best_only=True, verbose=1
            )
        ],
    )
    

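    Since save_weights_only=False is the default in ModelCheckpoint, you are saving the whole model to .ckpt.

    By the way, the file should be named with a .hdf5 or .h5 extension, since it is Hierarchical Data Format 5. Because Windows is not extension-agnostic, you may run into problems if tensorflow / keras relies on the file extension on this OS.

    On the other hand, you are loading only the model's weights, while the file contains the whole model: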

    model.load_weights("cp.ckpt")
    

    Tensorflow's checkpoint (.cp) mechanism is different from Keras's (.hdf5), so keep that in mind (there are plans to integrate the two more closely).
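One quick way to confirm which of the two formats a saved file is in is to look at its first bytes. The sketch below assumes the file is a plain HDF5 file with no user block, so the standard 8-byte HDF5 signature sits at offset 0. The helper name `is_hdf5` is hypothetical and not part of TensorFlow or Keras.

```python
# First 8 bytes of an HDF5 file, as defined by the HDF5 file format specification.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def is_hdf5(path):
    """Return True if the file at `path` starts with the HDF5 signature.

    Hypothetical helper for illustration; it only inspects the first 8 bytes.
    """
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

If `is_hdf5("cp.ckpt")` returns True, the file is an HDF5 file despite its name. That would be consistent with the "not an sstable (bad magic number)" message, which comes from TensorFlow's own checkpoint reader.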

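    Solution

    So either keep the callback as it is now (saving the whole model, ideally to a file such as model.hdf5) and load it with keras.models.load_model("model.hdf5"), or add the save_weights_only=True argument to ModelCheckpoint: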

    model.fit(
        train_images,
        train_labels,
        epochs=100,
        callbacks=[
            keras.callbacks.ModelCheckpoint(
                "weights.hdf5",
                monitor="mean_absolute_error",
                save_best_only=True,
                verbose=1,
                save_weights_only=True,  # Specify this
            )
        ],
    )
    

    You can then load the weights with model.load_weights("weights.hdf5").
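For reference, here is a minimal end-to-end sketch of the weights-only route on random data. The tiny `build_model` network, the random arrays and the file name `best.weights.h5` are illustrative assumptions, not the asker's setup. The `.weights.h5` suffix is used because, as far as I know, newer Keras releases require it for weights files, while older releases treat it as an ordinary `.h5` name.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras


def build_model():
    # Tiny stand-in network (assumption); the architecture must match at save and load time.
    return keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(8, activation="relu"),
        keras.layers.Dense(1),
    ])


# Random regression data, purely for illustration.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")

weights_path = os.path.join(tempfile.mkdtemp(), "best.weights.h5")

model = build_model()
model.compile(loss="mean_absolute_error", optimizer="sgd")
model.fit(
    x, y, epochs=3, verbose=0,
    callbacks=[
        keras.callbacks.ModelCheckpoint(
            weights_path,
            monitor="loss",
            save_best_only=True,
            save_weights_only=True,  # write weights only, not the whole model
        )
    ],
)

# Rebuild the same architecture, then restore the best weights into it.
restored = build_model()
restored.load_weights(weights_path)
predictions = restored.predict(x, verbose=0)
```

The key point is that the save side and the load side agree: a weights-only file is read back with load_weights into a model that has the same architecture.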

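    【Comments】:

      【Solution 3】:

      model.load_weights won't work here, for the reason given in the answer above. You can load the weights with the code below: build the model first, then restore the weights into it. Hope this code helps.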

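      import tensorflow as tf

      # dense_net() is the answerer's own model-building function (not shown).
      model = dense_net()
      ckpt = tf.train.Checkpoint(step=tf.Variable(1, dtype=tf.int64), net=model)
      # latest_checkpoint() takes the checkpoint directory, not a .data-* shard file.
      ckpt.restore(tf.train.latest_checkpoint("/kaggle/working/training_1"))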
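A TensorFlow-format checkpoint is normally addressed by its prefix (for example `cp.ckpt`), not by one of the `.index` or `.data-XXXXX-of-XXXXX` files written next to it. The sketch below shows how a shard file name maps back to that prefix. `checkpoint_prefix` is a hypothetical helper, not a TensorFlow function.

```python
import re

# TensorFlow writes "<prefix>.index" plus one or more
# "<prefix>.data-XXXXX-of-XXXXX" shard files for each checkpoint.
_SHARD_SUFFIX = re.compile(r"\.(index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(path):
    """Strip a trailing .index / .data-* suffix, leaving the checkpoint prefix."""
    return _SHARD_SUFFIX.sub("", path)


print(checkpoint_prefix("/kaggle/working/training_1/cp.ckpt.data-00001-of-00002"))
# /kaggle/working/training_1/cp.ckpt
```

The resulting prefix is the kind of path that restore calls expect for checkpoints in this format.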
      

      【Comments】:
