Is there some way to save best model only with tensorflow.estimator.train_and_evaluate()? Answers

【Question title】: Is there some way to save best model only with tensorflow.estimator.train_and_evaluate()?
【Posted】: 2019-07-27 01:48:31
【Question description】:

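I am trying to retrain a TF Object Detection API model from a checkpoint, using a .config file for the training pipeline and the tf.estimator.train_and_evaluate() method, as in models/research/object_detection/model_main.py. It saves a checkpoint every N steps or every N seconds.

But I want to save only the single best model, as in Keras. Is there a way to do that with a TF Object Detection API model? Perhaps some option or callback for tf.Estimator.train, or some way to use the Detection API with Keras?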

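【Comments】:

I have to say that both Sharky's and prouast's solutions are correct. I tried both of them and they work fine. So if anyone runs into the same problem as I did, you can use either option. Many thanks to Sharky and prouast for the clear and useful answers!
Thank you, but where do I have to put the code from these answers? I am training with the default model_main_tf2.py script from the Object Detection API. Do I need to modify the OD scripts myself?

Tags: python tensorflow machine-learning computer-vision object-detection-api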

【Solution 1】:

If you are training with the model repository from tensorflow/models, you can modify the create_train_and_eval_specs function in models/research/object_detection/model_lib.py to include a best exporter:

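# Existing exporter: writes a SavedModel once, at the end of training.
final_exporter = tf.estimator.FinalExporter(
    name=final_exporter_name, serving_input_receiver_fn=predict_input_fn)

# Added exporter: writes a SavedModel whenever an evaluation beats the best
# result seen so far (by default, a smaller loss). event_file_pattern must
# match the directory that the evaluation event files are written to.
best_exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=predict_input_fn,
    event_file_pattern='eval_eval/*.tfevents.*',
    exports_to_keep=5)
exporters = [final_exporter, best_exporter]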

train_spec = tf.estimator.TrainSpec(
    input_fn=train_input_fn, max_steps=train_steps)

eval_specs = [
    tf.estimator.EvalSpec(
        name=eval_spec_name,
        input_fn=eval_input_fn,
        steps=eval_steps,
        exporters=exporters)
]
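The event_file_pattern above is 'eval_eval/*.tfevents.*' rather than the BestExporter default 'eval/*.tfevents.*'. As far as I understand it, the Estimator writes evaluation events to an "eval" subdirectory of the model directory, or to "eval_<name>" when the EvalSpec has a name, so an EvalSpec named "eval" ends up in "eval_eval". A minimal sketch of that naming rule, where eval_event_file_pattern is a hypothetical helper and not part of TensorFlow:

```python
def eval_event_file_pattern(eval_spec_name=None):
    """Build an event_file_pattern for a given EvalSpec name.

    Hypothetical helper. It assumes the Estimator writes evaluation
    events to "eval" when the EvalSpec has no name and to
    "eval_<name>" otherwise, relative to the model directory.
    """
    eval_dir = "eval" if not eval_spec_name else "eval_" + eval_spec_name
    return eval_dir + "/*.tfevents.*"


if __name__ == "__main__":
    print(eval_event_file_pattern("eval"))  # pattern for an EvalSpec named "eval"
    print(eval_event_file_pattern())        # pattern for an unnamed EvalSpec
```

If the pattern does not match the directory your evaluation events are actually written to, check the subdirectories of your model directory and adjust the pattern accordingly.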

【Comments】:

【Solution 2】:

I have been using https://github.com/bluecamel/best_checkpoint_copier, which has worked well for me.

Example:

best_copier = BestCheckpointCopier(
   name='best', # directory within model directory to copy checkpoints to
   checkpoints_to_keep=10, # number of checkpoints to keep
   score_metric='metrics/total_loss', # metric to use to determine "best"
   compare_fn=lambda x,y: x.score < y.score, # comparison function used to determine "best" checkpoint (x is the current checkpoint; y is the previously copied checkpoint with the highest/worst score)
   sort_key_fn=lambda x: x.score,
   sort_reverse=False) # sort order when discarding excess checkpoints

Pass it to your eval_spec:

eval_spec = tf.estimator.EvalSpec(
   ...
   exporters=best_copier,
   ...)
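To make the roles of checkpoints_to_keep, compare_fn, sort_key_fn and sort_reverse more concrete, here is a minimal pure-Python sketch of "keep the N best checkpoints" bookkeeping. It is an illustration under my own assumptions, not the implementation used by best_checkpoint_copier: the Checkpoint tuple and the update_best function are hypothetical names, and no files are copied.

```python
from collections import namedtuple

# Hypothetical record: where a checkpoint lives and the metric it scored.
Checkpoint = namedtuple("Checkpoint", ["path", "score"])


def update_best(kept, candidate, checkpoints_to_keep,
                compare_fn, sort_key_fn, sort_reverse=False):
    """Return the new list of kept checkpoints after seeing `candidate`.

    compare_fn(x, y) returns True when x is better than y; once the
    list is full, y is the worst checkpoint currently kept.
    """
    kept = list(kept)
    if len(kept) >= checkpoints_to_keep:
        # After sorting best-first, the last element is the worst one.
        worst = sorted(kept, key=sort_key_fn, reverse=sort_reverse)[-1]
        if not compare_fn(candidate, worst):
            return kept  # candidate is not good enough to be kept
    merged = sorted(kept + [candidate], key=sort_key_fn, reverse=sort_reverse)
    return merged[:checkpoints_to_keep]  # discard the excess


if __name__ == "__main__":
    kept = []
    losses = [(100, 0.9), (200, 0.5), (300, 0.7), (400, 0.4), (500, 0.8)]
    for step, loss in losses:
        kept = update_best(
            kept,
            Checkpoint("model.ckpt-{}".format(step), loss),
            checkpoints_to_keep=2,
            compare_fn=lambda x, y: x.score < y.score,  # smaller loss wins
            sort_key_fn=lambda x: x.score)
    print([c.path for c in kept])
```

For a metric where larger is better, the comparison would be flipped to x.score > y.score and sort_reverse set to True, so that the worst checkpoint still sorts last.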

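【Comments】:

Thank you, but where do I have to put that code? I am training with the default model_main_tf2.py script from the Object Detection API. Do I need to modify the OD scripts myself?

【Solution 3】:

You can try BestExporter. As far as I know, it is the only option for what you are trying to do.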

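import tensorflow as tf

# compare_fn is left at its default, which treats a smaller loss as better.
# serving_input_receiver_fn, input_fn and steps are placeholders for your own
# values; the receiver function is needed to export the SavedModel.
exporter = tf.estimator.BestExporter(
    serving_input_receiver_fn=serving_input_receiver_fn,
    exports_to_keep=5)

# Keyword arguments are used because the third positional parameter of
# EvalSpec is `name`, not `exporters`.
eval_spec = tf.estimator.EvalSpec(
    input_fn=input_fn,
    steps=steps,
    exporters=exporter)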

https://www.tensorflow.org/api_docs/python/tf/estimator/BestExporter
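By default BestExporter decides what "best" means by comparing the loss. If you would rather rank by a metric where larger is better, you can pass your own compare_fn. The sketch below is a plain Python function with no TensorFlow dependency; make_metric_larger_fn is a hypothetical helper, and the metric key used in the example is an assumption that must match a key actually present in your evaluation results. BestExporter passes the two results by keyword, so the parameter names best_eval_result and current_eval_result should be kept.

```python
def make_metric_larger_fn(metric_key):
    """Build a compare_fn that prefers a larger value of `metric_key`.

    Hypothetical helper: the returned function is meant to be passed as
    BestExporter(compare_fn=...).
    """
    def compare_fn(best_eval_result, current_eval_result):
        # Fail loudly if the metric is missing rather than silently
        # never (or always) exporting.
        for result in (best_eval_result, current_eval_result):
            if not result or metric_key not in result:
                raise ValueError(
                    "eval result has no metric {!r}".format(metric_key))
        # True means "the current evaluation is better, export it".
        return current_eval_result[metric_key] > best_eval_result[metric_key]
    return compare_fn


if __name__ == "__main__":
    # Assumed metric key; replace with one your evaluation reports.
    key = "DetectionBoxes_Precision/mAP"
    map_larger = make_metric_larger_fn(key)
    print(map_larger(best_eval_result={key: 0.31},
                     current_eval_result={key: 0.35}))
```

The resulting function would then be supplied as compare_fn=make_metric_larger_fn(...) when constructing the exporter.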

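【Comments】:

I see that the exporter writes files in a slightly different format, saved_model.pb, and that the meta files (e.g. model.ckpt-3073.meta) are missing. Are the two equivalent to use? As I recall, *.pb files are what you get after exporting a frozen inference graph.
BestExporter also saves the ckpt, but it is not copied to a new folder, so it is bound to be purged once newer checkpoints are generated (for example with keep_checkpoint_max=5 in tf.estimator.RunConfig). That is where @prouast's answer fits well.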
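The point in the last comment is that a checkpoint left in the model directory is eventually deleted by the keep_checkpoint_max rotation, so it has to be copied elsewhere to survive. A minimal sketch of such a copy, using only the standard library, is shown below. copy_checkpoint is a hypothetical helper, and it assumes TF1-style checkpoint file names of the form model.ckpt-<step>.<suffix>, as in the model.ckpt-3073.meta example above.

```python
import glob
import os
import shutil


def copy_checkpoint(model_dir, step, dest_dir, prefix="model.ckpt"):
    """Copy every file of one checkpoint (index, meta, data shards).

    Hypothetical helper. Returns the sorted base names of the files
    that were copied into dest_dir.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # The "." after the step keeps step 3073 from matching step 30730.
    pattern = os.path.join(glob.escape(model_dir),
                           "{}-{}.*".format(prefix, step))
    copied = []
    for src in sorted(glob.glob(pattern)):
        shutil.copy2(src, dest_dir)
        copied.append(os.path.basename(src))
    return copied
```

Calling copy_checkpoint(model_dir, 3073, os.path.join(model_dir, "best")) after an evaluation that produced a new best score would keep that checkpoint out of reach of the rotation. Deciding when a score counts as "best" is left to the caller.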