How to evaluate a pretrained model in the Tensorflow object detection API

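【Question title】: How to evaluate a pretrained model in Tensorflow object detection api
【Posted】: 2017-11-26 04:47:28
【Question】:

I am trying out the recently released Tensorflow Object Detection API and was wondering how I could evaluate one of the pretrained models they provide in their model zoo. For example, how can I get the mAP value for that pretrained model?

Since the script they provide seems to use checkpoints (according to their documentation), I tried making a dummy checkpoint file that points to the model.ckpt.data-00000-of-00001 file provided in their model zoo, but eval.py did not like that.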

checkpoint
   model_checkpoint_path: "model.ckpt.data-00000-of-00001"

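I have considered training the pretrained model briefly and then evaluating that... but I am not sure whether this would give me the right metric.

Sorry if this is a basic question. I am just starting out with Tensorflow and want to verify that I am getting the right things. Any pointers would be appreciated!

Edit:

Following Jonathan's answer, I made a checkpoint file: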

model_checkpoint_path: "model.ckpt"
all_model_checkpoint_paths: "model.ckpt"

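The evaluation script picked it up and started evaluating on the COCO dataset. However, the evaluation stopped and reported a shape mismatch: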

...
[[Node: save/Assign_19 = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_4/ClassPredictor/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](BoxPredictor_4/ClassPredictor/weights, save/RestoreV2_19/_15)]]
2017-07-05 18:40:11.969641: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,486] rhs shape= [1,1,256,546]
[[Node: save/Assign_19 = Assign[T=DT_FLOAT, _class=["loc:@BoxPredictor_4/ClassPredictor/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](BoxPredictor_4/ClassPredictor/weights, save/RestoreV2_19/_15)]]
2017-07-05 18:40:11.969725: W tensorflow/core/framework/op_kernel.cc:1158] 
...
Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,486] rhs shape= [1,1,256,546]
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1,1,256,486] rhs shape= [1,1,256,546]

What could have caused this shape mismatch, and how do I fix it?

【Question comments】:

Tags: tensorflow deep-learning object-detection

【Solution 1】:

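You can evaluate the pretrained models by running the eval.py script. It will ask you to point to a config file (which will be in the samples/configs directory) and a checkpoint, for which you provide a path of the form .../.../model.ckpt (dropping any extension such as .meta or .data-00000-of-00001).

You also have to create a file named "checkpoint" inside the directory that contains the checkpoint you would like to evaluate. Then write the following two lines in that file:

model_checkpoint_path: "path/to/model.ckpt"
all_model_checkpoint_paths: "path/to/model.ckpt"

(where you modify path/to/ appropriately)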
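
As an illustration of the step above, here is a minimal Python sketch that writes such a "checkpoint" file. The helper name write_checkpoint_file and the default prefix "model.ckpt" are assumptions made for this example; they are not part of the Object Detection API.

```python
import os
import tempfile


def write_checkpoint_file(checkpoint_dir, prefix="model.ckpt"):
    """Write a minimal `checkpoint` index file that points at `prefix`.

    `prefix` is the checkpoint path without extensions such as .meta or
    .data-00000-of-00001. This helper is hypothetical, for illustration only.
    """
    index_path = os.path.join(checkpoint_dir, "checkpoint")
    lines = [
        'model_checkpoint_path: "{}"'.format(prefix),
        'all_model_checkpoint_paths: "{}"'.format(prefix),
    ]
    with open(index_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return index_path


if __name__ == "__main__":
    # Demo in a throwaway directory; in practice, pass the directory that
    # holds the downloaded model.ckpt.* files.
    demo_dir = tempfile.mkdtemp()
    with open(write_checkpoint_file(demo_dir)) as f:
        print(f.read())
```

In real use, the directory argument would be the folder extracted from the model zoo archive, and the same directory would then be passed to eval.py as the checkpoint directory.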

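The number you get at the end is mean average precision, using 50% IOU as the cutoff threshold for a true positive. This is slightly different from the metric reported in the model zoo, which uses the COCO mAP metric and averages over multiple IOU values.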
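
To make the difference between the two metrics concrete, here is a small Python sketch. It only illustrates how one detection can count as a true positive at some IOU thresholds and not at others; it is not the evaluation code used by eval.py or by the COCO tools, and the box format (ymin, xmin, ymax, xmax) and the function names are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    ymin = max(box_a[0], box_b[0])
    xmin = max(box_a[1], box_b[1])
    ymax = min(box_a[2], box_b[2])
    xmax = min(box_a[3], box_b[3])
    inter = max(0.0, ymax - ymin) * max(0.0, xmax - xmin)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style thresholds: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def true_positive_at(detection, ground_truth, thresholds=COCO_IOU_THRESHOLDS):
    """Map each IOU threshold to whether the detection would count as a match."""
    value = iou(detection, ground_truth)
    return {t: value >= t for t in thresholds}


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    detection = (0, 0, 9, 8)  # overlaps 72 of 100 units of area
    print(iou(detection, ground_truth))
    print(true_positive_at(detection, ground_truth))
```

A detection like the one in the demo counts as correct under the single 50% threshold, but only under some of the ten COCO thresholds, which is one reason the averaged COCO number tends to be lower than the 50% IOU number.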

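【Comments】:

Thanks for the reply, Jonathan! I tried running python eval.py --logtostderr --checkpoint_dir=path/to/model.ckpt eval_dir=path/to/eval --pipeline_config_path=path/to/.config, but that did not work. To clarify, where exactly do I point to what? (I am currently also using the .config file to point to the ckpt file.) Also, just to be sure: do I get a single mAP value at the end?
You will get a single mAP value at the end, yes. As for the config file, take a look at this directory: github.com/tensorflow/models/tree/master/object_detection/… --- you have to point to the file in that directory that matches the checkpoint you want to evaluate.
Sorry, is this about evaluating the model? I was hoping to reproduce the model zoo results. I ended up converting the COCO dataset to TFRecord and running training/evaluation on it for a few iterations to get the mAP... though the information about the difference in mAP values is helpful!
Hi Jon, I missed your edit about the checkpoint file. I tried it and ran into another problem, which I have edited into my question. Thanks!
When you say "run the eval.py script", I do not know which script you are referring to. Can you specify where I can find it? I cannot find it. Thanks!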

【Solution 2】:

Try:

python eval.py --logtostderr --checkpoint_dir=training --eval_dir=path/to/eval_dir --pipeline_config_path=path/to/pretrained_model.config

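For example:

python eval.py --logtostderr --checkpoint_dir=training --eval_dir=images/val \
  --pipeline_config_path=training/faster_rcnn_inception_v2.config

Note:

The training directory contains all of your training checkpoints. During training, Tensorflow generates a checkpoint file in this directory that holds all the checkpoint metadata, so you do not need to create another one. If you wish to evaluate your trained custom model after generating the inference graph, make sure that in the .config used for training you change the original pretrained_model/model.ckpt to new_trained_model/model.ckpt. You should get output similar to: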

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.457
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.729
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.502
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.122
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.297
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.659
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.398
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.559
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.236
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.746
INFO:tensorflow:Writing metrics to tf summary.
INFO:tensorflow:DetectionBoxes_Precision/mAP: 0.456758
INFO:tensorflow:DetectionBoxes_Precision/mAP (large): 0.659280
INFO:tensorflow:DetectionBoxes_Precision/mAP (medium): 0.296693
INFO:tensorflow:DetectionBoxes_Precision/mAP (small): 0.122108
INFO:tensorflow:DetectionBoxes_Precision/mAP@.50IOU: 0.728587
INFO:tensorflow:DetectionBoxes_Precision/mAP@.75IOU: 0.502194
INFO:tensorflow:DetectionBoxes_Recall/AR@1: 0.397509
INFO:tensorflow:DetectionBoxes_Recall/AR@10: 0.558966
INFO:tensorflow:DetectionBoxes_Recall/AR@100: 0.590182
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (large): 0.745691
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (medium): 0.485964
INFO:tensorflow:DetectionBoxes_Recall/AR@100 (small): 0.236275
INFO:tensorflow:Losses/Loss/BoxClassifierLoss/classification_loss: 0.234645
INFO:tensorflow:Losses/Loss/BoxClassifierLoss/localization_loss: 0.139109
INFO:tensorflow:Losses/Loss/RPNLoss/localization_loss: 0.603733
INFO:tensorflow:Losses/Loss/RPNLoss/objectness_loss: 0.206419
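
If you want to compare these numbers against the model zoo programmatically, the INFO lines can be collected into a dictionary. The following Python sketch assumes the log lines have the "INFO:tensorflow:<name>: <value>" shape shown in the output here; the function name parse_metrics is made up for this example, and other Tensorflow versions may format the log differently.

```python
import re

# Matches lines such as "INFO:tensorflow:DetectionBoxes_Precision/mAP: 0.456758".
METRIC_LINE = re.compile(
    r"^INFO:tensorflow:(?P<name>.+): (?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_metrics(log_text):
    """Collect metric names and float values from evaluation log text."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group("name")] = float(match.group("value"))
    return metrics


if __name__ == "__main__":
    sample = "\n".join([
        "INFO:tensorflow:Writing metrics to tf summary.",
        "INFO:tensorflow:DetectionBoxes_Precision/mAP: 0.456758",
        "INFO:tensorflow:DetectionBoxes_Precision/mAP@.50IOU: 0.728587",
    ])
    for name, value in parse_metrics(sample).items():
        print(name, value)
```

Lines without a trailing number, such as "Writing metrics to tf summary.", are skipped.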

【Comments】:

【Solution 3】:

You can also use model_main.py to evaluate your model.

If you want to evaluate your model on validation data, you should use:

python models/research/object_detection/model_main.py --pipeline_config_path=/path/to/pipeline_file --model_dir=/path/to/output_results --checkpoint_dir=/path/to/directory_holding_checkpoint --run_once=True

If you want to evaluate your model on training data, you should set 'eval_training_data' to True, that is:

python models/research/object_detection/model_main.py --pipeline_config_path=/path/to/pipeline_file --model_dir=/path/to/output_results --eval_training_data=True --checkpoint_dir=/path/to/directory_holding_checkpoint --run_once=True

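I have also added comments to clarify some of the options above:

--pipeline_config_path: path to the "pipeline.config" file used to train the detection model. This file should include the paths to the TFRecords files (train and test files) that you want to evaluate, i.e.: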

    ...
    train_input_reader: {
        tf_record_input_reader {
            # path to the training TFRecord
            input_path: "/path/to/train.record"
        }
        # path to the label map
        label_map_path: "/path/to/label_map.pbtxt"
    }
    ...
    eval_input_reader: {
        tf_record_input_reader {
            # path to the testing TFRecord
            input_path: "/path/to/test.record"
        }
        # path to the label map
        label_map_path: "/path/to/label_map.pbtxt"
    }
    ...

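--model_dir: output directory where the resulting metrics will be written, in particular the "events.*" files that can be read by tensorboard.

--checkpoint_dir: directory holding a checkpoint. This is the model directory where the checkpoint files ("model.ckpt.*") were written, either during training or after exporting with "export_inference_graph.py". In your case, you should point to the pretrained model folder downloaded from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.

--run_once: set to True to run just one round of evaluation.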
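
If you run the evaluation from a Python script, the options above can be assembled into an argument list. This is only a sketch under the assumption that model_main.py accepts exactly the flags shown in this answer; the helper build_eval_command is hypothetical, and the paths are the same placeholders used above.

```python
import shlex


def build_eval_command(pipeline_config_path, model_dir, checkpoint_dir,
                       eval_training_data=False):
    """Assemble the model_main.py evaluation command as an argument list."""
    cmd = [
        "python",
        "models/research/object_detection/model_main.py",
        "--pipeline_config_path={}".format(pipeline_config_path),
        "--model_dir={}".format(model_dir),
        "--checkpoint_dir={}".format(checkpoint_dir),
        "--run_once=True",
    ]
    if eval_training_data:
        # Evaluate on the training data instead of the validation data.
        cmd.append("--eval_training_data=True")
    return cmd


if __name__ == "__main__":
    cmd = build_eval_command(
        "/path/to/pipeline_file",
        "/path/to/output_results",
        "/path/to/directory_holding_checkpoint",
    )
    # Print a shell-safe version; the list could also be passed to subprocess.run.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building a list rather than a single string avoids quoting problems if any of the paths contain spaces.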

【Comments】:

This approach still has a problem: github.com/tensorflow/models/pull/5450
But how do I evaluate a pretrained model? If I only have a tflite model file, can I evaluate it?