Tensorflow2 Object Detection Counting API tutorial: answers

【Question title】: Tensorflow2 Object Detection Counting API for tutorial
【Posted】: 2021-03-16 21:05:14
【Question】:

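I've been racking my brain trying to customize the TensorFlow object detection webcam tutorial so that it counts the number of objects detected for each class. I trained my custom detection model from the efficientdet_d0_coco17_tpu-32 model, and I'm using the "detect_from_webcam.py" tutorial script. Detection works and the class labels are shown on screen. Now I want to display how many objects of each class were detected.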

I looked at and tried the TensorFlow object counting API (Counting_API), but I can't work out how to integrate it with my custom-trained model.

Please forgive me if this is a silly question; I'm just getting started with Python and machine learning. Thanks in advance for your help!

I'm using TensorFlow 2.4.1 and Python 3.7.0.

Can anyone help me, or point out what I need to add to count the detected objects?

This is the command I use to run the script from CMD:

python detect_from_webcam.py -m research\object_detection\inference_graph\saved_model -l research\object_detection\Training\labelmap.pbtxt

This is the script:

import numpy as np
import argparse
import tensorflow as tf
import cv2
import pathlib

from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
from api import object_counting_api
from utils import backbone
# patch tf1 into `utils.ops`
utils_ops.tf = tf.compat.v1

# Patch the location of gfile
tf.gfile = tf.io.gfile


def load_model(model_path):
    model = tf.saved_model.load(model_path)
    return model


def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis,...]
    
    # Run inference
    output_dict = model(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections
    #print(num_detections)
    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)
    
    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox masks to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                                    output_dict['detection_masks'], output_dict['detection_boxes'],
                                    image.shape[0], image.shape[1])      
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5, tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()
    
    return output_dict


def run_inference(model, category_index, cap):
    
    while True:
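        ret, image_np = cap.read()
        # Stop if the camera returned no frame (e.g. it was disconnected)
        if not ret:
            cap.release()
            cv2.destroyAllWindows()
            break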
        
        # Actual detection.
        output_dict = run_inference_for_single_image(model, image_np)
        # Visualization of the results of a detection.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            line_thickness=8)
           
        cv2.imshow('object_detection', cv2.resize(image_np, (1920, 1080)))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cap.release()
            cv2.destroyAllWindows()
            break


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Detect objects inside webcam videostream')
    parser.add_argument('-m', '--model', type=str, required=True, help='Model Path')
    parser.add_argument('-l', '--labelmap', type=str, required=True, help='Path to Labelmap')
    args = parser.parse_args()

    detection_model = load_model(args.model)
    category_index = label_map_util.create_category_index_from_labelmap(args.labelmap, use_display_name=True)
    
    cap = cv2.VideoCapture(0)
    run_inference(detection_model, category_index, cap)
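The post-processing in `run_inference_for_single_image` (dropping the batch axis and keeping only the first `num_detections` entries) can be reproduced on plain NumPy arrays to see what `output_dict` ends up containing. This is a sketch only: `strip_batch_dim` is a hypothetical helper, not part of the Object Detection API, and the arrays are made-up stand-ins shaped like batched model outputs (batch size 1, four slots chosen arbitrarily).

```python
import numpy as np


def strip_batch_dim(raw_outputs):
    """Hypothetical helper mirroring run_inference_for_single_image's post-processing.

    raw_outputs maps output names to arrays with a leading batch axis of size 1.
    """
    outputs = dict(raw_outputs)  # shallow copy so the caller's dict is left untouched
    # num_detections has shape (1,); take its single element as a Python int
    num_detections = int(outputs.pop('num_detections')[0])
    # drop the batch axis and keep only the first num_detections entries
    trimmed = {key: value[0, :num_detections] for key, value in outputs.items()}
    trimmed['num_detections'] = num_detections
    # classes come back as floats; cast them so they can index category_index
    trimmed['detection_classes'] = trimmed['detection_classes'].astype(np.int64)
    return trimmed


# Made-up outputs: batch of 1, 4 slots, of which only the first 2 are valid
fake = {
    'num_detections': np.array([2.0]),
    'detection_boxes': np.zeros((1, 4, 4), dtype=np.float32),
    'detection_scores': np.array([[0.9, 0.6, 0.0, 0.0]], dtype=np.float32),
    'detection_classes': np.array([[1.0, 2.0, 1.0, 1.0]], dtype=np.float32),
}
result = strip_batch_dim(fake)
print(result['detection_boxes'].shape)  # (2, 4)
print(result['detection_classes'])      # [1 2]
```

After this step every per-detection array is one-dimensional along the detection axis, which is what makes per-box counting loops straightforward.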

【Discussion】:

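I think you forgot to ask a question.
len(output_dict['detection_boxes']) will give you the number of boxes, i.e. the number of detected objects (this includes low-confidence boxes unless you filter by score).
Do you only need that value, or do you want to display it on the video?
@pratap In TensorFlow v2 it takes a few more steps than that to get the answer.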
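Counting every returned box also counts low-confidence ones, so in practice the count is usually filtered by score. A minimal sketch, assuming `output_dict['detection_scores']` is the 1-D array produced by `run_inference_for_single_image`; the helper name `count_confident` and the 0.5 default threshold are assumptions for illustration.

```python
import numpy as np


def count_confident(output_dict, threshold=0.5):
    """Hypothetical helper: number of detections whose score exceeds threshold.

    Assumes output_dict['detection_scores'] is a 1-D array, as returned by
    run_inference_for_single_image after the batch axis is removed.
    """
    scores = np.asarray(output_dict['detection_scores'])
    return int(np.count_nonzero(scores > threshold))


example = {'detection_scores': np.array([0.92, 0.71, 0.40, 0.05])}
print(count_confident(example))       # 2
print(count_confident(example, 0.3))  # 3
```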

Tags: python tensorflow tensorflow2.0 object-detection

【Solution 1】:

You can count the objects in an image with single_image_object_counting.py from the TensorFlow object counting API. You just need to replace ssd_mobilenet_v1_coco_2018_01_28 with your own model that contains the inference graph.

You can use the following code as a reference:

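# object_counting_api and backbone come from the TensorFlow object counting API repository
from api import object_counting_api
from utils import backbone

input_video = "image.jpg"  # path to the input image
detection_graph, category_index = backbone.set_model(MODEL_DIR)  # MODEL_DIR: placeholder for your own model

is_color_recognition_enabled = False  # set to True to enable color prediction for the detected objects

# targeted objects counting
result = object_counting_api.single_image_object_counting(input_video, detection_graph, category_index, is_color_recognition_enabled)

print(result)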

For more details, see here.

【Discussion】:

【Solution 2】:

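Note: this answer does not write the detection count onto the image or video; it only computes the count as a single value.

After going through a lot of Python code, I managed to get the detection count for a given class: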

# PATH_TO_LABELS, model and image_np are defined elsewhere, as in the Colab tutorial linked below
threshold = 0.5
labels = ["dog"]  # class names to count; use None to count every class
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
detection_count = 0
output_dict = run_inference_for_single_image(model, image_np)

for i, (y_min, x_min, y_max, x_max) in enumerate(output_dict['detection_boxes']):
    # count the box only if its score is high enough and its class is one of the expected classes
    if output_dict['detection_scores'][i] > threshold and (labels is None or category_index[output_dict['detection_classes'][i]]['name'] in labels):
        detection_count += 1
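The loop above counts the classes listed in `labels` as one total. To get a separate count for every class at once (what the question asks for), the same score test can feed a `collections.Counter` keyed by class name. This is a sketch assuming `category_index` has the `{id: {'id': ..., 'name': ...}}` layout produced by `label_map_util.create_category_index_from_labelmap`; `count_per_class` is a hypothetical name, and the example data is made up.

```python
from collections import Counter

import numpy as np


def count_per_class(output_dict, category_index, threshold=0.5):
    """Hypothetical helper: map class name -> number of detections above threshold."""
    counts = Counter()
    scores = output_dict['detection_scores']
    classes = output_dict['detection_classes']
    for score, class_id in zip(scores, classes):
        if score <= threshold:
            continue
        # fall back to the numeric id if the label map has no entry for it
        entry = category_index.get(int(class_id))
        name = entry['name'] if entry else str(int(class_id))
        counts[name] += 1
    return counts


category_index = {1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}}
output_dict = {
    'detection_scores': np.array([0.9, 0.8, 0.6, 0.3]),
    'detection_classes': np.array([1, 1, 2, 2], dtype=np.int64),
}
print(dict(count_per_class(output_dict, category_index)))  # {'dog': 2, 'cat': 1}
```

Using `.get` instead of indexing avoids a `KeyError` when the model emits a class id that is missing from the label map.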

Once the detection count is available, you can add it to the image or video.
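One way to draw counts on each webcam frame is OpenCV's `cv2.putText`, called on `image_np` before `cv2.imshow` in the question's `run_inference` loop. A sketch under those assumptions: `format_count_lines` and `draw_counts` are hypothetical helpers, and the text position, font, and colour are arbitrary choices. OpenCV is imported inside `draw_counts` only so that the formatting helper can be tried without OpenCV installed.

```python
def format_count_lines(counts):
    """Turn a {class_name: count} mapping into sorted 'name: count' strings."""
    return [f"{name}: {count}" for name, count in sorted(counts.items())]


def draw_counts(image, counts, origin=(10, 30), line_height=30):
    """Draw one line of text per class onto image (a BGR NumPy array) in place."""
    import cv2  # imported here so format_count_lines works without OpenCV

    x, y = origin
    for i, line in enumerate(format_count_lines(counts)):
        cv2.putText(image, line, (x, y + i * line_height),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2, cv2.LINE_AA)
    return image


print(format_count_lines({'dog': 2, 'cat': 1}))  # ['cat: 1', 'dog: 2']
```

Given a `{class_name: count}` dict built from `output_dict`, calling `draw_counts(image_np, counts)` after `visualize_boxes_and_labels_on_image_array` and before `cv2.imshow` would put the totals in the top-left corner of each displayed frame.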

I'll share the complete code once it's ready. It is based on this:

https://colab.research.google.com/github/tensorflow/models/blob/master/research/object_detection/colab_tutorials/object_detection_tutorial.ipynb

【Discussion】: