【Question title】: Get the bounding box coordinates in the TensorFlow object detection API tutorial
【Posted】: 2018-08-01 13:42:03
【Question】:

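I'm new to Python and TensorFlow. I'm trying to run the object detection tutorial file from the TensorFlow Object Detection API, but I can't find where to get the coordinates of the bounding boxes when objects are detected.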

Relevant code:

 # The following processing is only for single image
 detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
 detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])

I assume this is where the bounding boxes are drawn:

 # Visualization of the results of detection.
 vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
 plt.figure(figsize=IMAGE_SIZE)
 plt.imshow(image_np)

I tried printing output_dict['detection_boxes'], but I'm not sure what the numbers mean. There are a lot of them:

array([[ 0.56213236,  0.2780568 ,  0.91445708,  0.69120586],
       [ 0.56261235,  0.86368728,  0.59286624,  0.8893863 ],
       [ 0.57073039,  0.87096912,  0.61292225,  0.90354401],
       [ 0.51422435,  0.78449738,  0.53994244,  0.79437423],
......

       [ 0.32784131,  0.5461576 ,  0.36972913,  0.56903434],
       [ 0.03005961,  0.02714229,  0.47211722,  0.44683522],
       [ 0.43143299, 0.09211366,  0.58121657,  0.3509962 ]], dtype=float32)

I found answers to similar questions, but I don't have a variable named box like they do. How do I get the coordinates?

【Question comments】:

    Tags: python tensorflow bounding-box object-detection-api


    【Answer 1】:

    I tried printing output_dict['detection_boxes'] but I am not sure what the numbers mean

    You can check the code yourself. visualize_boxes_and_labels_on_image_array is defined here.

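    Note that you are passing use_normalized_coordinates=True. If you trace the function calls, you will see that your numbers [ 0.56213236, 0.2780568 , 0.91445708, 0.69120586] etc. are the values [ymin, xmin, ymax, xmax], expressed as fractions of the image height and width. The pixel coordinates are obtained as: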

    (left, right, top, bottom) = (xmin * im_width, xmax * im_width, 
                                  ymin * im_height, ymax * im_height)
    

    This is computed by the following function:

    def draw_bounding_box_on_image(image,
                               ymin,
                               xmin,
                               ymax,
                               xmax,
                               color='red',
                               thickness=4,
                               display_str_list=(),
                               use_normalized_coordinates=True):
      """Adds a bounding box to an image.
      Bounding box coordinates can be specified in either absolute (pixel) or
      normalized coordinates by setting the use_normalized_coordinates argument.
      Each string in display_str_list is displayed on a separate line above the
      bounding box in black text on a rectangle filled with the input 'color'.
      If the top of the bounding box extends to the edge of the image, the strings
      are displayed below the bounding box.
      Args:
        image: a PIL.Image object.
        ymin: ymin of bounding box.
        xmin: xmin of bounding box.
        ymax: ymax of bounding box.
        xmax: xmax of bounding box.
        color: color to draw bounding box. Default is red.
        thickness: line thickness. Default value is 4.
        display_str_list: list of strings to display in box
                          (each to be shown on its own line).
        use_normalized_coordinates: If True (default), treat coordinates
          ymin, xmin, ymax, xmax as relative to the image.  Otherwise treat
          coordinates as absolute.
      """
      draw = ImageDraw.Draw(image)
      im_width, im_height = image.size
      if use_normalized_coordinates:
        (left, right, top, bottom) = (xmin * im_width, xmax * im_width,
                                      ymin * im_height, ymax * im_height)
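To get pixel coordinates for every detection at once, the same arithmetic can be applied to the whole detection_boxes array with NumPy. This is a minimal sketch, assuming boxes is an (N, 4) NumPy array of normalized [ymin, xmin, ymax, xmax] rows like the one printed in the question; the helper name to_pixel_boxes and the 640x480 image size are hypothetical.

```python
import numpy as np


def to_pixel_boxes(boxes, im_width, im_height):
    """Convert rows of normalized [ymin, xmin, ymax, xmax] into pixel
    (left, right, top, bottom) rows, using the same arithmetic as
    draw_bounding_box_on_image."""
    boxes = np.asarray(boxes, dtype=np.float64)
    ymin, xmin, ymax, xmax = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    # x values scale with the image width, y values with the image height
    return np.stack([xmin * im_width, xmax * im_width,
                     ymin * im_height, ymax * im_height], axis=1)


# First row of the array printed in the question, on a hypothetical 640x480 image
boxes = np.array([[0.56213236, 0.2780568, 0.91445708, 0.69120586]])
pixel_boxes = to_pixel_boxes(boxes, im_width=640, im_height=480)
left, right, top, bottom = pixel_boxes[0]
print(left, right, top, bottom)
```

draw_bounding_box_on_image reads the size from a PIL image, whose image.size is (width, height). If you start from the NumPy array image_np instead, the order is reversed: im_height, im_width = image_np.shape[:2].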
    

    【Comments】:

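    • OK. It seems output_dict['detection_boxes'] contains all the overlapping boxes, which is why there are so many rows. Thanks!
    • What determines how many overlapping boxes there are? And why are there so many overlapping boxes in the first place, and why are they passed to the visualization layer to be merged?
    • I know this is an old question, but this may help someone. If you increase min_score_thresh in the arguments of visualize_boxes_and_labels_on_image_array, you can limit the number of overlapping boxes. It defaults to 0.5; for my project, for example, I had to raise it to 0.8.
    • The normalized bbox format is ymin, xmin, ymax, xmax: github.com/tensorflow/models/blob/…
    【Answer 2】: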
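The effect of min_score_thresh described in the comments above can also be reproduced outside the visualizer with a NumPy boolean mask. This is a minimal sketch, assuming boxes, scores and classes are the per-image arrays from output_dict with shapes (N, 4), (N,) and (N,); the helper filter_detections and the sample values are hypothetical. Like the visualizer, it keeps a detection only when its score is strictly greater than the threshold.

```python
import numpy as np


def filter_detections(boxes, scores, classes, min_score_thresh=0.5):
    """Keep only detections whose score is above min_score_thresh."""
    keep = scores > min_score_thresh  # boolean mask, one entry per detection
    return boxes[keep], scores[keep], classes[keep]


# Hypothetical outputs for three detections
boxes = np.array([[0.1, 0.1, 0.5, 0.5],
                  [0.2, 0.2, 0.6, 0.6],
                  [0.3, 0.3, 0.7, 0.7]])
scores = np.array([0.92, 0.55, 0.10])
classes = np.array([1, 1, 3])

# Default threshold vs. the stricter 0.8 mentioned in the comment
kept_boxes, kept_scores, kept_classes = filter_detections(boxes, scores, classes)
strict_boxes, _, _ = filter_detections(boxes, scores, classes, min_score_thresh=0.8)
print(len(kept_boxes), len(strict_boxes))
```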

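    Same story here: I got an array of about a hundred boxes (output_dict['detection_boxes']) even though only one box was shown on the image. Digging into the code that draws the rectangles, I was able to extract the following and use it in my inference.py: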

    # so detection has happened and you've got output_dict as a
    # result of your inference
    
    # then assume you've got this in your inference.py in order to draw rectangles
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    
    # This is the way I'm getting my coordinates
    boxes = output_dict['detection_boxes']
    # get scores so the same threshold as the visualizer can be applied
    scores = output_dict['detection_scores']
    # this is the default in visualize_boxes_and_labels_on_image_array; adjust as needed
    min_score_thresh = .5
    # iterate over all boxes returned by the model
    for i in range(boxes.shape[0]):
        # keep only detections whose score exceeds the threshold
        if scores is None or scores[i] > min_score_thresh:
            # boxes[i] is the box that will be drawn: normalized [ymin, xmin, ymax, xmax]
            class_name = category_index[output_dict['detection_classes'][i]]['name']
            print("This box is gonna get used", boxes[i], class_name)
    

    【Comments】:

      【Answer 3】:

      The answer above didn't work for me and I had to make some changes, so if it doesn't help you either, try this:

      # This is the way I'm getting my coordinates
      # (here the outputs are tensors with a leading batch dimension, hence .numpy()[0])
      boxes = detections['detection_boxes'].numpy()[0]
      # get scores to apply a threshold
      scores = detections['detection_scores'].numpy()[0]
      classes = detections['detection_classes'].numpy()[0]
      # this is set as a default but feel free to adjust it to your needs
      min_score_thresh = .5
      # iterate over all objects found
      coordinates = []
      for i in range(boxes.shape[0]):
          if scores[i] > min_score_thresh:
              # +1 maps 0-based class ids to the 1-based keys of category_index;
              # drop it if your model already returns 1-based classes
              class_id = int(classes[i] + 1)
              coordinates.append({
                  "box": boxes[i],
                  "class_name": category_index[class_id]["name"],
                  "score": scores[i]
              })

      print(coordinates)
      

      Each item (a dict) in the coordinates list is one box to be drawn on the image, with its box coordinates (normalized), class_name and score.
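The boxes in coordinates are still normalized, so they have to be scaled by the image size before they can be used, for example, to crop each object out of the image. This is a minimal sketch, assuming image_np is the H x W x 3 NumPy array the detections were run on; the helper add_pixel_boxes, the pixel_box key and the sample detection are hypothetical.

```python
import numpy as np


def add_pixel_boxes(coordinates, image_np):
    """Add a hypothetical 'pixel_box' entry (left, top, right, bottom) in
    integer pixels to each dict of the coordinates list."""
    im_height, im_width = image_np.shape[:2]  # NumPy images are (rows, cols, channels)
    for item in coordinates:
        ymin, xmin, ymax, xmax = item["box"]
        item["pixel_box"] = (int(xmin * im_width), int(ymin * im_height),
                             int(xmax * im_width), int(ymax * im_height))
    return coordinates


# Hypothetical example: a blank 480x640 RGB image and one detection
image_np = np.zeros((480, 640, 3), dtype=np.uint8)
coordinates = [{"box": np.array([0.25, 0.5, 0.75, 1.0]), "class_name": "dog", "score": 0.9}]
add_pixel_boxes(coordinates, image_np)

left, top, right, bottom = coordinates[0]["pixel_box"]
crop = image_np[top:bottom, left:right]  # rows are y, columns are x
print(coordinates[0]["pixel_box"], crop.shape)
```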

      【Comments】:
