Crop image using pred boxes coordinates: answers

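【Question title】: Crop image using pred boxes coordinates
【Posted】: 2021-09-01 15:26:16
【Question】:

I am using detectron2 to predict where objects are located in an image. Now I am trying to use the predicted box to crop the image (in my use case, only 1 object/box is detected per image). The part of my code relevant to the question is below. The problem is that it only crops the left side of the image, but I (obviously) need it to crop the top, right, and bottom as well, so that the image is cropped to the shape of the detected object. The original images have shape (x, y, 3), so they are RGB images. What am I missing?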

from detectron2.utils.visualizer import ColorMode
import glob

imageName = "my_img.jpg"
im = cv2.imread(imageName)
outputs = predictor(im)
v = Visualizer(im[:, :, ::-1], metadata=test_metadata, scale=0.8)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])

boxes = outputs["instances"].pred_boxes
boxes = list(boxes)[0].detach().cpu().numpy()

# extract the bounding box coordinates
(x, y) = (int(boxes[0]), int(boxes[1]))
(w, h) = (int(boxes[2]), int(boxes[3]))
crop_img = image[x:y+h, y:x+w]
cv2_imshow(crop_img)

I also tried the following, but it trims too much off the top of the image and does not trim the right or bottom at all.

from detectron2.data.transforms import CropTransform

ct = CropTransform(x, y, w, h)
crop_img = ct.apply_image(image)
cv2_imshow(crop_img)

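Playing around with it, I was able to crop the image around the detected box using the following, but this is not ideal because I have to hard-code the offsets.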

crop_img = image[y-40:y+h-390, x:x+w-395]

【Question comments】:

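It should be crop_img = image[x:x+w, y:y+h].
@TimRoberts That actually crops more than the box on the left and right, and it doesn't crop the top/bottom of the image at all.
Of course. Silly me. The arrays OpenCV produces are Y-major. crop_img = image[y:y+h, x:x+w].
@TimRoberts I'm still missing something, because image[y:y+h, x:x+w] crops the left side correctly, crops too much off the top (it actually crops below the box), and doesn't crop the right or bottom at all. Any ideas?
@TimRoberts I was playing around with it, and the following gets me very close to what I want, the image cropped around the detected box: crop_img = image[y-40:y+h-390, x:x+w-395]. But I'd like to be able to do this without all the hard-coded values. I'm not sure what I'm missing, but if you have any ideas I'd love to hear them.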

Tags: python image crop detectron

【Solution 1】:

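The following should work. Each box in Detectron2's pred_boxes holds the top-left and bottom-right corners (x1, y1, x2, y2), not (x, y, width, height), so the last two values are used directly as the crop's right and bottom edges. Note that this function expects a PIL image rather than a cv2/NumPy array.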

def crop_object(image, box):
  """Crops an object in an image

  Inputs:
    image: PIL image
    box: one box from Detectron2 pred_boxes, as (x1, y1, x2, y2)

  Returns:
    The cropped PIL image.
  """

  # The box stores two corners, not (x, y, width, height)
  x_top_left, y_top_left, x_bottom_right, y_bottom_right = box

  # PIL's crop takes a (left, upper, right, lower) tuple of pixel coordinates
  crop_img = image.crop((int(x_top_left), int(y_top_left), int(x_bottom_right), int(y_bottom_right)))
  return crop_img

# Get pred_boxes from Detectron2 prediction outputs
boxes = outputs["instances"].pred_boxes
# Select 1 box:
box = list(boxes)[0].detach().cpu().numpy()
# Crop the PIL image using predicted box coordinates
# (`image` must be a PIL image here, not the array returned by cv2.imread)
crop_img = crop_object(image, box)
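
For an image loaded with cv2.imread (a NumPy array) rather than PIL, the same corner coordinates can be used with plain array slicing. The following is only a minimal sketch, assuming the box holds (x1, y1, x2, y2) as in the function above. The helper name crop_array is hypothetical, and the zero-filled array and the box values are stand-ins for a real image and a real prediction.

```python
import numpy as np


def crop_array(image, box):
    """Crop an H x W x C NumPy image to an (x1, y1, x2, y2) box.

    Hypothetical helper for illustration; not part of Detectron2 or OpenCV.
    """
    height, width = image.shape[:2]
    # Truncate to integer pixel coordinates, as the PIL version does.
    x1, y1, x2, y2 = (int(v) for v in box)
    # Clamp to the image bounds so negative values do not wrap around.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    # NumPy images are indexed rows (y) first, then columns (x).
    return image[y1:y2, x1:x2]


# Synthetic stand-ins for a decoded image and one predicted box.
im = np.zeros((480, 640, 3), dtype=np.uint8)
box = np.array([100.0, 50.0, 300.0, 250.0], dtype=np.float32)
crop = crop_array(im, box)
print(crop.shape)  # (200, 200, 3)
```

The slice uses y2 and x2 directly as end indices. Treating them as a height and width (image[y:y+h, x:x+w]) would extend the crop well past the box, which is consistent with the behavior described in the question.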

【Comments】: