Adding multiple classes in Mask R-CNN

【Question title】: Adding multiple classes in Mask R-CNN
【Posted】: 2020-05-05 18:16:18
【Question description】:

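I am using the Matterport Mask R-CNN implementation as my model, and I am trying to build my dataset for training. After thinking the problem below through, I believe what I am really asking is: how do I add multiple classes (plus BG)?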

I get the following AssertionError:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-21-c20768952b65> in <module>()
     15 
     16   # display image with masks and bounding boxes
---> 17   display_instances(image, bbox, masks, class_ids/4, train_set.class_names)

/usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/visualize.py in display_instances(image, boxes, masks, class_ids, class_names, scores, title, figsize, ax, show_mask, show_bbox, colors, captions)
    103         print("\n*** No instances to display *** \n")
    104     else:
--> 105         assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
    106 
    107     # If no axis is passed, create one and automatically call show()

AssertionError:

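The problem seems to be that masks.shape[-1] == class_ids.shape[0] evaluates to False, which should not be the case.

I have now traced it to masks.shape[-1] being 4 times class_ids.shape[0], and I think this is related to having 4 classes in the data. Unfortunately, I have not figured out how to fix it.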

# load the masks for an image
def load_mask(self, image_id):
  # get details of image
  info = self.image_info[image_id]
  # define box file location
  path = info['annotation']
  # load XML
  boxes, w, h = self.extract_boxes(path)
  # create one array for all masks, each on a different channel
  masks = zeros([h, w, len(boxes)], dtype='uint8')
  # create masks
  class_ids = list()
  for i in range(len(boxes)):
    box = boxes[i]
    row_s, row_e = box[1], box[3]
    col_s, col_e = box[0], box[2]
    masks[row_s:row_e, col_s:col_e, i] = 1
    class_ids.append(self.class_names.index('Resistor'))
    class_ids.append(self.class_names.index('LED'))
    class_ids.append(self.class_names.index('Capacitor'))
    class_ids.append(self.class_names.index('Diode'))
    return masks, asarray(class_ids, dtype='int32')

# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
print(mask, "and", class_ids)

# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)
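
The assertion in the traceback requires one box, one mask channel, and one class ID per instance. A minimal sketch of that invariant using NumPy only is given here; `check_instance_shapes` is a hypothetical helper (not part of mrcnn) and the array sizes are made up for illustration:

```python
import numpy as np

def check_instance_shapes(boxes, masks, class_ids):
    """Return True when all three arrays describe the same number of instances."""
    return boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]

n = 3  # hypothetical number of annotated boxes in one image
boxes = np.zeros((n, 4), dtype="int32")
masks = np.zeros((64, 64, n), dtype="uint8")

# Appending four IDs for every box produces 4 * n IDs for n mask channels
too_many = np.asarray([1, 2, 3, 4] * n, dtype="int32")
# Appending exactly one ID per box keeps the lengths aligned
one_each = np.asarray([1, 2, 3], dtype="int32")

print(check_instance_shapes(boxes, masks, too_many))  # False
print(check_instance_shapes(boxes, masks, one_each))  # True
```

Running a check like this on the output of load_mask before calling display_instances should show which of the three arrays has the wrong length.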

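【Comments】:

Have you verified that masks.shape[-1] == class_ids.shape[0] holds for your input?
Please reduce your question to a minimal reproducible example and post it as an update. Debugging a small example is easier than debugging the full code.
@IonicSolutions Thanks for the reply. For your first comment, I get False. Apologies for the lengthy code, I will cut it down (honestly, I am not 100% sure which part is causing it).
No need to apologize! Now you know why the assertion fails. You should check what format display_instances expects for masks and class_ids.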

Tags: python-3.x tensorflow tensorflow-datasets transfer-learning faster-rcnn

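【Solution 1】:

If you want to train on multiple classes, you can use the code below.

In load_dataset, register each class with self.add_class, then change the last line (the self.add_image call) to pass class_ids covering the classes you have.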

 # define classes
 self.add_class("dataset", 1, "class1name")
 self.add_class("dataset", 2, "class2name")
 # define data locations
 images_dir = dataset_dir + '/images/'
 annotations_dir = dataset_dir + '/annots/'
 # find all images
 for filename in listdir(images_dir):
     # extract image id
     image_id = filename[:-4]
     # skip bad images
     if image_id in ['00090']:
         continue
     # skip all images after 150 if we are building the train set
     if is_train and int(image_id) >= 150:
         continue
     # skip all images before 150 if we are building the test/val set
     if not is_train and int(image_id) < 150:
         continue
     img_path = images_dir + filename
     ann_path = annotations_dir + image_id + '.xml'
     # add to dataset
     self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path,class_ids=[0,1,2])
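
To see why self.class_names.index(...) yields the right class ID later on, here is a simplified sketch of the class bookkeeping. `TinyDataset` is a hypothetical stand-in written for illustration, not the mrcnn class; it assumes, as I understand the Matterport code, that index 0 is reserved for the background class BG and that class_names is built when the dataset is prepared:

```python
class TinyDataset:
    """Hypothetical stand-in that keeps only the class bookkeeping."""

    def __init__(self):
        # Index 0 is reserved for the background class
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]
        self.class_names = []

    def add_class(self, source, class_id, class_name):
        # Record one class; IDs are expected to run 1..N without gaps
        self.class_info.append({"source": source, "id": class_id, "name": class_name})

    def prepare(self):
        # Build the name list that later code looks class IDs up in
        self.class_names = [c["name"] for c in self.class_info]


ds = TinyDataset()
for class_id, name in enumerate(["Resistor", "LED", "Capacitor", "Diode"], start=1):
    ds.add_class("dataset", class_id, name)
ds.prepare()

print(ds.class_names)                 # ['BG', 'Resistor', 'LED', 'Capacitor', 'Diode']
print(ds.class_names.index("Diode"))  # 4
```

With four object classes the list has five entries, so the class count used for training would presumably be 1 + 4 (background plus the four classes).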

You do not need to change anything in the function below.

 def extract_boxes(self, filename):
     # load and parse the file
     tree = ElementTree.parse(filename)
     # get the root of the document
     root = tree.getroot()
     # extract each bounding box
     boxes = list()
     for box in root.findall('.//bndbox'):
         xmin = int(box.find('xmin').text)
         ymin = int(box.find('ymin').text)
         xmax = int(box.find('xmax').text)
         ymax = int(box.find('ymax').text)
         coors = [xmin, ymin, xmax, ymax]
         boxes.append(coors)
     # extract image dimensions
     width = int(root.find('.//size/width').text)
     height = int(root.find('.//size/height').text)
     return boxes, width, height

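In the function below, "if i == 0" refers to the first bounding box. For additional bounding boxes (i.e. additional classes), add branches for i == 1, i == 2, and so on.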

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        # print()
        if i == 0:
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('class1name'))
        else:
            masks[row_s:row_e, col_s:col_e, i] = 2
            class_ids.append(self.class_names.index('class2name'))
    # return boxes[0],masks, asarray(class_ids, dtype='int32') to check the points
    return masks, asarray(class_ids, dtype='int32')

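【Comments】:

【Solution 2】:

Adding multiple classes requires a few modifications:

1) In load_dataset, register each class with self.add_class, then change the last line (the self.add_image call) to pass class_ids covering the classes you have.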

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
    # define the classes
    self.add_class("dataset", 1, "car")
    self.add_class("dataset", 2, "rider")
    # define data locations
    images_dir = dataset_dir + '/images_mod/'
    annotations_dir = dataset_dir + '/annots_mod/'
    # find all images
    for filename in listdir(images_dir):
        # extract image id
        image_id = filename[:-4]
        # skip all images from 3000 onward if we are building the train set
        if is_train and int(image_id) >= 3000:
            continue
        # skip all images before 3000 if we are building the test/val set
        if not is_train and int(image_id) < 3000:
            continue
        img_path = images_dir + filename
        ann_path = annotations_dir + image_id + '.xml'
        # add to dataset
        self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0,1,2])

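2) Next, in extract_boxes, iterate over the object elements and read each object's name along with its bounding box coordinates. If you have 2 classes and your XML files contain only those classes, you do not need an if statement before appending the coordinates to boxes. If you want to use fewer classes than the XML files contain, you need the if statement; otherwise every box will be turned into a mask.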

# extract bounding boxes from an annotation file
def extract_boxes(self, filename):
    # load and parse the file
    tree = ElementTree.parse(filename)
    # get the root of the document
    root = tree.getroot()
    # extract each bounding box
    boxes = list()

    for box in root.findall('.//object'):
        name = box.find('name').text
        xmin = int(box.find('./bndbox/xmin').text)
        ymin = int(box.find('./bndbox/ymin').text)
        xmax = int(box.find('./bndbox/xmax').text)
        ymax = int(box.find('./bndbox/ymax').text)
        coors = [xmin, ymin, xmax, ymax, name]
        if name=='car' or name=='rider':
            boxes.append(coors)

    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height
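
The name filter can be tried without any annotation files on disk. The sketch below parses an inline string instead; the sample XML, the `WANTED` set, and the function name `extract_boxes_from_string` are all made up for illustration and only assume the Pascal VOC style layout used in the function above:

```python
from xml.etree import ElementTree

# Hypothetical annotation with three objects, one of which we do not want
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>tree</name>
    <bndbox><xmin>300</xmin><ymin>40</ymin><xmax>350</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>rider</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>260</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>
"""

WANTED = {"car", "rider"}

def extract_boxes_from_string(xml_text, wanted=WANTED):
    """Return ([xmin, ymin, xmax, ymax, name], ...), width, height for wanted classes."""
    root = ElementTree.fromstring(xml_text)
    boxes = []
    for obj in root.findall(".//object"):
        name = obj.find("name").text
        if name not in wanted:
            continue  # skip classes we are not training on
        coords = [int(obj.find("./bndbox/" + tag).text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(coords + [name])
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, width, height

boxes, w, h = extract_boxes_from_string(SAMPLE_XML)
print(boxes)  # [[10, 20, 110, 220, 'car'], [200, 50, 260, 180, 'rider']]
print(w, h)   # 640 480
```

Using a set for the wanted names keeps the condition readable when the number of classes grows beyond two.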

3) Finally, in load_mask, add an if-else statement so that each box is assigned the matching class ID.

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        if (box[4] == 'car'):
            masks[row_s:row_e, col_s:col_e, i] = 1
            class_ids.append(self.class_names.index('car'))
        else:
            masks[row_s:row_e, col_s:col_e, i] = 2
            class_ids.append(self.class_names.index('rider'))   
    return masks, asarray(class_ids, dtype='int32')
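
For more than two classes, such as the four in the question, the if-else chain can be replaced by a lookup on the name stored with each box. This is a minimal standalone sketch, not the answer's code: `build_masks` and `CLASS_NAMES` are hypothetical, the boxes are made up, and it assumes each mask channel is a binary mask whose class is carried only by class_ids:

```python
import numpy as np

# Hypothetical class list in registration order, with BG at index 0
CLASS_NAMES = ["BG", "Resistor", "LED", "Capacitor", "Diode"]

def build_masks(boxes, width, height, class_names=CLASS_NAMES):
    """Build one rectangular mask channel and one class ID per box."""
    masks = np.zeros([height, width, len(boxes)], dtype="uint8")
    class_ids = []
    for i, (xmin, ymin, xmax, ymax, name) in enumerate(boxes):
        # Rows are indexed by y, columns by x; the mask value is 1 for every class
        masks[ymin:ymax, xmin:xmax, i] = 1
        # Exactly one ID per box, looked up by the box's own class name
        class_ids.append(class_names.index(name))
    return masks, np.asarray(class_ids, dtype="int32")

boxes = [[2, 3, 6, 8, "LED"], [0, 0, 4, 4, "Diode"], [5, 5, 9, 9, "LED"]]
masks, class_ids = build_masks(boxes, width=10, height=10)
print(masks.shape)  # (10, 10, 3)
print(class_ids)    # [2 4 2]
```

Because one ID is appended per box, masks.shape[-1] and class_ids.shape[0] stay equal, which is the condition display_instances asserts.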

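In my case I needed 2 classes, while the XML files contained many more. With the load_dataset, extract_boxes, and load_mask methods above, I got the following image:

【Comments】: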