【Question Title】: PyTorch Runtime Error - The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension
【Posted】: 2020-06-06 03:10:40
【Question】:

I am trying to train a Faster R-CNN network on a custom dataset of images for object detection. However, instead of feeding the RGB images in directly, I first pass each RGB image, together with its corresponding thermal image, through another network (a feature extractor) and use the extracted features as the input to the Faster R-CNN. The feature extractor combines the two images into a 4-channel tensor, and its output is a 5-channel tensor. I want this 5-channel tensor to be the input to the Faster R-CNN network.

I followed the PyTorch object detection finetuning tutorial (link here) and came up with the following code adapted to my dataset.

class CustomDataset(torch.utils.data.Dataset):
    # __init__ (not shown) sets self.root, self.rgb_imgs, self.thermal_imgs,
    # self.annotations, self.feature_extractor and self._class_to_ind.

    def __getitem__(self, idx):
        self.num_classes = 5
        img_rgb_path = os.path.join(self.root, "rgb/", self.rgb_imgs[idx])
        img_thermal_path = os.path.join(self.root, "thermal/", self.thermal_imgs[idx])

        img_rgb = Image.open(img_rgb_path)
        img_rgb = np.array(img_rgb)
        x_rgb = TF.to_tensor(img_rgb)

        img_thermal = Image.open(img_thermal_path)
        img_thermal = np.array(img_thermal)
        img_thermal = np.expand_dims(img_thermal, -1)
        x_th = TF.to_tensor(img_thermal)

        print(x_rgb.shape)  # shape [3, 640, 512]
        print(x_th.shape)   # shape [1, 640, 512]

        # Stack RGB and thermal along the channel dimension
        fused = torch.cat((x_rgb, x_th), dim=0)  # shape [4, 640, 512]

        img = self.feature_extractor(fused)  # my custom feature extractor, which returns a 5-channel tensor

        print(img.shape)  # shape [5, 640, 512]

        filename = os.path.join(self.root, 'annotations', self.annotations[idx])
        tree = ET.parse(filename)
        objs = tree.findall('object')

        num_objs = len(objs)
        labels = np.zeros((num_objs), dtype=np.int64)  # detection targets must be integer class ids
        seg_areas = np.zeros((num_objs), dtype=np.float32)

        boxes = []
        for ix, obj in enumerate(objs):
            bbox = obj.find('bndbox')
            x1 = float(bbox.find('xmin').text)
            y1 = float(bbox.find('ymin').text)
            x2 = float(bbox.find('xmax').text)
            y2 = float(bbox.find('ymax').text)

            cls = self._class_to_ind[obj.find('name').text.lower().strip()]
            boxes.append([x1, y1, x2, y2])
            labels[ix] = cls
            seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)

        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        seg_areas = torch.as_tensor(seg_areas, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)

        target = {'boxes': boxes,
                  'labels': labels,
                  'seg_areas': seg_areas,
                  }

        return img, target

The code of my main training routine is as follows:

import utils


def train_model(model, criterion, dataloader, num_epochs):
    since = time.time()

    # Build the optimizer and scheduler once, outside the epoch loop
    # (recreating them every epoch resets momentum and the LR schedule)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        model.train()  # Set model to training mode

        running_loss = 0.0
        running_corrects = 0

        for data in dataloader:
            inputs, labels = data[0][0], data[1]

            inputs = inputs.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward
            outputs = model(inputs, labels)
            _, preds = torch.max(outputs.data, 1)
            loss = criterion(outputs, labels)

            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            running_corrects += torch.sum(preds == labels).item()

        lr_scheduler.step()

        epoch_loss = running_loss / len(dataloader)
        epoch_acc = running_corrects / len(dataloader)

        print('Train Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))

backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280

anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=[0],
                                                output_size=7,
                                                sampling_ratio=2)
num_classes = 5

model = FasterRCNN(backbone=backbone, num_classes=5,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)

dataset = CustomDataset('train_folder/')
data_loader_train = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True,collate_fn=utils.collate_fn)

train_model(model, criterion, data_loader_train, num_epochs=10)

The collate_fn defined in the utils.py file is as follows:

def collate_fn(batch):
    return tuple(zip(*batch))
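For readers unfamiliar with this idiom, the zip-based collate_fn simply transposes a batch of (image, target) pairs into a tuple of images and a tuple of targets, without stacking the variable-sized tensors. A tiny self-contained demonstration with placeholder items:

```python
def collate_fn(batch):
    # Transpose [(img0, t0), (img1, t1), ...] into ((img0, img1, ...), (t0, t1, ...))
    return tuple(zip(*batch))


# Placeholder batch items standing in for (image tensor, target dict) pairs
batch = [("img0", {"boxes": [0]}), ("img1", {"boxes": [1]})]
images, targets = collate_fn(batch)
print(images)   # ('img0', 'img1')
print(targets)  # ({'boxes': [0]}, {'boxes': [1]})
```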

However, I get the following error during training:

Traceback (most recent call last):
  File "train.py", line 147, in <module>
    train_model(model, criterion, data_loader_train, num_epochs)
  File "train.py", line 58, in train_model
    outputs = model(inputs, labels)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/generalized_rcnn.py", line 66, in forward
    images, targets = self.transform(images, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/transform.py", line 46, in forward
    image = self.normalize(image)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/transform.py", line 66, in normalize
    return (image - mean[:, None, None]) / std[:, None, None]
RuntimeError: The size of tensor a (5) must match the size of tensor b (3) at non-singleton dimension 0

I am new to PyTorch.

【Comments】:

Tags: pytorch object-detection torch torchvision


【Solution 1】:

The backbone you are using for the FasterRCNN is a pretrained mobilenet_v2. The number of input channels of a network is determined by the number of channels in the data it was trained on. Since the (backbone) model was pretrained on 3-channel inputs of shape 3xNxM (presumably natural images), you cannot use it on tensors of shape 5xPxQ (ignoring the singleton <batch_size> dimension).

Basically, you have 2 options:
1. Reduce the output channel dimension of the first network to 3 (better if you train it from scratch).
2. Build a new backbone for the FasterRCNN that takes 5 input channels, and train it from scratch.

As for interpreting the error message:

return (image - mean[:, None, None]) / std[:, None, None]

PyTorch is trying to normalize the input image, whose shape is (5, M, N), while the mean and std tensors have 3 channels instead of 5.
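The failing broadcast can be reproduced in isolation. The tiny tensor below stands in for a 5-channel input, and the mean/std are the 3-channel ImageNet statistics that torchvision's detection transform uses by default:

```python
import torch

image = torch.rand(5, 4, 4)                   # a 5-channel "image" (tiny, for demonstration)
mean = torch.tensor([0.485, 0.456, 0.406])    # torchvision's default ImageNet mean (3 channels)
std = torch.tensor([0.229, 0.224, 0.225])     # torchvision's default ImageNet std (3 channels)

try:
    # Same expression as in torchvision's GeneralizedRCNNTransform.normalize:
    # broadcasting 5 channels against 3 fails at dimension 0.
    normalized = (image - mean[:, None, None]) / std[:, None, None]
except RuntimeError as err:
    print(err)  # The size of tensor a (5) must match the size of tensor b (3) ...
```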

【Discussion】:

  • Thanks for pointing this out. I actually want to keep the 5-channel input to the model, so I would like to go with the second option. To that end I tried the following: model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False) followed by model.roi_heads.box_predictor = FastRCNNPredictor(in_features=5, num_classes). But despite this, I get the same error. Can you tell me what to change?
  • Although you are not using a pretrained model, you are still using the resnet50 backbone, which also expects 3-channel input. You will have to manually replace the backbone's first conv layer with one that takes 5 channels. You can still use the pretrained model; just replace the first layer and make it trainable.
  • Thanks for mentioning that. I managed to change the input channels to 5. But the problem is actually in GeneralizedRCNNTransform(), which runs before the image is passed through the model. I cannot understand how to modify the code of GeneralizedRCNNTransform().
  • My code is as follows: model = fasterrcnn_resnet50_fpn(pretrained=False); in_features = model.roi_heads.box_predictor.cls_score.in_features; model.backbone.body.conv1 = Conv2d(5, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False); model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)