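PyTorch transfer learning error: The size of tensor a (16) must match the size of tensor b (128) at non-singleton dimension 2

【Question title】: PyTorch transfer learning error: The size of tensor a (16) must match the size of tensor b (128) at non-singleton dimension 2
【Posted】: 2021-05-13 13:05:31
【Question】:

I am currently working on an image motion deblurring problem with PyTorch. I have two kinds of images: blurred images that serve as the input (variable = blur_image) and sharp versions of the same images (variable = sharp_image) that should be the output. I now want to try transfer learning, but I can't get it to work.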

Here is the code for my data loaders:

train_loader = torch.utils.data.DataLoader(train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle = True)
validation_loader = torch.utils.data.DataLoader(valid_dataset, 
                                                batch_size=batch_size,
                                                shuffle = False)
test_loader = torch.utils.data.DataLoader(test_dataset, 
                                          batch_size=batch_size,
                                          shuffle = False)

Their shapes:

Trainloader - Shape of blur_image [N, C, H, W]:  torch.Size([16, 3, 128, 128])
Trainloader - Shape of sharp_image [N, C, H, W]:  torch.Size([16, 3, 128, 128]) torch.float32
Validationloader - Shape of blur_image [N, C, H, W]:  torch.Size([16, 3, 128, 128])
Validationloader - Shape of sharp_image [N, C, H, W]:  torch.Size([16, 3, 128, 128]) torch.float32
Testloader- Shape of blur_image [N, C, H, W]:  torch.Size([16, 3, 128, 128])
Testloader- Shape of sharp_image [N, C, H, W]:  torch.Size([16, 3, 128, 128]) torch.float32

This is how I set up transfer learning (I assumed that for "in_features" I have to enter the number of pixels):

model = models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 128)
device_string = "cuda" if torch.cuda.is_available() else "cpu"
device = torch.device(device_string)
model = model.to(device)

This is how I define the training process:

# Define the loss function (MSE was chosen because it compares pixels
# between the blurred and sharp images)
criterion = nn.MSELoss()

# Define the optimizer and learning rate
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Learning rate schedule - If the loss value does not improve after 5 epochs
# back-to-back then the new learning rate will be:  previous_rate*0.5

#scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau( 
        optimizer,
        mode='min',
        patience=5,
        factor=0.5,
        verbose=True
    )

def training(model, trainDataloader, epoch):
  """ Function to define the model training
  
    Args:
        model (Model object): The model that is going to be trained.
        trainDataloader (Dataloader object): Dataloader object of the trainset.
        epoch (Integer): Number of training epochs.
  
  """
  # Put the model into training mode
  model.train()
  # Supporting variable to display the loss for each epoch
  running_loss = 0.0
  running_psnr = 0.0
  for i, data in tqdm(enumerate(trainDataloader), 
                      total=int(len(train_dataset)/trainDataloader.batch_size)):
    blur_image = data[0]
    sharp_image = data[1]
        
    # Transfer the blurred and sharp image instance to the device
    blur_image = blur_image.to(device)
    sharp_image = sharp_image.to(device)

    # Sets the gradient of tensors to zero
    optimizer.zero_grad()
    outputs = model(blur_image)
    loss = criterion(outputs, sharp_image)

    # Perform backpropagation
    loss.backward()
    # Update the weights 
    optimizer.step()

    # Add the loss that was calculated during the training run
    running_loss += loss.item()

    # calculate batch psnr (once every `batch_size` iterations)
    batch_psnr =  psnr(sharp_image, blur_image)
    running_psnr += batch_psnr

  # Display trainings loss
  trainings_loss = running_loss/len(trainDataloader.dataset)
  final_psnr = running_psnr/int(len(train_dataset)/trainDataloader.batch_size)
  final_ssim = ssim(sharp_image, blur_image, data_range=1, size_average=True)
  print(f"Trainings loss: {trainings_loss:.5f}")
  print(f"Train PSNR: {final_psnr:.5f}")
  print(f"Train SSIM: {final_ssim:.5f}")

  return trainings_loss, final_psnr, final_ssim

This is how I start the training:

train_loss  = []
val_loss = []
train_PSNR_score  = []
train_SSIM_score  = []
val_PSNR_score  = []
val_SSIM_score  = []

start = time.time()
for epoch in range(nb_epochs):
    print(f"Epoch {epoch+1}\n-------------------------------")
    train_epoch_loss = training(model, train_loader, nb_epochs)
    val_epoch_loss = validation(model, validation_loader, nb_epochs)
    train_loss.append(train_epoch_loss[0])
    val_loss.append(val_epoch_loss[0])

    train_PSNR_score.append(train_epoch_loss[1])
    train_SSIM_score.append(train_epoch_loss[2])

    val_PSNR_score.append(val_epoch_loss[1])
    val_SSIM_score.append(val_epoch_loss[2])

    scheduler.step(train_epoch_loss[0])
    scheduler.step(val_epoch_loss[0])
end = time.time()
print(f"Took {((end-start)/60):.3f} minutes to train")

But every time I try to run the training, I get the following error:

 0%|          | 0/249 [00:00<?, ?it/s]Epoch 1
-------------------------------
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py:528: UserWarning: Using a target size (torch.Size([16, 3, 128, 128])) that is different to the input size (torch.Size([16, 128])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-195-ff0214e227cd> in <module>()
      9 for epoch in range(nb_epochs):
     10     print(f"Epoch {epoch+1}\n-------------------------------")
---> 11     train_epoch_loss = training(model, train_loader, nb_epochs)
     12     val_epoch_loss = validation(model, validation_loader, nb_epochs)
     13     train_loss.append(train_epoch_loss[0])

<ipython-input-170-dfa2c212ad23> in training(model, trainDataloader, epoch)
     25     optimizer.zero_grad()
     26     outputs = model(blur_image)
---> 27     loss = criterion(outputs, sharp_image)
     28 
     29     # Perform backpropagation

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/loss.py in forward(self, input, target)
    526 
    527     def forward(self, input: Tensor, target: Tensor) -> Tensor:
--> 528         return F.mse_loss(input, target, reduction=self.reduction)
    529 
    530 

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in mse_loss(input, target, size_average, reduce, reduction)
   2926         reduction = _Reduction.legacy_get_string(size_average, reduce)
   2927 
-> 2928     expanded_input, expanded_target = torch.broadcast_tensors(input, target)
   2929     return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
   2930 

/usr/local/lib/python3.7/dist-packages/torch/functional.py in broadcast_tensors(*tensors)
     72     if has_torch_function(tensors):
     73         return handle_torch_function(broadcast_tensors, tensors, *tensors)
---> 74     return _VF.broadcast_tensors(tensors)  # type: ignore
     75 
     76 

RuntimeError: The size of tensor a (16) must match the size of tensor b (128) at non-singleton dimension 2

Model structure:

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=128, bias=True)
  )
)

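I am new to PyTorch (and to image deblurring in general), so I am confused about what the error message means and how to fix it. I tried changing my parameters, but nothing worked. Does anyone have a suggestion for how I can solve this?

I would appreciate any input :)

【Comments】: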

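Your sharp image size and your output size should match. The error is telling you that the criterion cannot compare two inputs of different sizes.
What does trainDataLoader return?
We need to see the structure of the model. It looks like your model outputs a different size than you expect.
@Niro: You are absolutely right! The shapes do not match. I ran an experiment in which sharp_image also goes through the model (output_sharp_image = model(sharp_image)), so that both tensors match. The only problem now is that I don't think I should run my sharp image through the model (at least not the way I do for the blurred image).
@Niro: trainDataLoader just returns batches of images so that I can iterate over them (are you asking because of the training function?)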

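Tags: python image-processing pytorch tensor motion-blur

【Solution 1】:

You can't use AlexNet for this task, because the output of your model and sharp_image must have the same shape. The convolutional layers encode the image into an embedding, and the fully connected layers cannot decode that embedding back to the original image size. To get an output of the same size as the input you need to upsample, for example with ConvTranspose2d().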
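To see where the message comes from, here is a minimal sketch that reproduces the mismatch with random tensors in place of the real model and data (the tensor contents are placeholders; only the shapes are taken from the question). Broadcasting right-aligns the shapes, so the [16, 128] output is treated as [1, 1, 16, 128] and compared against [16, 3, 128, 128]. The last dimension matches (128 vs 128), but dimension 2 does not (16 vs 128).

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()

# Shapes taken from the question; the values are random placeholders.
outputs = torch.randn(16, 128)              # what the modified AlexNet returns
sharp_image = torch.randn(16, 3, 128, 128)  # the target batch

try:
    criterion(outputs, sharp_image)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# With an output shaped like the target, the loss is a plain scalar.
image_shaped_output = torch.randn(16, 3, 128, 128)
loss = criterion(image_shaped_output, sharp_image)
print(loss.shape)
```

So the fix is not a different number in the last Linear layer but a model whose output is itself an image of shape [N, 3, 128, 128].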

Your encoder should be:

class ConvEncoder(nn.Module):
    """
    A simple Convolutional Encoder Model
    """

    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3, 16, (3, 3), padding=(1, 1))
        self.relu1 = nn.ReLU(inplace=True)
        self.maxpool1 = nn.MaxPool2d((2, 2))

        self.conv2 = nn.Conv2d(16, 32, (3, 3), padding=(1, 1))
        self.relu2 = nn.ReLU(inplace=True)
        self.maxpool2 = nn.MaxPool2d((2, 2))

        self.conv3 = nn.Conv2d(32, 64, (3, 3), padding=(1, 1))
        self.relu3 = nn.ReLU(inplace=True)
        self.maxpool3 = nn.MaxPool2d((2, 2))

        self.conv4 = nn.Conv2d(64, 128, (3, 3), padding=(1, 1))
        self.relu4 = nn.ReLU(inplace=True)
        self.maxpool4 = nn.MaxPool2d((2, 2))


    def forward(self, x):
        # Downscale the image with conv maxpool etc.
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)

        x = self.conv3(x)
        x = self.relu3(x)
        x = self.maxpool3(x)

        x = self.conv4(x)
        x = self.relu4(x)
        x = self.maxpool4(x)

        
        return x

Your decoder should be:

class ConvDecoder(nn.Module):
    """
    A simple Convolutional Decoder Model
    """

    def __init__(self):
        super().__init__()
        # Channel counts mirror the encoder in reverse (128 -> 64 -> 32 -> 16 -> 3),
        # so the decoder accepts the encoder's 128-channel output and
        # returns a 3-channel image of the original size.
        self.deconv1 = nn.ConvTranspose2d(128, 64, (2, 2), stride=(2, 2))
        self.relu1 = nn.ReLU(inplace=True)

        self.deconv2 = nn.ConvTranspose2d(64, 32, (2, 2), stride=(2, 2))
        self.relu2 = nn.ReLU(inplace=True)

        self.deconv3 = nn.ConvTranspose2d(32, 16, (2, 2), stride=(2, 2))
        self.relu3 = nn.ReLU(inplace=True)

        self.deconv4 = nn.ConvTranspose2d(16, 3, (2, 2), stride=(2, 2))
        self.relu4 = nn.ReLU(inplace=True)

    
    def forward(self, x):
         # Upscale the image with convtranspose etc.
        x = self.deconv1(x)
        x = self.relu1(x)

        x = self.deconv2(x)
        x = self.relu2(x)

        x = self.deconv3(x)
        x = self.relu3(x)

        x = self.deconv4(x)
        x = self.relu4(x)
        return x

encoder = ConvEncoder()
decoder = ConvDecoder()
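Before training, it helps to confirm that an encoder/decoder pair really returns tensors shaped like the target. The following is a compact stand-in built with nn.Sequential rather than the classes above; the helper names (down, up, toy_encoder, toy_decoder) and the channel counts 3 -> 16 -> 32 -> 64 -> 128 and back are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

def down(in_ch, out_ch):
    # 3x3 conv keeps H and W; the 2x2 max-pool halves them.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

def up(in_ch, out_ch):
    # 2x2 transposed conv with stride 2 doubles H and W.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 2, stride=2),
        nn.ReLU(inplace=True),
    )

toy_encoder = nn.Sequential(down(3, 16), down(16, 32), down(32, 64), down(64, 128))
toy_decoder = nn.Sequential(up(128, 64), up(64, 32), up(32, 16), up(16, 3))

with torch.no_grad():
    batch = torch.rand(2, 3, 128, 128)  # placeholder for a batch of blurred images
    code = toy_encoder(batch)           # four halvings: 128 -> 8
    restored = toy_decoder(code)        # four doublings: 8 -> 128

print(code.shape)
print(restored.shape)
```

The decoder's first layer has to accept exactly as many channels as the encoder's last layer produces, and its last layer has to emit 3 channels, otherwise the loss runs into the same kind of shape mismatch as in the question.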

You can train your model like this:

def train_step(encoder, decoder, train_loader, loss_fn, optimizer, device):
    """Train for one pass over train_loader and return the last batch loss."""
    encoder.train()
    decoder.train()

    for batch_idx, (train_img, target_img) in enumerate(train_loader):
        # Move images to device
        train_img = train_img.to(device)
        target_img = target_img.to(device)

        # Zero grad the optimizer
        optimizer.zero_grad()
        # Feed the train images to encoder
        enc_output = encoder(train_img)
        # The output of encoder is input to decoder !
        dec_output = decoder(enc_output)

        # Decoder output is the reconstructed image.
        # Compute the loss between it and the target (sharp) image.
        loss = loss_fn(dec_output, target_img)
        # Backpropagate
        loss.backward()
        # Apply the optimizer to network by calling step.
        optimizer.step()
    # Return the loss of the last batch
    return loss.item()
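The loop relies on a loss function and an optimizer that are not defined in the answer. One possible setup is sketched below; the important detail is that the optimizer receives the parameters of both networks, otherwise only one of them is updated. The single-layer enc and dec modules, the random tensors, and the learning rate are placeholders, not part of the original answer.

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-ins for an encoder and a decoder (placeholders for illustration).
enc = nn.Conv2d(3, 8, 3, padding=1)
dec = nn.Conv2d(8, 3, 3, padding=1)

loss_fn = nn.MSELoss()
# Chain both parameter iterators so a single optimizer updates both modules.
optimizer = optim.Adam(itertools.chain(enc.parameters(), dec.parameters()), lr=1e-3)

blur = torch.rand(4, 3, 32, 32)   # placeholder blurred batch
sharp = torch.rand(4, 3, 32, 32)  # placeholder sharp batch

# Snapshot the parameters, take one optimization step, then compare.
before = [p.detach().clone() for p in itertools.chain(enc.parameters(), dec.parameters())]

optimizer.zero_grad()
loss = loss_fn(dec(enc(blur)), sharp)
loss.backward()
optimizer.step()

after = list(itertools.chain(enc.parameters(), dec.parameters()))
changed = [not torch.equal(b, a) for b, a in zip(before, after)]
print(all(changed))
```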

You may want to take a look at this for help with your project.

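【Comments】:

Thank you very much for your help! Your solution and explanation make complete sense. My only problem is that I have to use transfer learning for this task (it is one of the project requirements). It does not have to be one specific pretrained model, though (so I will drop AlexNet). Do you happen to know any model I could use?
@Ruffybeo, if you want to use a pretrained model for this task, you need one that is based on the SRCNN neural network. You should visit this link.
Thanks for the link :). I did try SRCNN and it worked! I will mark your answer as correct. Thank you very much for your help and your time!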
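For readers who want to see what the SRCNN mentioned in the comments looks like, here is a rough sketch of a three-layer SRCNN-style network with the commonly cited 9-1-5 kernel sizes. The padding values (added so the output keeps the input size), the class and attribute names, the learning rate, and the choice to freeze only the first layer are assumptions for illustration, not what the asker actually used. Loading pretrained weights is left as a comment because the checkpoint is not given in the thread.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three conv layers; padding is chosen so H and W stay unchanged."""

    def __init__(self, channels=3):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.mapping = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.extract(x))
        x = self.relu(self.mapping(x))
        return self.reconstruct(x)

model = SRCNN()
# Pretrained weights would be loaded here with model.load_state_dict(...);
# the checkpoint file is not specified in the thread, so this step is omitted.

# Transfer-learning style fine-tuning: freeze the first layer, train the rest.
for param in model.extract.parameters():
    param.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

with torch.no_grad():
    out = model(torch.rand(2, 3, 128, 128))  # placeholder blurred batch
print(out.shape)
```

Because every layer preserves the spatial size and the last layer returns 3 channels, the output can be compared directly with sharp_image in an MSE loss.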