【Question Title】: Setting CNN input dimensions correctly (PyTorch)
【Posted】: 2022-01-19 21:59:25
【Question Description】:

I keep getting this error:

Traceback (most recent call last):
  File "/Users/robbymoseley/Desktop/IOP/IOP-ML/Model/IOP_model.py", line 201, in <module>
    logps = model(images)
  File "/Applications/Python Virtual Environment/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Users/robbymoseley/Desktop/IOP/IOP-ML/Model/IOP_model.py", line 95, in forward
    x = self.pool(F.relu(self.conv1(x)))
  File "/Applications/Python Virtual Environment/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/Applications/Python Virtual Environment/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Applications/Python Virtual Environment/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [6, 1, 5, 5], but got 2-dimensional input of size [10, 307200] instead

This is the model I am trying to use:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        # we use the maxpool multiple times, but define it once
        self.pool = nn.MaxPool2d(2, 2)
        # in_channels = 6 because self.conv1 output 6 channel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 5*5 comes from the dimension of the last convnet layer
        self.fc1 = nn.Linear(input_size, hidden_sizes[0])
        self.fc2 = nn.Linear(hidden_sizes[0], hidden_sizes[1])
        self.fc3 = nn.Linear(hidden_sizes[1], output_size)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # no activation on final layer
        return x

The input images are 640x480.
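(Editor's note: for a 640x480 input, the flattened size that `x.view(-1, 16 * 5 * 5)` assumes does not hold; it can be derived with the standard Conv2d/MaxPool2d output-size formula. A small sketch, with a hypothetical `conv_out` helper:)

```python
# Hypothetical helper: output size of a conv/pool stage (dilation = 1).
def conv_out(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

h, w = 640, 480
h, w = conv_out(h, 5), conv_out(w, 5)        # conv1, 5x5 kernel  -> 636 x 476
h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)  # pool, 2x2 stride 2 -> 318 x 238
h, w = conv_out(h, 5), conv_out(w, 5)        # conv2, 5x5 kernel  -> 314 x 234
h, w = conv_out(h, 2, 2), conv_out(w, 2, 2)  # pool, 2x2 stride 2 -> 157 x 117

print(16 * h * w)  # flattened feature count per image: 293904, not 16*5*5
```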

And here is my main function:

if __name__ == "__main__":

    if torch.cuda.is_available():
        device = torch.device("cuda:0")
    else:
        device = torch.device("cpu")

    model = Net().to(device)

    criterion = nn.NLLLoss()
    images, labels = next(iter(train_loader))
    # images = images.view(images.shape[0], -1)
    print(images.shape)
    images = images.view(images.shape[0], -1)
    logps = model(images)
    loss = criterion(logps, labels)  # calculate the NLL loss

    # training process
    optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

    loss_fn = nn.NLLLoss()

    # train the model
    train_model(num_epochs, model)

    # validate the model
    validate_model(model)

    print("Execution has finished")

How can I adjust the expected input dimensions or weights? Or can I change the dimensions of my input so they correctly match the structure of the model? If so, how do I do that?

【Question Discussion】:

  • I suggest printing the shape of the input after each step of the forward pass to debug this
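(Editor's note: that suggestion can be sketched as follows, using a hypothetical `DebugNet` that mirrors the convolutional part of the question's model and prints shapes at each step:)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Convolutional part of the question's model, with shape prints for debugging.
class DebugNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

    def forward(self, x):
        print("input:          ", x.shape)
        x = self.pool(F.relu(self.conv1(x)))
        print("after conv1+pool:", x.shape)
        x = self.pool(F.relu(self.conv2(x)))
        print("after conv2+pool:", x.shape)
        return x

DebugNet()(torch.randn(1, 1, 640, 480))
```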

标签: python machine-learning pytorch conv-neural-network


【Solution 1】:

Looking at the convolutional part, I did notice the problem: the output size of your final pooling layer is computed incorrectly.

Here is how the code should work for the 640x480x1 input size you mentioned:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        # we use the maxpool multiple times, but define it once
        self.pool = nn.MaxPool2d(2, 2)
        # in_channels = 6 because self.conv1 output 6 channel
        self.conv2 = nn.Conv2d(6, 16, 5)
        # 5*5 comes from the dimension of the last convnet layer
        self.fc1 = nn.Linear(16 * 157 * 117, 256) # You didn't provide the numbers here, but I calculated the input features from the previous layer's output
        self.fc2 = nn.Linear(256, 10)
        self.fc3 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.conv1(x)
        print('Conv1 Shape: {}'.format(x.shape))
        x = self.pool(F.relu(x))
        print('Pool1 Shape: {}'.format(x.shape))
        x = self.conv2(x)
        print('Conv2 Shape: {}'.format(x.shape))
        x = self.pool(F.relu(x))
        print('Pool2 Shape: {}'.format(x.shape))
        x = x.view(-1, 16 * 157 * 117)
        print('Flatten Shape: {}'.format(x.shape))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # no activation on final layer
        return x
    
model = Model()

model(torch.randn(1, 1, 640, 480)) # 1 channel, since your first conv layer's in_channels is 1

Output shape at each layer:

model(torch.randn(1, 1, 640, 480))
Conv1 Shape: torch.Size([1, 6, 636, 476])
Pool1 Shape: torch.Size([1, 6, 318, 238])
Conv2 Shape: torch.Size([1, 16, 314, 234])
Pool2 Shape: torch.Size([1, 16, 157, 117])
Flatten Shape: torch.Size([1, 293904])
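(Editor's note: besides the corrected layer sizes, the call site in the question must also pass a 4-D batch. The line `images = images.view(images.shape[0], -1)` flattens each image to 2-D, which is exactly the `[10, 307200]` shape in the traceback, since 640 * 480 = 307200. A minimal sketch, with random data standing in for `train_loader`:)

```python
import torch

# Stand-in for `images, labels = next(iter(train_loader))`: a batch of
# 10 single-channel 640x480 images in (batch, channels, height, width) layout.
images = torch.randn(10, 1, 640, 480)

flat = images.view(images.shape[0], -1)  # this flattening caused the error
print(flat.shape)                        # torch.Size([10, 307200])

# Conv2d expects 4-D input, so pass the batch directly instead:
print(images.shape)                      # torch.Size([10, 1, 640, 480])
```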

【Discussion】:
