PyTorch CNN linear layer shape after conv2d [duplicate]

【Question title】: PyTorch CNN linear layer shape after conv2d [duplicate]
【Posted】: 2021-05-05 00:09:43
【Question description】:

I am trying to learn PyTorch and came across a tutorial that defines a CNN as follows:

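# Imports needed by the snippet (not shown in the original tutorial excerpt)
from torch.nn import BatchNorm2d, Conv2d, Linear, MaxPool2d, Module, ReLU, Sequential

class Net(Module):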
    def __init__(self):
        super(Net, self).__init__()

        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
        )

        self.linear_layers = Sequential(
            Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass    
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x

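I understand how cnn_layers is built. After cnn_layers, the data should be flattened and passed to linear_layers.

What I don't understand is why the number of input features to Linear is 4*7*7. I know that 4 is the number of output channels of the last Conv2d layer.

Where does the 7*7 come from? Do stride or padding play any role in it?

The input image shape is [1, 28, 28].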

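【Comments on the question】:

What is the shape of the input image?
28x28, single channel.
As far as I understand, the number of neurons in the fully connected layer does not need to match the output of the convolutional layer. If you have 30 neurons in the first FC layer and the conv output is 4*7*7, you can still have 30, 10, where 10 is the output dimension and 30 is the hidden layer dimension; it can take any number of inputs.
@cerofrais No, that is not how it works.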

Tags: python pytorch torch

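【Answer 1】:

The Conv2d layers have a kernel size of 3 with stride 1 and padding 1, which means they do not change the spatial size of the image. There are two MaxPool2d layers, and each one reduces the spatial dimensions from (H, W) to (H/2, W/2). So for each batch, the output of cnn_layers, whose last convolution has 4 output channels, has shape (batch_size, 4, H/4, W/4). In the forward pass, the feature tensor is flattened by x = x.view(x.size(0), -1), which gives it shape (batch_size, 4 * H/4 * W/4) = (batch_size, H*W/4). Assuming H and W are both 28, the linear layer takes an input of shape (batch_size, 196), and 196 = 4*7*7.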
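The arithmetic in this answer can be sketched in plain Python. The helper name conv_out is hypothetical (it is not a PyTorch function); it applies the usual output-size rule floor((size + 2*padding - kernel) / stride) + 1, assuming dilation 1 and the default floor rounding, with the layer settings copied from the question.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a conv or pooling layer.

    Hypothetical helper; assumes dilation 1 and floor rounding.
    """
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for Conv2d, MaxPool2d, Conv2d, MaxPool2d,
# copied from cnn_layers in the question.
layers = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]

size = 28          # input height and width
trace = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    trace.append(size)

channels = 4       # output channels of the last Conv2d
in_features = channels * size * size

print(trace)        # spatial size after each layer
print(in_features)  # value to pass as the first argument of Linear
```

With these settings the spatial size goes 28, 28, 14, 14, 7, so the flattened feature count is 4*7*7 = 196. Both stride and padding enter through the formula: the padding of 1 is what keeps the 3x3 convolutions from shrinking the image, and the stride of 2 in the pooling layers is what halves it.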

【Comments】:

Got it. Thanks.

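【Answer 2】:

In a 2D convolutional layer, the features [values] are held in a matrix [2D tensor]. As usual, the network ends with a fully connected layer followed by a logistic output, and the features fed to the fully connected layer must be in a vector [1D tensor]. We therefore have to map every feature [value] in the last matrix to an input of the fully connected layer that follows. In PyTorch, the fully connected layer is implemented by the Linear class. Its first argument is the number of input features. In this case: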

input_image : (28,28,1)
after_Conv2d_1 : (28,28,4) <- because of the padding : with padding = 0 it would be (26,26,4)
after_maxPool_1 : (14,14,4) <- due to the stride of 2
after_Conv2d_2 : (14,14,4) <- because this is "same" padding
after_maxPool_2 : (7,7,4)

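In the end, the total number of features before the fully connected layer is 4*7*7. This also illustrates why we use odd numbers for the kernel size and start from images with an even number of pixels.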
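One way to check the shape trace is to push a dummy tensor through the same layers and record the shape after each one. This is only a sketch: the all-zeros input and the variable names are assumptions made for illustration, and only the layer definitions come from the question. Note that PyTorch reports shapes in (batch, channels, height, width) order, whereas the list in this answer is written as (height, width, channels).

```python
import torch
from torch import nn

# Same layers as cnn_layers in the question.
cnn_layers = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
cnn_layers.eval()  # use running statistics in BatchNorm for the dummy pass

# Dummy batch of one 28x28 single-channel image (illustrative input only).
x = torch.zeros(1, 1, 28, 28)

shapes = []
with torch.no_grad():
    for layer in cnn_layers:
        x = layer(x)
        shapes.append((type(layer).__name__, tuple(x.shape)))

for name, shape in shapes:
    print(name, shape)

# Number of features per sample after flattening, as done in forward().
flat_features = x.view(x.size(0), -1).size(1)
print(flat_features)
```

The value of flat_features is what Linear expects as its first argument, so the same dummy pass can be reused to find the right number whenever the input size or the layers change.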
