Where should I put the input image dimensions in the following PyTorch architecture?

【Question title】: Where should I put the input image dimensions in the following architecture in PyTorch?
【Posted】: 2021-03-08 08:08:53
【Question】:

import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, channels=3):
        super(Discriminator, self).__init__()

        self.channels = channels

        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0, bn=False):
            # Conv -> optional BatchNorm -> LeakyReLU, returned as a list of layers
            block = [nn.Conv2d(n_input, n_output, kernel_size=k_size, stride=stride, padding=padding, bias=False)]
            if bn:
                block.append(nn.BatchNorm2d(n_output))
            block.append(nn.LeakyReLU(0.2, inplace=True))
            return block

        self.model = nn.Sequential(
            *convlayer(self.channels, 32, 4, 2, 1),
            *convlayer(32, 64, 4, 2, 1),
            *convlayer(64, 128, 4, 2, 1, bn=True),
            *convlayer(128, 256, 4, 2, 1, bn=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),  # FC with Conv.
        )

    def forward(self, imgs):
        logits = self.model(imgs)
        out = torch.sigmoid(logits)

        return out.view(-1, 1)

The architecture above is the discriminator of a GAN model. I am a little confused about the first layer:

*convlayer(self.channels, 32, 4, 2, 1)

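Here self.channels, which is 3 (a color image), is passed in, and my input image is 64 * 64 * 3. My first question: where in the architecture above are the dimensions of the input image taken into account?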

I am confused because when I look at the generator architecture,

import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, nz=128, channels=3):
        super(Generator, self).__init__()

        self.nz = nz
        self.channels = channels

        def convlayer(n_input, n_output, k_size=4, stride=2, padding=0):
            # ConvTranspose -> BatchNorm -> ReLU, returned as a list of layers
            block = [
                nn.ConvTranspose2d(n_input, n_output, kernel_size=k_size, stride=stride, padding=padding, bias=False),
                nn.BatchNorm2d(n_output),
                nn.ReLU(inplace=True),
            ]
            return block

        self.model = nn.Sequential(
            *convlayer(self.nz, 1024, 4, 1, 0),  # Fully connected layer via convolution.
            *convlayer(1024, 512, 4, 2, 1),
            *convlayer(512, 256, 4, 2, 1),
            *convlayer(256, 128, 4, 2, 1),
            *convlayer(128, 64, 4, 2, 1),
            nn.ConvTranspose2d(64, self.channels, 3, 1, 1),
            nn.Tanh(),
        )

    def forward(self, z):
        # Treat the latent vector as an nz-channel image of size 1x1
        z = z.view(-1, self.nz, 1, 1)
        img = self.model(z)
        return img

in the first layer

*convlayer(self.nz, 1024, 4, 1, 0)

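they pass self.nz, the 128 random latent values needed to generate a 64 * 64 * 3 image. This is the opposite of the discriminator model above, where the number of channels is passed.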

My second question: if I have a 300 * 300 * 3 image, should I change my discriminator architecture to handle it?

P.S. I am new to PyTorch.

【Comments】:

Tags: python-3.x deep-learning pytorch generative-adversarial-network

【Solution 1】:

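A convolution does not need the dimensions of the input image at all. All it does is slide a kernel over the image, with or without striding. You only need to make sure that the input to each convolutional layer is at least as large as that layer's kernel. For example, you cannot apply a 3x3 kernel to a 2x2 image. You can work around this with padding, but without it the operation is not possible.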
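To make the point above concrete, here is a minimal sketch in plain Python (no PyTorch required) of the output-size arithmetic that nn.Conv2d applies along each spatial axis, assuming dilation 1. The helper names conv2d_out and trace and the constant DISCRIMINATOR_LAYERS are made up for this illustration; the (kernel, stride, padding) triples are copied from the Discriminator in the question.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output size of a 2D convolution along one spatial axis (dilation 1)."""
    if size + 2 * padding < kernel:
        # e.g. a 3x3 kernel on an unpadded 2x2 image
        raise ValueError("kernel does not fit into the (padded) input")
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) of the five convolutions in the question's Discriminator
DISCRIMINATOR_LAYERS = [(4, 2, 1)] * 4 + [(4, 1, 0)]


def trace(size, layers):
    """Return the spatial size before the first layer and after every layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        sizes.append(conv2d_out(sizes[-1], kernel, stride, padding))
    return sizes


print(trace(64, DISCRIMINATOR_LAYERS))  # [64, 32, 16, 8, 4, 1]
```

The image size never appears in the layer definitions. It only determines the spatial size of each intermediate feature map, and a 64x64 input happens to shrink to exactly 1x1 after the last layer.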

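The discriminator takes a sample, either from your dataset or from the generator's output, and evaluates whether it is real or fake. Because this is a CNN rather than a network of linear layers, you do not need to specify the size of the input image.
The generator samples from the latent points and then generates an image. If you have 300x300 images, you do not need to change the discriminator's convolutional layers for them to accept the input.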
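As a rough check of both questions, the sketch below traces spatial sizes through the generator and through the discriminator with a 300x300 input. It is plain Python and assumes dilation 1 and output_padding 0; the helper and constant names are invented for this illustration. Note that self.nz is also a channel count: forward() reshapes z to (N, nz, 1, 1), so the generator starts from a 1x1 "image" with 128 channels.

```python
def conv2d_out(size, kernel, stride, padding):
    # Output size of nn.Conv2d along one axis (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride, padding):
    # Output size of nn.ConvTranspose2d along one axis (dilation 1, output_padding 0)
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) triples copied from the question's code
GENERATOR_LAYERS = [(4, 1, 0)] + [(4, 2, 1)] * 4 + [(3, 1, 1)]
DISCRIMINATOR_LAYERS = [(4, 2, 1)] * 4 + [(4, 1, 0)]


def trace(size, layers, out_fn):
    """Return the spatial size before the first layer and after every layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        sizes.append(out_fn(sizes[-1], kernel, stride, padding))
    return sizes


gen_sizes = trace(1, GENERATOR_LAYERS, conv_transpose2d_out)
disc_300 = trace(300, DISCRIMINATOR_LAYERS, conv2d_out)

print(gen_sizes)  # [1, 4, 8, 16, 32, 64, 64]
print(disc_300)   # [300, 150, 75, 37, 18, 15]
print(disc_300[-1] ** 2)  # 225 values per image after out.view(-1, 1)
```

One caveat follows from this arithmetic: with a 300x300 input the layers run unchanged, but the last convolution yields a 15x15 map instead of 1x1, so out.view(-1, 1) returns 225 rows per image rather than one. Whether that is acceptable depends on how the loss and labels are set up in the training loop, which the question does not show.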

【Comments】: