[Posted]: 2021-04-17 18:38:08
[Question]:
I'm working on a VQA model and I need some help, as I'm new to this.
I'd like to use transfer learning with a VGG19 network before running training, so that when training starts I already have the image features precomputed (I'm trying to solve a performance problem).
Is this possible? If so, could someone share an example in PyTorch?
The relevant code is below:
class img_CNN(nn.Module):
    def __init__(self, img_size):
        super(img_CNN, self).__init__()
        self.model = models.vgg19(pretrained=True)
        self.in_features = self.model.classifier[-1].in_features
        self.model.classifier = nn.Sequential(*list(self.model.classifier.children())[:-1])  # remove vgg19's last layer
        self.fc = nn.Linear(self.in_features, img_size)

    def forward(self, image):
        # with torch.no_grad():
        img_feature = self.model(image)  # (batch, 4096) after the truncated classifier
        img_feature = self.fc(img_feature)
        return img_feature
class vqamodel(nn.Module):
    def __init__(self, output_dim, input_dim, emb_dim, hid_dim, n_layers, dropout, answer_len, que_size, img_size, model_vgg, in_features):
        super(vqamodel, self).__init__()
        self.image = img_CNN(img_size)
        self.question = question_lstm(input_dim, emb_dim, hid_dim, n_layers, dropout, output_dim, que_size)
        self.tanh = nn.Tanh()
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(que_size, answer_len)  # the input size of the linear layer equals the combined vector size
        self.softmax = nn.Softmax(dim=1)

    def forward(self, image, question):
        image_emb = self.image(image)
        question_emb = self.question(question)
        combine = question_emb * image_emb  # element-wise fusion of the two embeddings
        out_feature = self.fc1(combine)
        out_feature = self.relu(out_feature)
        return out_feature
How can I take models.vgg19(pretrained=True) out of the model, run it over the image DataLoader before training, and save the image representations in a NumPy array?
Thanks!
[Discussion]:
标签: model pytorch transfer-learning vgg-net pre-trained-model