[Question Title]: ResNet-18 as backbone in Faster R-CNN
[Posted]: 2020-02-10 06:12:37
[Question]:

I'm coding with PyTorch and I want to use ResNet-18 as the backbone of Faster R-CNN. When I print the structure of resnet18, the output is as follows:

>>> import torch
>>> import torchvision
>>> import numpy as np
>>> import torchvision.models as models

>>> resnet18 = models.resnet18(pretrained=False)
>>> print(resnet18)


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

My question is: up to which layer is the feature extractor? Should AdaptiveAvgPool2d be part of the Faster R-CNN backbone?

This tutorial shows how to train a Mask R-CNN with an arbitrary backbone. I want to do the same thing with Faster R-CNN and train a Faster R-CNN with ResNet-18, but I'm confused about up to which layer should be part of the feature extractor.

I know how to use ResNet + Feature Pyramid Network as the backbone; my question is about plain ResNet.

[Question Discussion]:

    Tags: neural-network deep-learning pytorch resnet faster-rcnn


    [Solution 1]:

    I used something like this with newer versions of torch and torchvision.

    import torchvision
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.rpn import AnchorGenerator
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    def get_resnet18_backbone_model(num_classes, pretrained):
        print('Using fasterrcnn with res18 backbone...')

        backbone = resnet_fpn_backbone('resnet18', pretrained=pretrained, trainable_layers=5)

        anchor_generator = AnchorGenerator(
            sizes=((16,), (32,), (64,), (128,), (256,)),
            aspect_ratios=tuple([(0.25, 0.5, 1.0, 2.0) for _ in range(5)]))

        roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0', '1', '2', '3'],
                                                        output_size=7, sampling_ratio=2)

        # put the pieces together inside a FasterRCNN model
        model = FasterRCNN(backbone, num_classes=num_classes,
                           rpn_anchor_generator=anchor_generator,
                           box_roi_pool=roi_pooler)
        return model
    

    Note that resnet_fpn_backbone() already sets backbone.out_channels to the correct value.

    [Discussion]:

      [Solution 2]:

      If we want to use the output of the adaptive average pooling, we use this code for the different ResNets:

      import torch.nn as nn
      import torchvision

      # backbone
      if backbone_name == 'resnet_18':
          resnet_net = torchvision.models.resnet18(pretrained=True)
          modules = list(resnet_net.children())[:-1]  # keep avgpool, drop only fc
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 512
      elif backbone_name == 'resnet_34':
          resnet_net = torchvision.models.resnet34(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 512
      elif backbone_name == 'resnet_50':
          resnet_net = torchvision.models.resnet50(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 2048
      elif backbone_name == 'resnet_101':
          resnet_net = torchvision.models.resnet101(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 2048
      elif backbone_name == 'resnet_152':
          resnet_net = torchvision.models.resnet152(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 2048
      elif backbone_name == 'resnet_50_modified_stride_1':
          # resnet50() here is a custom stride-modified variant, defined elsewhere
          resnet_net = resnet50(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 2048
      elif backbone_name == 'resnext101_32x8d':
          resnet_net = torchvision.models.resnext101_32x8d(pretrained=True)
          modules = list(resnet_net.children())[:-1]
          backbone = nn.Sequential(*modules)
          backbone.out_channels = 2048
      

      If we want to use the convolutional feature maps, we use the following code:

      import torch.nn as nn
      import torchvision

      # backbone (drop both avgpool and fc, keeping the conv feature maps)
      if backbone_name == 'resnet_18':
          resnet_net = torchvision.models.resnet18(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnet_34':
          resnet_net = torchvision.models.resnet34(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnet_50':
          resnet_net = torchvision.models.resnet50(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnet_101':
          resnet_net = torchvision.models.resnet101(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnet_152':
          resnet_net = torchvision.models.resnet152(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnet_50_modified_stride_1':
          # resnet50() here is a custom stride-modified variant, defined elsewhere
          resnet_net = resnet50(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      elif backbone_name == 'resnext101_32x8d':
          resnet_net = torchvision.models.resnext101_32x8d(pretrained=True)
          modules = list(resnet_net.children())[:-2]
          backbone = nn.Sequential(*modules)

      # note: for FasterRCNN, backbone.out_channels must still be set here
      # (512 for resnet18/34, 2048 for the deeper variants)
      

      [Discussion]:

        [Solution 3]:

        torchvision automatically grabs the feature-extraction layers for vgg and mobilenet. .features automatically extracts the relevant layers from the backbone model and passes them on to the object detection pipeline. You can read more about this in the resnet_fpn_backbone function.

        In the object detection link you shared, you just need to change backbone = torchvision.models.mobilenet_v2(pretrained=True).features to backbone = resnet_fpn_backbone('resnet50', pretrained_backbone).

        Just to give you a brief idea: the resnet_fpn_backbone function uses the resnet backbone name you provide (18, 34, 50, ...) to instantiate the corresponding ResNet and extracts layer1 through layer4 in its forward pass. This backbone with FPN is then used as the backbone in Faster R-CNN.

        [Discussion]:

        • I tested resnet18(pretrained=True).features but it gives: AttributeError: 'ResNet' object has no attribute 'features'
        • Can you replace backbone = torchvision.models.resnet18(pretrained=True).features with backbone = resnet_fpn_backbone('resnet50', pretrained_backbone)? You need to include from .backbone_utils import resnet_fpn_backbone
        • resnet_fpn_backbone returns a ResNet-based Feature Pyramid Network; I want plain ResNet as the backbone.
        • As far as I know, torchvision currently supports resnet with fpn by default. If you need something else, you can customize it by pulling their repo. I'll do some more research and confirm.
        • I think this code solved it: resnet_net = torchvision.models.resnet18(pretrained=True); modules = list(resnet_net.children())[:-2]; backbone = nn.Sequential(*modules); backbone.out_channels = 512