Key question

  • Vanishing/exploding gradients hamper convergence from the beginning as the network becomes deeper.
  • With the network depth increasing, accuracy gets saturated (which might be unsurprising) and then degrades rapidly.
    ResNet--Deep Residual Learning for Image Recognition

Methods

  • skip connections
    ResNet--Deep Residual Learning for Image Recognition
  • The form of the residual function F is flexible
    ResNet--Deep Residual Learning for Image Recognition
  • The function F(x, {W_i}) can represent multiple convolutional layers. The element-wise addition is performed on two feature maps, channel by channel.
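The residual computation y = F(x, {W_i}) + x can be sketched in a few lines of NumPy. The residual function `f` below is a hypothetical stand-in for the stacked convolutional layers; it is kept shape-preserving so the element-wise, channel-by-channel addition with the identity shortcut is valid:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, f):
    """y = relu(F(x, {W_i}) + x): the residual function's output is
    added element-wise to the identity shortcut, then activated."""
    return relu(f(x) + x)

# Hypothetical residual function F (placeholder for two conv layers),
# chosen to preserve the feature-map shape.
def f(x):
    return relu(x * 0.5) * 0.5

x = np.ones((4, 4, 8))        # a feature map: H x W x C
y = residual_block(x, f)
print(y.shape)                # same shape as x: (4, 4, 8)
```

If the dimensions of F(x) and x differ (e.g. when the number of channels changes between stages), the paper applies a linear projection W_s to the shortcut so the addition remains well-defined.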

Architecture

  • Architectures for ImageNet
    ResNet--Deep Residual Learning for Image Recognition
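The ImageNet architectures stack residual blocks in four stages. A quick sketch of the depth arithmetic (the per-stage block counts are from the paper; the helper name is my own): ResNet-18 and ResNet-34 use two 3×3 conv layers per block, plus the initial 7×7 conv and the final fully connected layer.

```python
def resnet_depth(blocks_per_stage, convs_per_block=2):
    # 1 initial 7x7 conv + conv layers inside residual blocks + 1 final fc
    return 1 + convs_per_block * sum(blocks_per_stage) + 1

print(resnet_depth([2, 2, 2, 2]))                    # ResNet-18
print(resnet_depth([3, 4, 6, 3]))                    # ResNet-34
print(resnet_depth([3, 4, 6, 3], convs_per_block=3)) # ResNet-50 (bottleneck)
```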

Experiments

  • (Figure: Training on ImageNet. Thin curves denote training error, and bold curves denote validation error of the center crops. Left: plain networks of 18 and 34 layers. Right: ResNets of 18 and 34 layers. In this plot, the residual networks have no extra parameters compared to their plain counterparts.)

