SPP-Net: Brief Study Notes

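A note up front: the content of this article comes from, but is not limited to, the web, books, and my own experience. It is meant as a summary that fellow practitioners can consult quickly, so that we can learn and improve together, and as a record of my own questions. Mistakes are inevitable, and corrections are welcome. If anything here infringes on your rights, please notify me right away and I will promptly remove the relevant content. Many thanks!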

Cropping may lose some information about the object.

The cropped region may not cover the whole object.

Warping may change the object's appearance.

Scaling distorts the image.

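FC layers need a fixed-length input, while conv layers can adapt to arbitrary input sizes. In other words, the convolutional layers of a CNN do not require fixed-size images, whereas the fully connected layers do require a fixed-size input.

The proposed fix is an SPP layer placed after the convolutional layers.

SPPNet pools an image of arbitrary size into a fixed-length image representation.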
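The fixed-length constraint above can be made concrete with a small sketch. The layer stack and channel count below are hypothetical values chosen for illustration, not the network from the paper. The point is only that the flattened conv output grows with the input size, so a fully connected layer with a fixed weight shape cannot consume it directly.

```python
def out_size(size, kernel, stride, pad=0):
    """Spatial output size of a conv/pool layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical toy stack of (kernel, stride, pad) layers; NOT the paper's network.
TOY_LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1), (3, 2, 1)]
TOY_CHANNELS = 256  # assumed number of channels in the last conv layer

def flattened_length(height, width):
    """Length of the flattened last-layer feature map for a given input size."""
    for kernel, stride, pad in TOY_LAYERS:
        height = out_size(height, kernel, stride, pad)
        width = out_size(width, kernel, stride, pad)
    return height * width * TOY_CHANNELS

if __name__ == "__main__":
    # Different input sizes give different flattened lengths.
    print(flattened_length(224, 224))
    print(flattened_length(320, 480))
```

Without a layer that normalizes this length, every input would have to be cropped or warped to one size first, which is exactly the problem described above.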

SPP-Net: Training for Detection(1)

Step 1. Generate an image pyramid and extract the conv feature map of the whole image.

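The pyramid uses {6x6, 3x3, 2x2, 1x1} bins, for a total of 36 + 9 + 4 + 1 = 50 bins; each bin yields one pooled value per feature-map channel.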
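As a minimal sketch of the pooling just described, the NumPy function below max-pools a (C, H, W) feature map over a {6x6, 3x3, 2x2, 1x1} grid and concatenates the results. The function name `spp_pool`, the floor/ceil rule for bin boundaries, and the 256-channel example are assumptions made for illustration; real implementations may place the bin edges differently.

```python
import math
import numpy as np

def spp_pool(feature_map, levels=(6, 3, 2, 1)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n*n)."""
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin i covers rows [r0, r1); floor/ceil edges cover the whole map,
            # and max(...) keeps every bin non-empty when height < n.
            r0 = (i * height) // n
            r1 = max(r0 + 1, math.ceil((i + 1) * height / n))
            for j in range(n):
                c0 = (j * width) // n
                c1 = max(c0 + 1, math.ceil((j + 1) * width / n))
                # One max value per channel for this bin.
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for shape in [(256, 13, 13), (256, 10, 17)]:
        vec = spp_pool(rng.standard_normal(shape))
        print(shape, "->", vec.shape)
```

With 256 channels, both inputs give a vector of length 50 x 256 = 12800, regardless of the spatial size of the feature map.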

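The main improvement is that SPP-net computes the feature map of the entire image in a single pass, which greatly reduces the cost of computing features for the proposals.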

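Concretely, the image is resized so that its shorter side is s ∈ {480, 576, 688, 864, 1200}. For each region, the scale in this set at which the resized region is closest to 224x224 is chosen, and the features are extracted from the feature map of that scale.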
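The scale choice above can be sketched as a small search over the five scales. This is an illustration under stated assumptions: s is taken to be the shorter image side after resizing, "closest to 224x224" is measured by pixel count, and the function name `best_scale` and its arguments are hypothetical.

```python
SCALES = (480, 576, 688, 864, 1200)
TARGET_AREA = 224 * 224

def best_scale(box_w, box_h, img_w, img_h, scales=SCALES):
    """Return the scale s at which the resized box area is closest to 224x224.

    box_w, box_h: proposal size in the original image (pixels).
    img_w, img_h: original image size (pixels).
    """
    short_side = min(img_w, img_h)

    def area_gap(s):
        ratio = s / short_side  # resize factor that makes the short side equal s
        return abs(box_w * ratio * box_h * ratio - TARGET_AREA)

    return min(scales, key=area_gap)

if __name__ == "__main__":
    # A small box prefers a large scale; a large box prefers a small one.
    print(best_scale(100, 100, 500, 375))
    print(best_scale(300, 300, 500, 375))
```

Small proposals are matched to the larger image scales and large proposals to the smaller ones, so the region seen by the network stays near the size it was trained on.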

[Figure: spatial pyramid pooling layer with {6x6, 3x3, 2x2, 1x1} bins; image scales s ∈ {480, 576, 688, 864, 1200}; labels 224x2, 224x3, 224x6]

SPP-Net: Training for Detection(2)

Step 2. For each proposal, walk the image pyramid and find the projected version whose number of pixels is closest to 224x224 (for scale invariance in training).

Step 3. Find the corresponding region in the conv5 feature map and use the SPP layer to pool it to a fixed size.

Step 4. Once the features of all proposals have been obtained, fine-tune the FC layers only.

Step 5. Train the class-specific SVMs.
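The mapping in Step 3 from an image-space proposal to a window on the conv5 feature map can be sketched as follows. The rounding rule (floor plus one for the left/top edge, ceil minus one for the right/bottom edge) follows my recollection of the appendix of the SPP-net paper; the total stride of 16, the function name `project_window`, and the clamp for tiny proposals are assumptions for illustration.

```python
import math

def project_window(x0, y0, x1, y1, stride=16):
    """Map an image-space window (x0, y0, x1, y1) to feature-map coordinates.

    Left/top edges use floor(x / stride) + 1 and right/bottom edges use
    ceil(x / stride) - 1. stride is the assumed total stride up to conv5.
    """
    left = math.floor(x0 / stride) + 1
    top = math.floor(y0 / stride) + 1
    right = math.ceil(x1 / stride) - 1
    bottom = math.ceil(y1 / stride) - 1
    # Assumed safeguard: keep at least one cell for very small proposals.
    right = max(right, left)
    bottom = max(bottom, top)
    return left, top, right, bottom

if __name__ == "__main__":
    print(project_window(32, 48, 256, 200))
    print(project_window(10, 10, 12, 12))
```

The resulting window can then be cut out of the shared feature map and passed to the SPP layer, so no convolution is recomputed per proposal.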

SPP-Net: Training for Detection:

Almost the same as R-CNN, except for Step 3.

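SPP is an extension of BoW (Bag-of-Words): it partitions the image into divisions from finer to coarser levels and then aggregates the local features within them. Before CNNs became mainstream, SPP was widely used in detection and classification.

Advantages of SPP: 1) it accepts input of arbitrary size and produces fixed-size output; 2) it uses multiple pyramid levels; 3) it can pool features extracted at arbitrary scales.

SPP-Net speed-up: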

Speed: 64x faster than R-CNN using one scale, and 24x faster using a five-scale pyramid. mAP: +1.2 mAP vs. R-CNN.

SPP-Net: Limitations:

1. Training is split into multiple stages and is not end-to-end.

2. Training costs too much disk space and time.

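3. Training SPP-net fine-tunes only the fully connected layers (detection needs location information in addition to semantic information, and the repeated pooling operations blur the location information).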