The paper was published at CVPR 2019. Paper link: http://openaccess.thecvf.com/content_CVPR_2019/papers/Cheng_Learning_Image_and_Video_Compression_Through_Spatial-Temporal_Energy_Compaction_CVPR_2019_paper.pdf

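I. Abstract

The core idea of the paper is to achieve temporal and spatial energy compaction in image and video compression. For image compression, a spatial energy compaction term is added to the loss. For video compression, the temporal energy distribution is used to choose the GOP size dynamically. In terms of compression performance the results are not outstanding: image compression is slightly better than BPG, and video compression only beats H.264.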

II. Paper analysis

1. Architecture of the image and video codecs

[Figure: overall structure of the image and video compression algorithms]

2. Image codec

(1) Analysis transform and synthesis transform

These are simply the encoder and decoder networks. Their structure is as follows:

[Figure: network structure of the analysis and synthesis transforms]

The paper notes that, on the decoder side, convolution followed by upsampling gives better results than upsampling followed by convolution.

[Formula image: encoding, quantization, and decoding]

Here x is the original image and y is its feature representation, i.e. the compressed data; x̂ is the reconstructed image and ŷ is the quantized compressed data. Two further symbols, which were images in the original post, denote the parameters of the encoder and decoder networks. The dimension of y was also given as a formula image that is not preserved here; K is the number of feature channels, and the paper uses n = 3, K = 48.

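(2) Quantization and entropy estimation

The authors find that the choice between the two common quantization approximations, uniform noise approximation and soft vector quantization, has only a small effect on image compression, so the paper adopts the first one. The different quantization methods are illustrated in the following figure:

[Figure: comparison of quantization methods]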
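To make the uniform noise approximation concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the usual formulation of this technique: during training, rounding is replaced by additive noise drawn uniformly from [-0.5, 0.5) so that the operation stays differentiable, and at test time the latent is actually rounded. The function names `quantize_train` and `quantize_test` and the latent shape are hypothetical.

```python
import numpy as np


def quantize_train(y, rng):
    """Training-time proxy for rounding: add i.i.d. uniform noise in [-0.5, 0.5)."""
    noise = rng.uniform(-0.5, 0.5, size=y.shape)
    return y + noise


def quantize_test(y):
    """Test-time quantization: round every element to the nearest integer."""
    return np.round(y)


# Hypothetical latent with K = 48 channels (the spatial size is an assumption).
rng = np.random.default_rng(0)
y = rng.normal(scale=3.0, size=(4, 4, 48))

y_noisy = quantize_train(y, rng)
y_hat = quantize_test(y)

# Both versions stay within half a quantization step of the input.
print(np.abs(y_noisy - y).max(), np.abs(y_hat - y).max())
```

The point of the sketch is only that both operations perturb y by at most half a quantization step, which is why the noisy version can stand in for rounding during training.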

(3) Spatial energy compaction

According to digital coding theory, a good energy compaction property is essential for high coding efficiency.

[Formula image]

In this formula, three of the symbols denote variances. One vector describes the channel energy distribution of the coded data, and a second vector describes how strongly the quantization error affects the reconstruction error. Both vectors have dimension 1×K.

Minimum reconstruction error:

[Formula image]

A regularization term on these two vectors is added to the loss:

[Formula image]

The first part of this loss is the ordinary rate-distortion loss.

The regularization on the two vectors is built up as follows.

First, to push the energy into as few channels as possible, the channel energy vector is normalized by dividing it by its sum. The original post gives an example (a formula image) in which 80% of the energy is concentrated in a single channel. The entropy penalty on the energy distribution is:

[Formula image], which is added to the loss.
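The normalize-then-entropy step can be sketched as follows. This is an illustration under assumptions, not the paper's exact formula, which was an image in the original post: the per-channel energy is taken to be a non-negative vector of length K, the entropy uses the natural logarithm, and the names `channel_energy_entropy` and `eps` are hypothetical.

```python
import numpy as np


def channel_energy_entropy(energy, eps=1e-12):
    """Entropy of the normalized per-channel energy distribution.

    `energy` is a non-negative vector of length K (assumed here to be the
    per-channel energy of the latent). A low entropy means the energy sits
    in few channels; a high entropy means it is spread out.
    """
    energy = np.asarray(energy, dtype=np.float64)
    p = energy / energy.sum()                    # normalize so the entries sum to 1
    return float(-np.sum(p * np.log(p + eps)))   # eps guards against log(0)


concentrated = [0.8, 0.1, 0.05, 0.05]    # 80% of the energy in one channel
spread_out = [0.25, 0.25, 0.25, 0.25]    # energy spread evenly

print(channel_energy_entropy(concentrated))
print(channel_energy_entropy(spread_out))
```

A distribution with 80% of its energy in one channel gets a smaller entropy than an even one, so minimizing this term as a penalty pushes training toward compact energy.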

After some training iterations the energy is concentrated in one or a few channels. A further quantity (its symbol was an image in the original post) then has to be minimized, and the penalty added to the loss is:

[Formula image]

3. Video compression

The video is divided into GOPs, and a video compression system can be written as:

[Formula image]

The formula contains the predicted frames at the encoder side and at the decoder side, the residual, and the quantized residual; the remaining symbols have the same meaning as in image compression. For the first frame of a GOP the system degenerates to image compression.

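(1) Inter-frame prediction

The method mainly follows "Video Frame Interpolation via Adaptive Separable Convolution". We made a serious attempt to reproduce that paper and did not get particularly good results. Inter-frame prediction uses the following formula:

[Formula image]

where i denotes the frame distance.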

(2) Temporal energy compaction

Hierarchical interpolation is used:

[Figure: hierarchical interpolation]
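As a rough illustration of hierarchical interpolation, the sketch below lists which frames would be interpolated from which references inside one loop of length T. It assumes that frames 0 and T are the independently coded key frames and that T is a power of two; the exact scheme in the paper was a figure in the original post, so this ordering and the function name `hierarchical_order` are assumptions.

```python
def hierarchical_order(T):
    """Return (target, left_ref, right_ref) triples for the frames between 0 and T.

    Frames 0 and T are assumed to be already coded. The midpoint of each
    interval is interpolated first, then the midpoints of the halves, and so on.
    """
    if T < 1 or T & (T - 1):
        raise ValueError("T must be a power of two")
    order = []
    step = T
    while step > 1:
        half = step // 2
        for left in range(0, T, step):
            order.append((left + half, left, left + step))
        step = half
    return order


for target, left, right in hierarchical_order(8):
    print(f"frame {target} is interpolated from frames {left} and {right}")
```

With T = 2 the list has a single entry, so there is only one interpolation level. With larger T, later frames are predicted from frames that were themselves interpolated, which is where error can accumulate.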

Each video has a different motion field, so T should be chosen to match the motion characteristics of the video. The paper defines a motion difference between two I-frames separated by a suitable distance (16 in the paper's experiments):

[Formula image]

The entropy of the distribution of this motion difference is:

[Formula image]
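The following sketch shows one way such an entropy could be computed. The exact definition was a formula image in the original post, so the details here are assumptions: the motion difference is taken as the signed pixel-wise difference between two 8-bit grayscale frames, its distribution is a 511-bin histogram, and the entropy is measured in bits. The name `motion_entropy` is hypothetical.

```python
import numpy as np


def motion_entropy(frame_a, frame_b):
    """Entropy (in bits) of the histogram of pixel differences between two frames."""
    # Widen to int16 so that the subtraction of uint8 values cannot wrap around.
    diff = frame_b.astype(np.int16) - frame_a.astype(np.int16)
    # One bin per possible integer difference in [-255, 255].
    hist, _ = np.histogram(diff, bins=511, range=(-255.5, 255.5))
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))


rng = np.random.default_rng(0)
static = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
moving = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)

print(motion_entropy(static, static))   # identical frames: no motion
print(motion_entropy(static, moving))   # unrelated frames: large differences
```

Identical frames give zero entropy, while frames with many distinct difference values give a large one. The absolute values depend on the assumed definition, so they are not directly comparable with the thresholds used in the paper.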

A large entropy value indicates fast motion and a small value indicates slow motion. The rule for choosing T is:

[Formula image]

L and U are constants for the lower and upper thresholds. For low-motion video T = 16. For high-motion video T = 2, which effectively disables hierarchical interpolation and has the benefit of preventing the error propagation that hierarchical interpolation causes.
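The selection rule can be sketched as a simple threshold function. Only the two end cases (T = 16 for low motion, T = 2 for high motion) and the thresholds L = 6.0 and U = 8.0 come from this post. The value returned between the two thresholds (`mid_T = 8`) and the choice of strict versus non-strict comparisons are assumptions made for illustration, and the function name `select_gop_size` is hypothetical.

```python
def select_gop_size(entropy, lower=6.0, upper=8.0,
                    low_motion_T=16, mid_T=8, high_motion_T=2):
    """Pick the interpolation loop length T from the motion entropy.

    lower / upper correspond to the constants L and U.
    mid_T is an assumed value for the range between the two thresholds.
    """
    if entropy < lower:
        return low_motion_T      # slow motion: long loop
    if entropy < upper:
        return mid_T             # assumption: intermediate loop length
    return high_motion_T         # fast motion: shortest loop


for h in (4.5, 7.0, 8.5):
    print(h, select_gop_size(h))
```

Keeping the thresholds and loop lengths as parameters makes it easy to swap in the paper's actual rule once the formula is at hand.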

Examples of spatial energy histograms:

[Figure: energy histogram examples]

III. Experiments

1. Experimental details:

Image compression is trained on ImageNet and tested on Kodak. Video compression uses the VTL dataset.

The optimizer is Adam with a fixed learning rate (the value was an image in the original post). The rate-distortion trade-off parameter takes values in {2, 4, 8, 16, 32, 64}, and L = 6.0, U = 8.0. The distortion D in Eq. (8) of the paper is D = 1 − MS-SSIM(x, x̂).

2. Experimental results

Ablation study

[Figure: ablation results]

Removing spatial energy compaction from image compression degrades the results. In video compression, a dynamically selected T gives better results than a fixed T.

Quantitative comparison:

[Figures: quantitative comparison]

Qualitative comparison:

[Figures: qualitative comparison]

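IV. Conclusion

The spatial energy compaction method proposed in the paper effectively improves image compression performance. The video interpolation loop, with its interpolation period selected automatically from the spatial information entropy, also improves video compression. Image compression outperforms BPG in terms of MS-SSIM, and video compression outperforms MPEG-4 and H.264. Both image and video compression give visually better results than traditional methods.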
