2.7 Adam

This article covers the topic in particular detail:
Deep Learning Optimization Algorithms Explained (Momentum, RMSProp, Adam)
Adam(Adaptive Moment Estimation)
初始化:

v_dW = 0, v_db = 0, S_dW = 0, S_db = 0

On iteration t:
compute dW, db on the current mini-batch

Momentum:
  v_dW = β1·v_dW + (1−β1)·dW
  v_db = β1·v_db + (1−β1)·db

RMSprop:
  S_dW = β2·S_dW + (1−β2)·dW²
  S_db = β2·S_db + (1−β2)·db²

Bias correction:
  v_dW^corrected = v_dW / (1 − β1^t)
  v_db^corrected = v_db / (1 − β1^t)
  S_dW^corrected = S_dW / (1 − β2^t)
  S_db^corrected = S_db / (1 − β2^t)

Update:
  W = W − α·v_dW^corrected / (√(S_dW^corrected) + ε)
  b = b − α·v_db^corrected / (√(S_db^corrected) + ε)
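The per-iteration update above can be sketched in NumPy as a single function. This is a minimal illustration, not a production optimizer; the name `adam_step` and its signature are my own choices, and the default hyperparameters follow the commonly used values.

```python
import numpy as np

def adam_step(W, dW, v, S, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single parameter array.

    v, S are the running first- and second-moment estimates (initialized to
    zeros before training); t is the 1-based iteration counter used for
    bias correction.
    """
    v = beta1 * v + (1 - beta1) * dW        # momentum: first moment of dW
    S = beta2 * S + (1 - beta2) * dW ** 2   # RMSprop: second moment of dW
    v_hat = v / (1 - beta1 ** t)            # bias-corrected first moment
    S_hat = S / (1 - beta2 ** t)            # bias-corrected second moment
    W = W - alpha * v_hat / (np.sqrt(S_hat) + eps)
    return W, v, S
```

The same function applies unchanged to the bias term b with its gradient db, since the update rules for W and b are identical.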

When implementing Adam, what people usually do is just use the default values for β1, β2, and ε (commonly β1 = 0.9, β2 = 0.999, ε = 10⁻⁸). I don’t think anyone ever really tunes ε. Then try a range of values of α to see what works best.

So, where does the term ‘Adam’ come from?

Adam stands for Adaptive Moment Estimation. β1 is used to compute the exponentially weighted average of the derivatives, which is called the first moment. β2 is used to compute the exponentially weighted average of the squares of the derivatives, which is called the second moment. That is what gives rise to the name Adaptive Moment Estimation.
