【Question Title】:How to do gradient clipping in Flux.jl?
【Posted】:2021-09-15 03:24:40
【Question】:

I have a classic example of the exploding gradient problem on my hands, and I would like to solve it with gradient clipping. What is the interface for doing this in Flux?

【Comments】:

    Tags: julia flux.jl


    【Solution 1】:

    Optimisers such as gradient clipping can be used in a few different ways. The first is to apply them manually with update!, as shown below:

    julia> using Flux
    
    julia> W = rand(2, 5)
    2×5 Matrix{Float64}:
     0.107144  0.643693  0.399019  0.764073  0.78122
     0.367751  0.335326  0.442312  0.433656  0.443901
    
    julia> b = rand(2)
    2-element Vector{Float64}:
     0.035723018492827885
     0.9063968296104223
    
    julia> predict(x) = (W * x) .+ b
    predict (generic function with 1 method)
    
    julia> loss(x, y) = sum((predict(x) .- y).^2)
    loss (generic function with 1 method)
    
    julia> x, y = rand(5), rand(2) # Dummy data
    ([0.4878962006153771, 0.1293768496171035, 0.4662237969593086, 0.43195747100830384, 0.10672368947541733], [0.864923828559593, 0.6643701281693306])
    
    julia> l = loss(x, y)
    0.8292492365517469
    
    julia> θ = params(W, b)
    Params([[0.10714416442012298 0.6436932411339433 … 0.7640730577168127 0.7812198182421601; 0.3677513353707582 0.3353255969566744 … 0.4336560750116858 0.44390077304165043], [0.035723018492827885, 0.9063968296104223]])
    
    julia> grads = gradient(() -> loss(x, y), θ)
    Grads(...)
    
    julia> using Flux.Optimise
    
    julia> opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))
    Optimiser(Any[ClipValue{Float64}(0.001), ADAM(0.001, (0.9, 0.999), IdDict{Any, Any}())])
    
    julia> for p in (W, b)
             update!(opt, p, grads[p])
           end
    # This is a somewhat bad example since there are no exploding
    # gradients here, but the mechanics would be the same if there were.
    
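    Besides ClipValue, which clamps every gradient element into [-thresh, thresh], Flux.Optimise also provides ClipNorm, which rescales the whole gradient whenever its norm exceeds the threshold. Here is a minimal sketch reusing W, b, loss, x, y and θ from above (the threshold of 1.0 is an arbitrary choice):

    # ClipNorm(1.0) rescales the full gradient so its norm never
    # exceeds 1.0; ADAM then applies the (clipped) update.
    opt = Optimiser(ClipNorm(1.0), ADAM(1e-3))

    grads = gradient(() -> loss(x, y), θ)
    for p in (W, b)
      update!(opt, p, grads[p])
    end
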

    Alternatively, you can pass the optimiser (opt = Optimiser(ClipValue(1e-3), ADAM(1e-3)), from here: https://fluxml.ai/Flux.jl/stable/training/optimisers/#Gradient-Clipping) into a training loop like the following:

    for d in datapoints
    
      # `d` should produce a collection of arguments
      # to the loss function
    
      # Calculate the gradients of the parameters
      # with respect to the loss function
      grads = Flux.gradient(parameters) do
        loss(d...)
      end
    
      # Update the parameters based on the chosen
      # optimiser (opt)
      Flux.Optimise.update!(opt, parameters, grads)
    end
    # Example from here: https://fluxml.ai/Flux.jl/stable/training/training/#Training
    

    where opt is defined as in the example shown above and parameters are the model's trainable parameters (θ in the first example).
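    If you would rather not write the loop by hand, the same clipping optimiser can also be handed to Flux.train!, which runs one pass over the data. A short sketch, assuming the loss and θ defined in the first example and some hypothetical dummy data:

    # Hypothetical dummy (x, y) pairs; replace with your real dataset.
    data = [(rand(5), rand(2)) for _ in 1:100]

    opt = Optimiser(ClipValue(1e-3), ADAM(1e-3))

    # For each (x, y) pair, train! computes the gradient of loss(x, y)
    # with respect to θ and applies the clipped ADAM update.
    Flux.train!(loss, θ, data, opt)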

    【Discussion】:
