Neural Network Architectures
A mathematical model of a neuron: each neuron computes a weighted sum of its inputs plus a bias, then applies a nonlinear activation function f, i.e. output = f(Σᵢ wᵢxᵢ + b).
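A minimal sketch of this neuron model in Python/NumPy (the function and variable names here are illustrative, not from the source):

```python
import numpy as np

def neuron_forward(x, w, b, f=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """Single neuron: weighted sum of inputs plus a bias,
    passed through an activation function f (sigmoid by default)."""
    return f(np.dot(w, x) + b)

# Example: a neuron with 3 inputs and arbitrary weights/bias
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.3, -0.6])
b = 0.1
print(neuron_forward(x, w, b))  # a value in (0, 1) for the sigmoid
```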
Activation Functions
- Sigmoid: σ(x) = 1/(1 + e^(−x))
  - (−) Sigmoids saturate and kill gradients: when the activation is near 0 or 1, the local gradient is almost zero, so almost no signal flows backward through the neuron.
  - (−) Sigmoid outputs are not zero-centered.
- tanh: tanh(x) = 2σ(2x) − 1
  - Like the sigmoid, its activations saturate, but its output is zero-centered; in practice the tanh non-linearity is always preferred to the sigmoid non-linearity.
- ReLU (Rectified Linear Unit): f(x) = max(0, x)
  - (+) It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
  - (+) Cheap to compute: ReLU can be implemented by simply thresholding a matrix of activations at zero.
  - (−) Unfortunately, ReLU units can be fragile during training and can "die": a large gradient update can knock the weights such that the unit never activates again, after which the gradient through it is zero forever.
- Leaky ReLU: f(x) = αx for x < 0 and f(x) = x for x ≥ 0, where α is a small constant (commonly 0.01)
  - Attempts to fix the "dying ReLU" problem: instead of being zero for x < 0, the function has a small negative slope, so the gradient never vanishes entirely. (A code sketch of all four activations follows this list.)
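A minimal NumPy sketch of the four activations above (the choice α = 0.01 for Leaky ReLU is a common default, not specified in the source):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); saturates toward 0 or 1 for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = 2*sigmoid(2x) - 1; zero-centered, but still saturates
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # f(x) = max(0, x); just thresholds the activations at zero
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0, so the gradient never vanishes entirely
    return np.where(x < 0, alpha * x, x)

x = np.linspace(-5, 5, 5)
print(sigmoid(x))     # saturates near 0 and 1 at the ends
print(tanh(x))        # saturates near -1 and 1, zero-centered
print(relu(x))        # exactly zero for negative inputs
print(leaky_relu(x))  # small negative values instead of zero
```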