Neural Network Architectures
A mathematical model of a neuron: each neuron computes a weighted sum of its inputs plus a bias, then applies a nonlinear activation function f, i.e. output = f(Σᵢ wᵢxᵢ + b).
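A minimal sketch of this neuron model in Python/NumPy (the function and variable names here are illustrative, not from the source):

```python
import numpy as np

def neuron_forward(x, w, b, f=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """Single neuron: weighted sum of inputs plus a bias,
    passed through an activation function f (sigmoid by default)."""
    return f(np.dot(w, x) + b)

# Example: a neuron with 3 inputs and arbitrary weights/bias
x = np.array([1.0, -2.0, 0.5])
w = np.array([0.4, 0.3, -0.6])
b = 0.1
print(neuron_forward(x, w, b))  # a value in (0, 1) for the sigmoid
```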
Activation Functions
- Sigmoid: σ(x) = 1/(1 + e^(−x))
  - (−) Sigmoids saturate and kill gradients: when the activation is near 0 or 1, the local gradient is almost zero, so almost no signal flows backward through the neuron.
  - (−) Sigmoid outputs are not zero-centered.
- tanh: tanh(x) = 2σ(2x) − 1
  - Like the sigmoid, its activations saturate, but its output is zero-centered; in practice the tanh non-linearity is always preferred to the sigmoid non-linearity.
- ReLU (Rectified Linear Unit): f(x) = max(0, x)
  - (+) It was found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid/tanh functions.
  - (+) Cheap to compute: ReLU can be implemented by simply thresholding a matrix of activations at zero.
  - (−) Unfortunately, ReLU units can be fragile during training and can "die": a large gradient update can knock the weights such that the unit never activates again, after which the gradient through it is zero forever.
- Leaky ReLU: f(x) = αx for x < 0 and f(x) = x for x ≥ 0, where α is a small constant (commonly 0.01)
  - Attempts to fix the "dying ReLU" problem: instead of being zero for x < 0, the function has a small negative slope, so the gradient never vanishes entirely. (A code sketch of all four activations follows this list.)
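A minimal NumPy sketch of the four activations above (the choice α = 0.01 for Leaky ReLU is a common default, not specified in the source):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); saturates toward 0 or 1 for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # tanh(x) = 2*sigmoid(2x) - 1; zero-centered, but still saturates
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    # f(x) = max(0, x); just thresholds the activations at zero
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small slope alpha for x < 0, so the gradient never vanishes entirely
    return np.where(x < 0, alpha * x, x)

x = np.linspace(-5, 5, 5)
print(sigmoid(x))     # saturates near 0 and 1 at the ends
print(tanh(x))        # saturates near -1 and 1, zero-centered
print(relu(x))        # exactly zero for negative inputs
print(leaky_relu(x))  # small negative values instead of zero
```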