The article Activation-functions-neural-networks explains the role of activation functions and covers some of their variants, such as linear, sigmoid, tanh, Relu and softmax. This article briefly discusses some further variants: Leaky Relu, Elu, Selu, Softsign and Softplus.
Leaky Relu function:
The Leaky Rectified Linear Unit (Leaky Relu) is an extension of the Relu function designed to overcome the dying neuron problem.
leaky_relu(x) = x           if x > 0
leaky_relu(x) = 0.01 * x    if x <= 0

d/dx leaky_relu(x) = 1      if x > 0
d/dx leaky_relu(x) = 0.01   if x <= 0
Uses: Relu returns 0 for negative inputs, so the affected neuron becomes inactive and stops contributing to gradient flow. Leaky Relu overcomes this by letting a small gradient flow when the input is negative. So, if learning with Relu is too slow, one can try Leaky Relu and check whether it improves.
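The two formulas above can be written in a few lines of NumPy. This is a minimal sketch (the function names and the 0.01 slope default are illustrative, not from a particular library):

```python
import numpy as np

def leaky_relu(x, negative_slope=0.01):
    """Pass positive inputs through unchanged; scale
    non-positive inputs by a small slope (0.01 here)."""
    return np.where(x > 0, x, negative_slope * x)

def leaky_relu_grad(x, negative_slope=0.01):
    """Derivative: 1 for positive inputs, the small slope otherwise,
    so a non-zero gradient always flows back through the neuron."""
    return np.where(x > 0, 1.0, negative_slope)
```

Because the gradient is 0.01 rather than 0 on the negative side, a neuron that drifts into the negative region can still recover during training.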
Elu function:
The Exponential Linear Unit (Elu) is similar to Leaky Relu but differs for negative inputs. It also helps overcome the dying neuron problem.
elu(x) = x                      if x > 0
elu(x) = alpha * (exp(x) - 1)   if x <= 0

d/dx elu(x) = 1                 if x > 0
d/dx elu(x) = elu(x) + alpha    if x <= 0
Uses: It serves the same purpose as Leaky Relu, and the cost function typically converges toward its minimum faster than with Relu or Leaky Relu. For example, a neural network trained on ImageNet learns faster with Elu than with Relu.
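A minimal NumPy sketch of the Elu equations above (function names are illustrative; alpha defaults to 1.0 here, which is a common but not universal choice). Note how the derivative on the negative side, alpha * exp(x), equals elu(x) + alpha, matching the formula above:

```python
import numpy as np

def elu(x, alpha=1.0):
    """x for positive inputs; a smooth exponential curve
    that saturates at -alpha for negative inputs."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def elu_grad(x, alpha=1.0):
    """Derivative: 1 for positive inputs, alpha * exp(x) otherwise,
    which is the same as elu(x) + alpha on the negative side."""
    return np.where(x > 0, 1.0, alpha * np.exp(x))
```

Unlike Leaky Relu, the negative branch is bounded (it approaches -alpha), which pushes mean activations closer to zero.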
Selu function:
The Scaled Exponential Linear Unit (Selu) is a scaled form of Elu: multiply the output of Elu by a fixed "scale" parameter and you get the output Selu produces.
selu(x) = scale * x                      if x > 0
selu(x) = scale * alpha * (exp(x) - 1)   if x <= 0

where alpha = 1.67326324 and scale = 1.05070098

d/dx selu(x) = scale                     if x > 0
d/dx selu(x) = selu(x) + scale * alpha   if x <= 0
Uses: This activation function is used in Self-Normalizing Neural Networks (SNNs), where it helps train deep, robust networks that are less affected by the vanishing and exploding gradient problems.
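The Selu definition above is just Elu with the two fixed constants applied. A minimal sketch, reusing the same NumPy style (function name is illustrative):

```python
import numpy as np

# Fixed constants from the Selu definition above.
ALPHA = 1.67326324
SCALE = 1.05070098

def selu(x):
    """Scaled Elu: SCALE * x for positive inputs,
    SCALE * ALPHA * (exp(x) - 1) for non-positive inputs."""
    return SCALE * np.where(x > 0, x, ALPHA * (np.exp(x) - 1))

def selu_grad(x):
    """Derivative: SCALE for positive inputs,
    SCALE * ALPHA * exp(x) (= selu(x) + SCALE * ALPHA) otherwise."""
    return np.where(x > 0, SCALE, SCALE * ALPHA * np.exp(x))
```

With these particular constants, activations tend to keep zero mean and unit variance as they propagate through layers, which is what gives SNNs their self-normalizing property.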
Softsign function:
The Softsign function is an alternative to tanh: tanh converges exponentially toward its asymptotes, while softsign converges polynomially.
softsign(x) = x / (1 + |x|)
d/dx softsign(x) = 1 / (1 + |x|)^2
Uses: It is mostly used in regression problems and can be used in deep neural networks for text-to-speech conversion.
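The softsign equation and its derivative translate directly to NumPy. A minimal sketch (function names are illustrative):

```python
import numpy as np

def softsign(x):
    """x / (1 + |x|): squashes inputs into (-1, 1),
    approaching the asymptotes polynomially rather than exponentially."""
    return x / (1 + np.abs(x))

def softsign_grad(x):
    """Derivative: 1 / (1 + |x|)^2, which decays more slowly
    than the tanh derivative for large |x|."""
    return 1 / (1 + np.abs(x)) ** 2
```

The slower (polynomial) saturation means the gradient does not vanish as quickly as tanh's for large inputs.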
Softplus function:
The Softplus function is a smooth approximation of the Relu activation function, and its derivative is the sigmoid function. It also helps overcome the dying neuron problem. Equation:
softplus(x) = log(1 + exp(x))
d/dx softplus(x) = 1 / (1 + exp(-x))
Uses: Some experiments show that softplus needs fewer epochs to converge than Relu and sigmoid. It can be used in speech recognition systems.
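A minimal NumPy sketch of softplus and its derivative (function names are illustrative). Computing log(1 + exp(x)) directly overflows for large x, so the sketch uses the algebraically equivalent form max(x, 0) + log1p(exp(-|x|)):

```python
import numpy as np

def softplus(x):
    """log(1 + exp(x)), written in a numerically stable form:
    max(x, 0) + log1p(exp(-|x|)) avoids overflow for large x."""
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

def softplus_grad(x):
    """Derivative of softplus is the sigmoid function."""
    return 1 / (1 + np.exp(-x))
```

For large positive x, softplus(x) is approximately x, and for large negative x it approaches 0, so it traces the same shape as Relu but with a smooth corner at the origin.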