
Activation Functions in Neural Networks

It is recommended to understand Neural Networks before reading this article. 

In the process of building a neural network, one of the choices you get to make is what Activation Function to use in the hidden layer as well as at the output layer of the network. This article discusses some of the choices.



Elements of a Neural Network 

Input Layer: This layer accepts the input features. It provides information from the outside world to the network; no computation is performed at this layer, and the nodes simply pass the information (features) on to the hidden layer.

Hidden Layer: Nodes of this layer are not exposed to the outer world, they are part of the abstraction provided by any neural network. The hidden layer performs all sorts of computation on the features entered through the input layer and transfers the result to the output layer. 



Output Layer: This layer brings the information learned by the network to the outer world.

What is an activation function and why use them? 

The activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and further adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

Explanation: We know, the neural network has neurons that work in correspondence with weight, bias, and their respective activation function. In a neural network, we would update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation. Activation functions make the back-propagation possible since the gradients are supplied along with the error to update the weights and biases. 
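To make this concrete, here is a minimal sketch (in Python with NumPy; the input, weight, and bias values are made up for illustration) of a single neuron computing its weighted sum plus bias and then passing the result through a sigmoid activation:

    import numpy as np

    def neuron(x, w, b, activation):
        z = np.dot(w, x) + b            # weighted sum of inputs plus bias
        return activation(z)            # the activation decides the final output

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([0.5, -1.2])           # input features (made up)
    w = np.array([0.8, 0.3])            # weights (made up)
    b = 0.1                             # bias (made up)

    print(neuron(x, w, b, sigmoid))     # ~0.535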

Why do we need non-linear activation functions?

A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.

Mathematical proof 

Suppose we have a neural net like this: two input features (i1, i2), a hidden layer of two neurons, and an output layer.

Elements of the diagram are as follows: 

Hidden layer i.e. layer 1:

z(1) = W(1)X + b(1)

a(1) = z(1)

Here,

  • z(1) is the vectorized output of layer 1
  • W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4
  • X is the vectorized input features, i.e. i1 and i2
  • b(1) is the vectorized bias assigned to the neurons in the hidden layer, i.e. b1 and b2
  • a(1) is the vectorized output of layer 1; with no activation applied, it is simply the linear function z(1) itself

(Note: We are not considering an activation function here)

 

Layer 2, i.e. the output layer:

Note: The input for layer 2 is the output from layer 1.

z(2) = W(2)a(1) + b(2)  

a(2) = z(2) 

Calculation at Output layer

z(2) = (W(2) * [W(1)X + b(1)]) + b(2)

z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]

Let, 

    [W(2) * W(1)] = W

    [W(2)*b(1) + b(2)] = b

Final output: z(2) = W*X + b

which is again a linear function

Even after applying a hidden layer, the output is still a linear function of the input. Hence we can conclude that no matter how many hidden layers we attach to the neural net, all layers will behave the same way, because the composition of two linear functions is itself a linear function. A neuron cannot learn anything useful with just a linear function attached to it; a non-linear activation function lets it learn according to the difference w.r.t. the error. Hence we need a non-linear activation function.
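This collapse is easy to verify numerically. Below is a small sketch (Python with NumPy; all weight and bias values are arbitrary) showing that two stacked linear layers produce exactly the same output as the single combined layer with W = W(2)*W(1) and b = W(2)*b(1) + b(2):

    import numpy as np

    W1 = np.array([[0.2, -0.5], [0.7, 0.1]])   # layer-1 weights (arbitrary)
    b1 = np.array([0.1, -0.3])                 # layer-1 biases (arbitrary)
    W2 = np.array([[0.4, 0.9]])                # layer-2 weights (arbitrary)
    b2 = np.array([0.05])                      # layer-2 bias (arbitrary)

    X = np.array([1.5, -2.0])                  # input features i1, i2

    # Two linear layers, no activation in between
    a1 = W1 @ X + b1
    z2 = W2 @ a1 + b2

    # One equivalent linear layer
    W = W2 @ W1
    b = W2 @ b1 + b2
    z = W @ X + b

    print(np.allclose(z2, z))                  # True: the composition is linear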

Variants of Activation Function 

Linear Function

Equation: f(x) = x, i.e. the output is simply proportional to the input.

Value range: (-∞, +∞)

For example, calculating the price of a house is a regression problem. The house price may take any big or small value, so we can apply a linear activation at the output layer. Even in this case, the neural net must have a non-linear activation function in the hidden layers.
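As an illustration, here is a minimal sketch (all weights, biases, and features are invented for the example) of a tiny regression net that uses a non-linear activation (ReLU, defined below) in the hidden layer and a linear activation at the output, so the predicted price can be any real value:

    import numpy as np

    def predict_price(x, W1, b1, W2, b2):
        h = np.maximum(0, W1 @ x + b1)   # non-linear (ReLU) hidden layer
        return W2 @ h + b2               # linear activation at the output layer

    W1 = np.array([[0.6, 0.2], [-0.4, 0.9]])   # made-up weights
    b1 = np.array([0.1, 0.0])
    W2 = np.array([[1.5, -0.7]])
    b2 = np.array([50.0])

    x = np.array([3.0, 2.0])                   # e.g. rooms, bathrooms (made up)
    print(predict_price(x, W1, b1, W2, b2))    # an unbounded real value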

Sigmoid Function 

Equation: A(x) = 1 / (1 + e^(-x))

Value range: (0, 1). Its S-shaped curve squashes any input into this interval, which makes it a natural choice when the output is interpreted as a probability, e.g. in the output layer of a binary classifier.

Tanh Function 

Equation: A(x) = tanh(x) = (2 / (1 + e^(-2x))) - 1

Value range: (-1, 1). Tanh is a shifted and scaled version of the sigmoid; because its output is zero-centered, it is usually preferred over the sigmoid in hidden layers.

ReLU Function

Equation: A(x) = max(0, x). It outputs x if x is positive and 0 otherwise.

Value range: [0, ∞). ReLU is non-linear, very cheap to compute, and does not saturate for positive inputs.

In simple words, ReLU learns much faster than the sigmoid and tanh functions.

Softmax Function

Equation: A(xi) = e^(xi) / Σj e^(xj), computed over all the output scores xj.

The softmax function is a generalization of the sigmoid to multiple classes: it converts a vector of raw scores into probabilities that sum to 1, which makes it handy when we are trying to handle multi-class classification problems.
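Tying the variants together, here is a small sketch implementing each of the above functions with NumPy (vectorized, so they apply element-wise; the sample inputs are arbitrary):

    import numpy as np

    def linear(x):
        return x                          # identity: output proportional to input

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

    def tanh(x):
        return np.tanh(x)                 # zero-centered, squashes to (-1, 1)

    def relu(x):
        return np.maximum(0, x)           # 0 for negatives, identity for positives

    def softmax(x):
        e = np.exp(x - np.max(x))         # subtract the max for numerical stability
        return e / e.sum()                # probabilities summing to 1

    z = np.array([-2.0, 0.0, 3.0])        # arbitrary pre-activations
    for f in (linear, sigmoid, tanh, relu, softmax):
        print(f.__name__, f(z))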

