# Introduction to Artificial Neural Networks | Set 1

ANN learning is robust to errors in the training data and has been successfully applied to learning real-valued, discrete-valued, and vector-valued functions in problems such as interpreting visual scenes, speech recognition, and learning robot control strategies. The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological learning systems are built of very complex webs of interconnected neurons. The human brain contains a densely interconnected network of approximately 10^11-10^12 neurons, each connected, on average, to 10^4-10^5 other neurons. Even so, the human brain takes only approximately 10^-1 seconds to make surprisingly complex decisions. ANN systems are motivated to capture this kind of highly parallel computation based on distributed representations. Generally, ANNs are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs and produces a single real-valued output.

However, ANNs are only loosely motivated by biological neural systems: there are many complexities of biological neural systems that ANNs do not model. Some of these differences are summarized in the tables below.

## Difference between Biological Neurons and Artificial Neurons

Biological Neurons | Artificial Neurons |
---|---|
Major components: axons, dendrites, synapses | Major components: nodes, inputs, outputs, weights, bias |
Information from other neurons, in the form of electrical impulses, enters the dendrites at connection points called synapses. The information flows from the dendrites to the cell body, where it is processed. The output signal, a train of impulses, is then sent down the axon to the synapses of other neurons. | The arrangement and connections of the neurons make up the network, which has three layers. The first layer is called the input layer and is the only layer exposed to external signals. The input layer transmits signals to the neurons in the next layer, which is called a hidden layer. The hidden layer extracts relevant features or patterns from the received signals. Those features or patterns that are considered important are then directed to the output layer, which is the final layer of the network. |
A synapse is able to increase or decrease the strength of the connection. This is where information is stored. | The artificial signals can be changed by weights in a manner similar to the physical changes that occur in the synapses. |
Approximately 10^11 neurons | 10^2-10^4 neurons with current technology |

## Difference between the human brain and computers in terms of how information is processed

Human Brain (Biological Neuron Network) | Computers (Artificial Neuron Network) |
---|---|
The human brain works asynchronously. | Computers (ANNs) work synchronously. |
Biological neurons compute slowly (several ms per computation). | Artificial neurons compute fast (<1 nanosecond per computation). |
The brain represents information in a distributed way because neurons are unreliable and could die at any time. | In computer programs, every bit has to function as intended, otherwise the programs would crash. |
The brain changes its connectivity over time to represent new information and the requirements imposed on us. | The connectivity between the electronic components in a computer never changes unless we replace its components. |
Biological neural networks have complicated topologies. | ANNs are often arranged in a tree structure. |
Researchers have yet to find out how the brain actually learns. | ANNs use gradient descent for learning. |

**Advantages of Using Artificial Neural Networks:**

- Problems tackled with ANNs can have instances that are represented by many attribute-value pairs.
- ANNs can be used for problems in which the target function output is discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
- ANN learning methods are quite robust to noise in the training data. The training examples may contain errors without substantially affecting the final output.
- ANNs are generally used where fast evaluation of the learned target function may be required.
- ANNs are suited to settings where long training times are acceptable. Training time depends on factors such as the number of weights in the network, the number of training examples considered, and the settings of various learning algorithm parameters.

**The McCulloch-Pitts Model of Neuron:**

An early model of an artificial neuron was introduced by **Warren McCulloch** and **Walter Pitts** in 1943. The McCulloch-Pitts neural model is also known as a linear threshold gate. It is a neuron with a set of inputs I1, I2, …, In and one output y. The linear threshold gate simply classifies the set of inputs into two different classes, so the output y is binary. Such a function can be described mathematically using these equations:

$$\text{Sum} = \sum_{i=1}^{n} I_i W_i$$

$$y = f(\text{Sum})$$

W1, W2, W3, …, Wn are weight values normalized in the range of either (0,1) or (-1,1) and associated with each input line, Sum is the weighted sum, and T is a threshold constant. The function f is a step function at the threshold T: it outputs 1 when Sum ≥ T and 0 otherwise.
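The linear threshold gate described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `mcculloch_pitts` and the example weights and threshold are assumptions chosen for the demonstration, and the unit is taken to fire when the weighted sum reaches the threshold.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Linear threshold gate: 1 if the weighted sum reaches the threshold, else 0."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0


# Illustrative choice (not from the article): unit weights with threshold 2
# make the gate behave like a two-input AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcculloch_pitts([a, b], [1, 1], threshold=2))
```

With these particular values the gate outputs 1 only for the input (1, 1), which shows how one unit splits its inputs into two classes.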

**Single-layer Neural Networks (Perceptrons)**

Input is multi-dimensional (i.e. input can be a vector):

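input x = (I1, I2, …, In)

Input nodes (or units) are connected (typically fully) to a node (or multiple nodes) in the next layer. A node in the next layer takes a weighted sum of all its inputs:

$$\text{summed input} = \sum_{i=1}^{n} w_i I_i$$

where w_i is the weight on the connection from input I_i.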

**The rule:**

The output node has a “threshold” t.

Rule: If summed input ≥ t, then it “fires” (output y = 1). Else (summed input < t) it doesn't fire (output y = 0).
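As a rough sketch of the rule above, the following Python applies it to a layer with more than one output node, each with its own weights and threshold. The function name `layer_output` and the specific weights and thresholds are hypothetical, chosen so that the first node behaves like OR and the second like AND.

```python
def layer_output(x, weight_rows, thresholds):
    """Apply the firing rule to every output node of a single-layer network.

    weight_rows[j] holds the weights of output node j, thresholds[j] its threshold.
    """
    outputs = []
    for weights, t in zip(weight_rows, thresholds):
        summed_input = sum(w * xi for w, xi in zip(weights, x))
        outputs.append(1 if summed_input >= t else 0)
    return outputs


# Hypothetical parameters: node 0 acts as OR, node 1 acts as AND.
weight_rows = [[1.0, 1.0], [1.0, 1.0]]
thresholds = [0.5, 1.5]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, layer_output(x, weight_rows, thresholds))
```

Both nodes see the same input vector. Only their thresholds differ here, which is enough to make them draw different decision boundaries.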


**Limitations of Perceptrons:**

(i) The output values of a perceptron can take on only one of two values (0 or 1) due to the hard-limit transfer function.

(ii) Perceptrons can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.

The Boolean function XOR is not linearly separable (its positive and negative instances cannot be separated by a line or hyperplane). Hence a single-layer perceptron can never compute the XOR function. This is a big drawback that once resulted in the stagnation of the field of neural networks. The limitation has since been overcome by multi-layer networks.
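The XOR limitation can be illustrated with a small experiment. The sketch below is an assumption-laden demonstration, not a proof: it searches a coarse grid of weights and thresholds for a single threshold unit that reproduces XOR and finds none, then hand-wires a two-layer network of the same units that does compute it. The helper names and all numeric values are hypothetical.

```python
from itertools import product


def step_unit(x, w, t):
    """Threshold unit: 1 if the weighted sum of x is at least t, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0


INPUTS = list(product((0, 1), repeat=2))   # (0,0), (0,1), (1,0), (1,1)
XOR = [a ^ b for a, b in INPUTS]           # [0, 1, 1, 0]


def single_unit_solutions(target, grid):
    """Every (w1, w2, t) on the grid for which one unit reproduces target."""
    return [
        (w1, w2, t)
        for w1, w2, t in product(grid, repeat=3)
        if [step_unit(x, (w1, w2), t) for x in INPUTS] == target
    ]


grid = [k / 2 for k in range(-8, 9)]       # -4.0, -3.5, ..., 4.0
print(len(single_unit_solutions(XOR, grid)))  # 0: no single unit on the grid works


def xor_two_layer(a, b):
    """Hand-wired network: XOR = AND(OR(a, b), NAND(a, b))."""
    h_or = step_unit((a, b), (1, 1), 0.5)
    h_nand = step_unit((a, b), (-1, -1), -1.5)
    return step_unit((h_or, h_nand), (1, 1), 1.5)


print([xor_two_layer(a, b) for a, b in INPUTS])  # [0, 1, 1, 0]
```

The grid search only covers a finite set of parameters, so it is evidence rather than proof. The general argument is that firing on (0,1) and (1,0) but not on (0,0) forces both weights to be at least a positive threshold, which makes the unit fire on (1,1) as well.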

**Multi-layer Neural Networks**

A Multi-Layer Perceptron (MLP) or Multi-Layer Neural Network contains one or more hidden layers (apart from one input and one output layer). While a single-layer perceptron can only learn linearly separable functions, a multi-layer perceptron can also learn non-linear functions.

A neuron in such a network takes as input x1, x2, …, xn (and a +1 bias term), and outputs f(summed inputs + bias), where f(.) is called the activation function. The main function of the bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives). Every activation function (or non-linearity) takes a single number and performs a certain fixed mathematical operation on it. There are several activation functions you may encounter in practice:
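To make the role of the bias concrete, here is a minimal sketch of a single node that computes f(summed inputs + bias). The function name `neuron_output`, the inputs, weights, and bias values are all hypothetical, and `math.tanh` is used only as one possible choice of f.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Compute f(weighted sum of inputs + bias) for a single node."""
    summed = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(summed)


# Hypothetical values: the weighted sum of these inputs is exactly 0.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]

print(neuron_output(x, w, bias=0.0, activation=math.tanh))  # 0.0
print(neuron_output(x, w, bias=1.0, activation=math.tanh))  # tanh(1.0), about 0.76
```

The same inputs and weights give a different output once the bias is non-zero, which is why the bias is usually trained alongside the weights.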

- **Sigmoid:** takes a real-valued input and squashes it to the range between 0 and 1.
- **tanh:** takes a real-valued input and squashes it to the range between -1 and 1.
- **ReLU:** ReLU stands for Rectified Linear Unit. It takes a real-valued input and thresholds it at 0 (replaces negative values with 0).
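The three activation functions above can be written directly with Python's standard `math` module. This is an illustrative sketch: the function names are assumptions, and the sigmoid is split into two branches only to avoid overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(z):
    """Squash z into the range between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # safe for very negative z
    return e / (1.0 + e)


def tanh(z):
    """Squash z into the range between -1 and 1."""
    return math.tanh(z)


def relu(z):
    """Replace negative values with 0, leave non-negative values unchanged."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

Each function takes a single number and returns a single number, so any of them can be passed as the activation f of a node.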

**References:**

- Neural Networks, Christos Stergiou and Dimitrios Siganos.
- ujjwalkarn.me
- Machine Learning, Tom Mitchell, McGraw Hill, 1997.