1. McCulloch-Pitts Model of Neuron
The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs — Excitatory and Inhibitory. The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude. The inputs of the McCulloch-Pitts neuron can be either 0 or 1. It has a threshold function as an activation function. So, the output signal y_{out} is 1 if the input y_{sum} is greater than or equal to a given threshold value, else 0. The diagrammatic representation of the model is as follows:
Simple McCulloch-Pitts neurons can be used to design logical operations. For that purpose, the connection weights need to be correctly decided along with the threshold value of the activation function. For better understanding, let me consider an example:
John carries an umbrella if it is sunny or if it is raining. There are four given situations. I need to decide when John will carry the umbrella. The situations are as follows:
- First scenario: It is not raining, nor is it sunny
- Second scenario: It is not raining, but it is sunny
- Third scenario: It is raining, and it is not sunny
- Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input signals as follows:
- X_{1}: Is it raining?
- X_{2} : Is it sunny?
So, the value of each input can be either 0 or 1. We can use the value of both weights, for X_{1} and X_{2}, as 1 and the threshold value as 1. So, the neural network model will look like:
Truth Table for this case will be:
| Situation | x_{1} | x_{2} | y_{sum} | y_{out} |
|-----------|-------|-------|---------|---------|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
| 3 | 1 | 0 | 1 | 1 |
| 4 | 1 | 1 | 2 | 1 |
So, I can say that:

y_{out} = 1, if y_{sum} ≥ 1
y_{out} = 0, otherwise
The truth table built with respect to the problem is depicted above. From the truth table, I can conclude that in the situations where the value of y_{out} is 1, John needs to carry an umbrella. Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
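The umbrella example can be reproduced in a few lines of Python. This is only a sketch — the function name `mcculloch_pitts` and its structure are mine — but the weights and threshold are exactly those chosen above.

```python
# McCulloch-Pitts neuron for the umbrella (OR) problem.
# Both weights are 1 and the threshold is 1, as in the text.

def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum reaches the threshold, else 0."""
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_sum >= threshold else 0

# The four scenarios: (raining, sunny)
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y_out = mcculloch_pitts([x1, x2], weights=[1, 1], threshold=1)
    print(f"x1={x1}, x2={x2} -> y_out={y_out}")
```

Running this prints the same y_{out} column as the truth table: the neuron fires in scenarios 2, 3 and 4.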
2. Rosenblatt’s Perceptron
Rosenblatt’s perceptron is built around the McCulloch-Pitts neural model. The diagrammatic representation is as follows:
The perceptron receives a set of inputs x_{1}, x_{2}, ….., x_{n}. The linear combiner or the adder node computes the linear combination of the inputs applied to the synapses with synaptic weights w_{1}, w_{2}, ……, w_{n}. Then, the hard limiter checks whether the resulting sum is positive or negative. If the input of the hard limiter node is positive, the output is +1, and if the input is negative, the output is -1. Mathematically, the hard limiter input is:

y_{sum} = Σ_{i=1}^{n} w_{i}x_{i}
However, the perceptron includes an adjustable value or bias as an additional weight w_{0}. This additional weight is attached to a dummy input x_{0}, which is assigned a value of 1. This consideration modifies the above equation to:

y_{sum} = Σ_{i=0}^{n} w_{i}x_{i}
The output is decided by the expression:

y_{out} = +1, if y_{sum} > 0
y_{out} = -1, if y_{sum} < 0
The objective of the perceptron is to classify a set of inputs into two classes, c_{1} and c_{2}. This can be done using a very simple decision rule – assign the inputs to c_{1} if the output of the perceptron, i.e. y_{out}, is +1, and to c_{2} if y_{out} is -1. So for an n-dimensional signal space, i.e. a space for 'n' input signals, the simplest form of perceptron will have two decision regions, corresponding to the two classes, separated by a hyperplane defined by:

Σ_{i=0}^{n} w_{i}x_{i} = 0
Therefore, for two input signals denoted by the variables x_{1} and x_{2}, the decision boundary is a straight line of the form:

w_{0} + w_{1}x_{1} + w_{2}x_{2} = 0

or

x_{2} = -(w_{1}/w_{2})x_{1} - (w_{0}/w_{2})
So, for a perceptron having the values of synaptic weights w_{0}, w_{1} and w_{2} as -2, 1/2 and 1/4 respectively, the linear decision boundary will be of the form:

-2 + (1/2)x_{1} + (1/4)x_{2} = 0, i.e. x_{2} = 8 - 2x_{1}
So, any point (x_{1}, x_{2}) which lies above the decision boundary, as depicted by the graph, will be assigned to class c_{1}, and the points which lie below the boundary are assigned to class c_{2}.
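This decision rule can be checked with a short sketch. The helper name `classify` and the two test points are my own; the weights are the example values w_{0} = -2, w_{1} = 1/2, w_{2} = 1/4 from above.

```python
# Perceptron decision with the example weights w0=-2, w1=1/2, w2=1/4.
# The decision boundary is x2 = 8 - 2*x1.

def classify(x1, x2, w0=-2.0, w1=0.5, w2=0.25):
    """Return 'c1' if the hard limiter output is +1, else 'c2'."""
    y_sum = w0 + w1 * x1 + w2 * x2
    return "c1" if y_sum >= 0 else "c2"

# (2, 8) lies above the boundary (x2 = 4 at x1 = 2), (2, 0) lies below it.
print(classify(2, 8))  # c1
print(classify(2, 0))  # c2
```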
Thus, we see that for a data set with linearly separable classes, perceptrons can always be employed to solve classification problems using decision lines (for 2-dimensional space), decision planes (for 3-dimensional space) or decision hyperplanes (for n-dimensional space). Appropriate values of the synaptic weights can be obtained by training a perceptron. However, one assumption for perceptron to work properly is that the two classes should be linearly separable i.e. the classes should be sufficiently separated from each other. Otherwise, if the classes are non-linearly separable, then the classification problem cannot be solved by perceptron.
Multi-layer perceptron: A basic perceptron works very successfully for data sets which possess linearly separable patterns. However, in practical situations that is rarely the case. This was exactly the point driven home by Minsky and Papert in their work in 1969. They showed that a basic perceptron is not able to learn to compute even a simple 2-bit XOR. So, let us understand the reason.
Consider a truth table highlighting output of a 2 bit XOR function:
| x_{1} | x_{2} | x_{1} XOR x_{2} | Class |
|-------|-------|-----------------|-------|
| 1 | 1 | 0 | c_{2} |
| 1 | 0 | 1 | c_{1} |
| 0 | 1 | 1 | c_{1} |
| 0 | 0 | 0 | c_{2} |
The data is not linearly separable: no single straight line can separate the classes properly. To address this issue, one option is to use two decision boundary lines in place of one.
This is the philosophy used to design the multi-layer perceptron model. The major highlights of this model are as follows:
- The neural network contains one or more intermediate layers between the input and output nodes, which are hidden from both input and output nodes
- Each neuron in the network includes a non-linear activation function that is differentiable.
- The neurons in each layer are connected to some or all of the neurons in the previous layer.
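To make the two-decision-line idea concrete, here is a hand-wired two-layer network for XOR. The specific weights and thresholds are one possible choice (mine, for illustration): each hidden unit draws one of the two decision lines, and the output unit combines them.

```python
# A hand-wired two-layer perceptron for XOR: the hidden layer draws
# two decision lines (an OR unit and a NAND unit), and the output
# unit ANDs them together, since XOR = AND(OR(x1,x2), NAND(x1,x2)).

def step(s):
    return 1 if s >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # OR:   fires unless both inputs are 0
    h2 = step(-x1 - x2 + 1.5)    # NAND: fires unless both inputs are 1
    return step(h1 + h2 - 1.5)   # AND of the two hidden units

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
```

Note that the hidden units here use fixed weights; in a real multi-layer perceptron these weights would be learned, which requires the differentiable activations mentioned above.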
3. ADALINE Network Model
Adaptive Linear Neural Element (ADALINE) is an early single-layer ANN developed by Professor Bernard Widrow of Stanford University. As depicted in the diagram below, it has only one output neuron. The output value can be +1 or -1. A bias input x_{0} (where x_{0} = 1) having a weight w_{0} is added. The activation function is such that if the weighted sum is positive or 0, the output is 1, else it is -1. Formally, I can say that:

y_{out} = +1, if y_{sum} ≥ 0
y_{out} = -1, otherwise
The supervised learning algorithm adopted by the ADALINE network is known as the Least Mean Square (LMS) or DELTA rule. A network combining a number of ADALINEs is termed MADALINE (Many ADALINE). MADALINE networks can be used to solve problems related to non-linear separability.
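The LMS rule can be sketched as follows. This is an illustrative sketch, not canonical code: the helper names, learning rate, and the use of the AND function as training data are my assumptions. The key contrast with the perceptron rule is that LMS computes its error from the raw linear output, before thresholding.

```python
# ADALINE trained with the LMS (delta) rule: unlike the perceptron,
# the weight update uses the raw linear output, not the thresholded one.

def train_adaline(samples, targets, lr=0.1, epochs=100):
    w = [0.0, 0.0, 0.0]              # w0 (bias for x0 = 1), w1, w2
    for _ in range(epochs):
        for (x1, x2), t in zip(samples, targets):
            y_sum = w[0] + w[1] * x1 + w[2] * x2
            err = t - y_sum          # error of the *linear* output
            w[0] += lr * err
            w[1] += lr * err * x1
            w[2] += lr * err * x2
    return w

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 >= 0 else -1

# AND function with bipolar (+1 / -1) targets
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [-1, -1, -1, 1]
w = train_adaline(samples, targets)
```

Because LMS minimises the squared error of the linear output, the weights keep adjusting even for correctly classified points, which tends to give a more stable decision boundary than the perceptron's mistake-driven updates.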