# Understanding Multi-Layer Feed Forward Networks

Let’s understand how errors are calculated and weights are updated in backpropagation networks (BPNs).

Consider the network in the figure below.

The network in the figure is a simple multi-layer feed-forward network, or backpropagation network. It contains three layers: the input layer with two neurons x_{1} and x_{2}, the hidden layer with two neurons z_{1} and z_{2}, and the output layer with one neuron y.

Now let’s write down the weights and bias vectors for each neuron.

Note: The weights are taken randomly.

**Input layer:** i/p – [x_{1} x_{2}] = [0 1]

Here since it is the input layer only the input values are present.

**Hidden layer:** z_{1} – [v_{11} v_{21} v_{01}] = [0.6 -0.1 0.3]

Here v_{11} refers to the weight of the first input x_{1} on z_{1}, v_{21} refers to the weight of the second input x_{2} on z_{1}, and v_{01} refers to the bias value on z_{1}.

z_{2} – [v_{12} v_{22} v_{02}] = [-0.3 0.4 0.5]

Here v_{12} refers to the weight of the first input x_{1} on z_{2}, v_{22} refers to the weight of the second input x_{2} on z_{2}, and v_{02} refers to the bias value on z_{2}.

**Output layer:** y_{in} – [w_{11} w_{21} w_{01}] = [0.4 0.1 -0.2]

Here w_{11} refers to the weight of the first hidden neuron z_{1} on y_{in}, w_{21} refers to the weight of the second hidden neuron z_{2} on y_{in}, and w_{01} refers to the bias value on y_{in}.

Let’s define three index variables: ‘k’, which refers to the neurons in the output layer; ‘j’, which refers to the neurons in the hidden layer; and ‘i’, which refers to the neurons in the input layer.

Therefore,

k = 1

j = 1, 2 (the first and second neurons in the hidden layer)

i = 1, 2 (the first and second neurons in the input layer)

Below are some conditions to be followed in BPNs.

#### Conditions/Constraints:

- In BPN, the activation function used should be differentiable.
- The input for bias is always 1.

To proceed with the problem, let:

Target value, t = 1

Learning rate, α = 0.25

Activation function = Binary sigmoid function

Binary sigmoid function, f(x) = (1+e^{-x})^{-1} eq. (1)

And, f'(x) = f(x)[1-f(x)] eq. (2)
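Equations (1) and (2) translate directly into code. Below is a minimal Python sketch (the function names `f` and `f_prime` are our own choice):

```python
import math

def f(x):
    """Binary sigmoid, eq. (1): f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(x):
    """Derivative of the binary sigmoid, eq. (2): f'(x) = f(x)[1 - f(x)]."""
    fx = f(x)
    return fx * (1.0 - fx)
```

These two helpers are all the math the worked example needs.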

There are three steps to solve the problem:

- Computing the output, y.
- Backpropagation of errors, i.e., between output and hidden layer, hidden and input layer.
- Updating weights.

### Step 1:

The value y is calculated by finding y_{in} and applying the activation function.

y_{in} is calculated as:

y_{in} = w_{01} + z_{1}*w_{11} + z_{2}*w_{21} eq. (3)

Here, z_{1} and z_{2} are the values from the hidden layer, calculated by finding z_{in1} and z_{in2} and applying the activation function to them.

z_{in1} and z_{in2} are calculated as:

z_{in1} = v_{01} + x_{1}*v_{11} + x_{2}*v_{21} eq. (4)

z_{in2} = v_{02} + x_{1}*v_{12} + x_{2}*v_{22} eq. (5)

From (4)

z_{in1} = 0.3 + 0*0.6 + 1*(-0.1)

z_{in1} = 0.2

z_{1} = f(z_{in1}) = (1+e^{-0.2})^{-1} From (1)

**z_{1} = 0.5498**

From (5)

z_{in2} = 0.5 + 0*(-0.3) + 1*0.4

z_{in2} = 0.9

z_{2} = f(z_{in2}) = (1+e^{-0.9})^{-1} From (1)

**z_{2} = 0.7109**

From (3)

y_{in} = (-0.2) + 0.5498*0.4 + 0.7109*0.1

y_{in} = 0.0910

y = f(y_{in}) = (1+e^{-0.0910})^{-1} From (1)

**y = 0.5227**
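Step 1 can be verified with a short Python sketch (variable names mirror the article’s notation; `f` is the binary sigmoid from eq. (1)):

```python
import math

def f(x):
    # binary sigmoid, eq. (1)
    return 1.0 / (1.0 + math.exp(-x))

# input vector and the randomly chosen weights/biases from above
x1, x2 = 0, 1
v11, v21, v01 = 0.6, -0.1, 0.3    # weights and bias into z1
v12, v22, v02 = -0.3, 0.4, 0.5    # weights and bias into z2
w11, w21, w01 = 0.4, 0.1, -0.2    # weights and bias into y_in

z_in1 = v01 + x1 * v11 + x2 * v21   # eq. (4): 0.2
z_in2 = v02 + x1 * v12 + x2 * v22   # eq. (5): 0.9
z1, z2 = f(z_in1), f(z_in2)         # 0.5498, 0.7109
y_in = w01 + z1 * w11 + z2 * w21    # eq. (3): 0.0910
y = f(y_in)                         # 0.5227
```

Each intermediate value matches the hand calculation to the four digits shown.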

Here, y is not equal to the target ‘t’ (which is 1), so we proceed to calculate the errors and then update the weights in order to move the output toward the target value.

### Step 2:

#### (a) Calculating the error between output and hidden layer

Error between output and hidden layer is represented as δ_{k}, where k refers to the neurons in the output layer as mentioned above. The error is calculated as:

δ_{k} = (t_{k} – y_{k}) * f'(y_{ink}) eq. (6)

where, f'(y_{ink}) = f(y_{ink})[1 – f(y_{ink})] From (2)

Since k = 1 (Assumed above),

δ = (t – y) f'(y_{in}) eq. (7)

where, f'(y_{in}) = f(y_{in})[1 – f(y_{in})]

f'(y_{in}) = 0.5227[1 – 0.5227]

f'(y_{in}) = 0.2495

Therefore,

δ = (1 – 0.5227) * 0.2495 From (7)

**δ = 0.1191,** is the error

#### Note: (Target – Output), i.e., (t – y), is the error in the output, not the error of a layer. A layer’s error also depends on factors such as its weights and bias.

#### (b) Calculating the error between hidden and input layer

Error between hidden and input layer is represented as δ_{j}, where j refers to the neurons in the hidden layer as mentioned above. The error is calculated as:

δ_{j} = δ_{inj} * f'(z_{inj}) eq. (8)

where,

δ_{inj} = ∑_{k=1 to n} (δ_{k} * w_{jk}) eq. (9)

f'(z_{inj}) = f(z_{inj})[1 – f(z_{inj})] eq. (10)

Since k = 1 (assumed above), eq. (9) becomes:

δ_{inj} = δ * w_{j1} eq. (11)

As j = 1, 2, we get one error value per hidden neuron, for a total of two error values.

δ_{1} = δ_{in1} * f'(z_{in1}) eq. (12), From (8)

δ_{in1} = δ * w_{11} From (11)

δ_{in1} = 0.1191 * 0.4 From weights vectors

δ_{in1} = 0.04764

f'(z_{in1}) = f(z_{in1})[1 – f(z_{in1})]

f'(z_{in1}) = 0.5498[1 – 0.5498] As f(z_{in1}) = z_{1}

f'(z_{in1}) = 0.2475

Substituting in (12)

**δ_{1} = 0.04764 * 0.2475 = 0.0118**

δ_{2} = δ_{in2} * f'(z_{in2}) eq. (13), From (8)

δ_{in2} = δ * w_{21} From (11)

δ_{in2} = 0.1191 * 0.1 From weights vectors

δ_{in2} = 0.0119

f'(z_{in2}) = f(z_{in2})[1 – f(z_{in2})]

f'(z_{in2}) = 0.7109[1 – 0.7109] As f(z_{in2}) = z_{2}

f'(z_{in2}) = 0.2055

Substituting in (13)

**δ_{2} = 0.0119 * 0.2055 = 0.00245**
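Step 2 can likewise be checked in Python. The sketch below starts from the rounded values computed above, so the results match the hand calculation to the digits shown:

```python
import math

def f(x):
    # binary sigmoid, eq. (1)
    return 1.0 / (1.0 + math.exp(-x))

t, y = 1.0, 0.5227          # target and output from Step 1
z1, z2 = f(0.2), f(0.9)     # hidden activations (z_in1 = 0.2, z_in2 = 0.9)
w11, w21 = 0.4, 0.1         # hidden-to-output weights

# output-layer error, eq. (7); f'(y_in) = y(1 - y) since y = f(y_in)
delta = (t - y) * y * (1 - y)        # 0.1191
# hidden-layer errors, eqs. (11), (12) and (13)
delta_in1 = delta * w11              # 0.0476
delta_in2 = delta * w21              # 0.0119
delta1 = delta_in1 * z1 * (1 - z1)   # 0.0118
delta2 = delta_in2 * z2 * (1 - z2)   # 0.00245
```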

The errors have been calculated; now the weights have to be updated using these error values.

### Step 3:

The formula for updating weights for output layer is:

w_{jk}(new) = w_{jk}(old) + Δw_{jk} eq. (14)

where, Δw_{jk} = α * δ_{k} * z_{j} eq. (15)

Since k = 1, (15) becomes:

Δw_{j1} = α * δ * z_{j} eq. (16)

The formula for updating weights for hidden layer is:

v_{ij}(new) = v_{ij}(old) + Δv_{ij} eq. (17)

where, Δv_{ij} = α * δ_{j} * x_{i} eq. (18)

From (14) and (16)

w_{11}(new) = w_{11}(old) + Δw_{11} = 0.4 + α * δ * z_{1} = 0.4 + 0.25 * 0.1191 * 0.5498 = 0.4164

w_{21}(new) = w_{21}(old) + Δw_{21} = 0.1 + α * δ * z_{2} = 0.1 + 0.25 * 0.1191 * 0.7109 = 0.12117

w_{01}(new) = w_{01}(old) + Δw_{01} = (-0.2) + α * δ * 1 = (-0.2) + 0.25 * 0.1191 * 1 = -0.1702 (the 1 here is the input used for the bias, as per the conditions above)

These are the updated weights of the output layer.

From (17) and (18)

v_{11}(new) = v_{11}(old) + Δv_{11} = 0.6 + α * δ_{1} * x_{1} = 0.6 + 0.25 * 0.0118 * 0 = 0.6

v_{21}(new) = v_{21}(old) + Δv_{21} = (-0.1) + α * δ_{1} * x_{2} = (-0.1) + 0.25 * 0.0118 * 1 = -0.09705

v_{01}(new) = v_{01}(old) + Δv_{01} = 0.3 + α * δ_{1} * 1 = 0.3 + 0.25 * 0.0118 * 1 = 0.30295 (the 1 here is the input used for the bias, as per the conditions above)

v_{12}(new) = v_{12}(old) + Δv_{12} = (-0.3) + α * δ_{2} * x_{1} = (-0.3) + 0.25 * 0.00245 * 0 = -0.3

v_{22}(new) = v_{22}(old) + Δv_{22} = 0.4 + α * δ_{2} * x_{2} = 0.4 + 0.25 * 0.00245 * 1 = 0.40061

v_{02}(new) = v_{02}(old) + Δv_{02} = 0.5 + α * δ_{2} * 1 = 0.5 + 0.25 * 0.00245 * 1 = 0.50061 (the 1 here is the input used for the bias, as per the conditions above)

These are all the updated weights of the hidden layer.
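Step 3 in Python, again starting from the rounded intermediate values, so each line reproduces the corresponding hand calculation (to within rounding):

```python
alpha = 0.25                          # learning rate
delta, delta1, delta2 = 0.1191, 0.0118, 0.00245
x1, x2 = 0, 1
z1, z2 = 0.5498, 0.7109

# output-layer updates, eqs. (14)-(16); the bias input is 1
w11_new = 0.4 + alpha * delta * z1    # 0.4164
w21_new = 0.1 + alpha * delta * z2    # 0.1212
w01_new = -0.2 + alpha * delta * 1    # -0.1702

# hidden-layer updates, eqs. (17) and (18)
v11_new = 0.6 + alpha * delta1 * x1   # 0.6 (x1 = 0, so unchanged)
v21_new = -0.1 + alpha * delta1 * x2  # -0.09705
v01_new = 0.3 + alpha * delta1 * 1    # 0.30295
v12_new = -0.3 + alpha * delta2 * x1  # -0.3 (x1 = 0, so unchanged)
v22_new = 0.4 + alpha * delta2 * x2   # 0.40061
v02_new = 0.5 + alpha * delta2 * 1    # 0.50061
```

Note that the two weights multiplied by x_{1} = 0 receive a zero update, which is why v_{11} and v_{12} are unchanged.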

These three steps are repeated until the output ‘y’ is sufficiently close to the target ‘t’.

This is how BPNs work. The “backpropagation” in the name refers to the fact that the error at each layer is propagated backwards and used to update the weights between that layer and the previous one.
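Putting the three steps together gives a complete training loop. The sketch below is our own minimal illustration (the iteration count is arbitrary; in practice one stops when the error falls below a tolerance):

```python
import math

def f(x):
    # binary sigmoid, eq. (1)
    return 1.0 / (1.0 + math.exp(-x))

x1, x2 = 0.0, 1.0                 # inputs
t, alpha = 1.0, 0.25              # target and learning rate
v11, v21, v01 = 0.6, -0.1, 0.3    # input-to-hidden weights and biases
v12, v22, v02 = -0.3, 0.4, 0.5
w11, w21, w01 = 0.4, 0.1, -0.2    # hidden-to-output weights and bias

for _ in range(10000):
    # Step 1: forward pass
    z1 = f(v01 + x1 * v11 + x2 * v21)
    z2 = f(v02 + x1 * v12 + x2 * v22)
    y = f(w01 + z1 * w11 + z2 * w21)
    # Step 2: backpropagate errors (using the old weights)
    delta = (t - y) * y * (1 - y)
    d1 = delta * w11 * z1 * (1 - z1)
    d2 = delta * w21 * z2 * (1 - z2)
    # Step 3: update weights (bias input is 1)
    w11 += alpha * delta * z1
    w21 += alpha * delta * z2
    w01 += alpha * delta
    v11 += alpha * d1 * x1
    v21 += alpha * d1 * x2
    v01 += alpha * d1
    v12 += alpha * d2 * x1
    v22 += alpha * d2 * x2
    v02 += alpha * d2

# y creeps toward the target t = 1 (the sigmoid never reaches 1 exactly)
```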
