How Neural Networks Solve the XOR Problem

Last Updated : 30 Aug, 2024

XOR (exclusive OR) is a simple logic-gate problem that cannot be solved by a single-layer perceptron, the most basic neural network model. Solving it requires a multi-layer neural network, which makes XOR a classic illustration of why hidden layers and non-linear activations matter.

In this article, we discuss what the XOR problem is, how neural networks solve it, and then walk through a simple code example that demonstrates the solution.

What is the XOR Problem?

The XOR operation is a binary operation that takes two binary inputs and produces a binary output. The output of the operation is 1 only when the inputs are different.

Below is the truth table for XOR:

Input A | Input B | XOR Output
0       | 0       | 0
0       | 1       | 1
1       | 0       | 1
1       | 1       | 0

The problem is that a single-layer perceptron cannot learn this mapping because the data is not linearly separable, i.e. no straight line in the input plane separates the inputs that produce 0 from those that produce 1.
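
In Python, the XOR of two bits is available directly through the ^ operator, which is an easy way to reproduce the truth table above:

Python
# Reproduce the XOR truth table with Python's bitwise XOR operator
for a in (0, 1):
    for b in (0, 1):
        print(f"A={a}, B={b} => A XOR B = {a ^ b}")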

Why Single-Layer Perceptrons Fail

A single-layer perceptron can solve problems that are linearly separable by learning a linear decision boundary.

Mathematically, the decision boundary is represented by:

y = \text{step}(\mathbf{w} \cdot \mathbf{x} + b)

Where:

  • \mathbf{w} is the weight vector.
  • \mathbf{x} is the input vector.
  • b is the bias term.
  • \text{step} is the activation function, often a Heaviside step function that outputs 1 if the input is positive and 0 otherwise.

For linearly separable data, the perceptron can adjust the weights \mathbf{w} and bias b during training to correctly classify the data. However, because XOR is not linearly separable, no single line (or hyperplane) can separate the outputs 0 and 1, making a single-layer perceptron inadequate for solving the XOR problem.
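
This can be checked numerically. The short sketch below (an illustrative check, not a proof) scans a coarse grid of weights and biases and confirms that no single-layer step perceptron of the form above reproduces the XOR truth table:

Python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def step(z):
    # Heaviside step: 1 where the input is positive, 0 otherwise
    return (z > 0).astype(int)

# Try every (w1, w2, b) combination on a coarse grid
grid = np.arange(-2.0, 2.5, 0.5)
found = False
for w1, w2, b in itertools.product(grid, repeat=3):
    predictions = step(X @ np.array([w1, w2]) + b)
    if np.array_equal(predictions, y):
        found = True
        break

print("Single-layer perceptron that computes XOR found:", found)  # False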

How Multi-Layer Neural Networks Solve XOR

A multi-layer neural network, also known as a feedforward neural network or multi-layer perceptron (MLP), can solve the XOR problem. It stacks several layers of neurons: an input layer, one or more hidden layers, and an output layer.

The role of each layer:

  1. Input Layer: This layer takes the two inputs (A and B).
  2. Hidden Layer: This layer applies non-linear activation functions to create new, transformed features that help separate the classes.
  3. Output Layer: This layer produces the final XOR result.

Mathematics Behind the MLP Solution

Let's break down the mathematics behind how an MLP can solve the XOR problem.

Step 1: Input to Hidden Layer Transformation

Consider an MLP with two neurons in the hidden layer, each applying a non-linear activation function (like the sigmoid function). The output of the hidden neurons can be represented as:

h_1 = \sigma(w_{11} A + w_{12} B + b_1)

h_2 = \sigma(w_{21} A + w_{22} B + b_2)

Where:

  • \sigma(x) = \frac{1}{1 + e^{-x}} is the sigmoid activation function.
  • w_{ij}​ are the weights from the input neurons to the hidden neurons.
  • b_i are the biases for the hidden neurons.

Activation functions such as the sigmoid or ReLU (Rectified Linear Unit) introduce non-linearity into the model, which is what enables the network to handle patterns like XOR. Without them, stacking layers would still produce an overall linear model, which is insufficient for solving XOR.

Step 2: Hidden Layer to Output Layer Transformation

The output neuron combines the outputs of the hidden neurons to produce the final output:

\text{Output} = \sigma(w_{31} h_1 + w_{32} h_2 + b_3)

Where w_{3i}​ are the weights from the hidden neurons to the output neuron, and b_3​ is the bias for the output neuron.
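
Putting the two steps together, the forward pass can be written in a few lines of NumPy. The weights below are untrained placeholders, chosen only to show the shape of the computation; the outputs do not yet match XOR until training (Step 3) finds suitable values:

Python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical, untrained parameters -- illustrative values only
w11, w12, b1 = 0.2, -0.4, 0.1
w21, w22, b2 = -0.3, 0.5, -0.2
w31, w32, b3 = 0.6, -0.1, 0.05

for A, B in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h1 = sigmoid(w11 * A + w12 * B + b1)     # first hidden neuron
    h2 = sigmoid(w21 * A + w22 * B + b2)     # second hidden neuron
    out = sigmoid(w31 * h1 + w32 * h2 + b3)  # output neuron
    print(f"A={A}, B={B} -> h1={h1:.2f}, h2={h2:.2f}, output={out:.2f}")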

Step 3: Learning Weights and Biases

During the training process, the network adjusts the weights w_{ij} and biases b_i​ using backpropagation and gradient descent to minimize the error between the predicted output and the actual XOR output.
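
A minimal from-scratch version of this training loop, assuming a 2-2-1 sigmoid network trained by plain gradient descent on the squared error, looks roughly like the sketch below (the Keras example later in the article does the same job with a library; depending on the random seed, this tiny network can occasionally settle in a local minimum):

Python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised parameters of a 2-2-1 network
W1 = rng.normal(size=(2, 2))   # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1))   # hidden -> output weights
b2 = np.zeros(1)

lr = 0.5
for epoch in range(20000):
    # Forward pass
    H = sigmoid(X @ W1 + b1)        # hidden activations, shape (4, 2)
    out = sigmoid(H @ W2 + b2)      # predictions, shape (4, 1)

    # Backward pass: gradients of the squared error
    d_out = (out - y) * out * (1 - out)
    d_H = (d_out @ W2.T) * H * (1 - H)

    # Gradient-descent updates
    W2 -= lr * (H.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_H)
    b1 -= lr * d_H.sum(axis=0)

print(np.round(out).astype(int).ravel())  # typically [0 1 1 0] after training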

Example Configuration:

Let's consider a specific configuration of weights and biases that solves the XOR problem (reading the activations as step functions; a sigmoid gives the same behaviour if all the weights and biases are scaled up so that it saturates):

  • For the hidden layer:
    • w_{11} = 1, w_{12} = 1, b_1 = -0.5, so h_1 behaves like an OR gate
    • w_{21} = 1, w_{22} = 1, b_2 = -1.5, so h_2 behaves like an AND gate
  • For the output layer:
    • w_{31} = 1, w_{32} = -2, b_3 = -0.5, so the output fires only when h_1 = 1 and h_2 = 0

With these weights and biases, the network produces the correct XOR output for each input pair (A, B): the output is 1 exactly when A OR B is true but A AND B is not.
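
This configuration can be verified in a few lines, using the step interpretation of the activations:

Python
def step(z):
    # Heaviside step: 1 if the input is positive, 0 otherwise
    return 1 if z > 0 else 0

for A, B in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h1 = step(1 * A + 1 * B - 0.5)      # behaves like OR
    h2 = step(1 * A + 1 * B - 1.5)      # behaves like AND
    out = step(1 * h1 - 2 * h2 - 0.5)   # "OR but not AND" = XOR
    print(f"A={A}, B={B} -> h1={h1}, h2={h2}, XOR output={out}")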

Geometric Interpretation

In the hidden layer, the network effectively transforms the input space into a new space where the XOR problem becomes linearly separable. This can be visualized as bending or twisting the input space such that the points corresponding to different XOR outputs (0s and 1s) are now separable by a linear decision boundary.

Training the Neural Network to Solve XOR Problem

The neural network learns to solve the XOR problem by adjusting the weights during training. This is done using backpropagation, where the network calculates the error in its output and adjusts its internal weights to minimize this error over time. This process continues until the network can correctly predict the XOR output for all given input combinations.

The following Python implementation demonstrates how a small neural network learns the XOR function, using TensorFlow and Keras:

Python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Define the XOR input and output data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Build the neural network model
model = Sequential()
model.add(Dense(4, input_dim=2, activation='tanh'))  # Hidden layer; a few tanh neurons learn XOR reliably (2 ReLU units often get stuck at 50% accuracy)
model.add(Dense(1, activation='sigmoid'))            # Output layer with 1 neuron

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10000, verbose=0)

# Evaluate the model
_, accuracy = model.evaluate(X, y)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Make predictions
predictions = model.predict(X)
predictions = np.round(predictions).astype(int)

print("Predictions:")
for i in range(len(X)):
    print(f"Input: {X[i]} => Predicted Output: {predictions[i]}, Actual Output: {y[i]}")

Output:

Exact loss values vary from run to run, but once training converges the model reports 100% accuracy and reproduces the XOR truth table:

Accuracy: 100.00%
Predictions:
Input: [0 0] => Predicted Output: [0], Actual Output: [0]
Input: [0 1] => Predicted Output: [1], Actual Output: [1]
Input: [1 0] => Predicted Output: [1], Actual Output: [1]
Input: [1 1] => Predicted Output: [0], Actual Output: [0]

Conclusion

The XOR problem is a classic example that highlights the limitations of simple neural networks and the need for multi-layer architectures. By introducing a hidden layer and non-linear activation functions, an MLP can solve the XOR problem by learning complex decision boundaries that a single-layer perceptron cannot. Understanding this solution provides valuable insight into the power of deep learning models and their ability to tackle non-linear problems in various domains.

