
Multi-layer Perceptron: A Supervised Neural Network Model Using Sklearn

Last Updated: 12 Oct, 2023

An artificial neural network (ANN), often called simply a neural network or neural net, is a machine learning model inspired by the structure and operation of the human brain. It is a key element of deep learning, a branch of machine learning. A neural network is formed from interconnected nodes, also referred to as artificial neurons or perceptrons, arranged in layers: an input layer, one or more hidden layers, and an output layer. Each neuron computes a weighted sum of its inputs, applies an activation function to that sum, and produces an output. The architecture of the network, including the number of layers and the number of neurons in each layer, can vary significantly depending on the task at hand. This versatility allows neural networks to perform many machine learning tasks, such as classification, regression, image recognition, and natural language processing.

Training a neural network means adjusting the weights of its connections to reduce the discrepancy between predicted and actual outputs, which is done with optimization techniques such as gradient descent. Neural networks, and deep neural networks in particular, have demonstrated an exceptional ability to solve complicated problems, driving significant advances in fields like computer vision, speech recognition, and autonomous driving. Their capacity to automatically learn and extract features from data makes them a key component of modern AI and machine learning.

Supervised Neural Network Models

A supervised neural network model is a type of machine learning model used for tasks where you have labelled data, meaning you know both the input and the corresponding correct output. In this model, you feed input data into layers of interconnected artificial neurons, which process the information and produce an output. During training, the model learns to adjust its internal parameters (weights and biases) to minimize the difference between its predictions and the actual labels in the training data. This process continues until the model can make accurate predictions on new, unseen data. Supervised neural networks are commonly used for tasks like image classification, speech recognition, and natural language processing, where the goal is to map inputs to specific categories or values.

Multi-Layer Perceptron Architecture

MLP (Multi-Layer Perceptron) is a type of neural network with an architecture consisting of input, hidden, and output layers of interconnected neurons. It is capable of learning complex patterns and performing tasks such as classification and regression by adjusting its parameters through training. Let’s explore the architecture of an MLP in detail:

  • Input Layer: The input layer is where the MLP first meets the dataset. Each neuron in this layer corresponds to one feature of the incoming data; in image classification, for instance, each neuron might represent the intensity value of a pixel. The input layer performs no computation of its own: it simply passes these raw input values on to the neurons of the first hidden layer.
  • Hidden Layers: Between the input and output layers, an MLP has one or more hidden layers, where the main computation happens. Every neuron in a hidden layer processes the outputs of the neurons in the preceding layer; neurons within the same hidden layer do not connect to one another, only to adjacent layers via weighted connections. The transformations performed in the hidden layers allow the network to learn intricate relationships and representations in the data. The depth (number of hidden layers) and width (number of neurons in each layer) can be chosen to match the complexity of the task.
  • Output Layer: The neurons of the output layer, the last layer of the MLP, generate the model’s predictions. The structure of this layer is determined by the task at hand. For binary classification, a single neuron with a sigmoid activation function can output a probability score. For multi-class classification, multiple neurons, typically with softmax activation, assign a probability to each class. For regression tasks, the output layer often has a single neuron that predicts a continuous value.

Each neuron, whether in a hidden or output layer, applies an activation function to the weighted sum of its inputs. Commonly used activation functions are the sigmoid, the hyperbolic tangent (tanh), and the rectified linear unit (ReLU). During training, the MLP adjusts its connection (synapse) weights using backpropagation and optimization methods such as gradient descent; this is how the network learns and fine-tunes its parameters to reduce the discrepancy between predicted and actual outputs. The flexibility in the number of hidden layers, the number of neurons per layer, and the choice of activation functions makes MLPs suitable for a wide range of machine learning and deep learning problems, from straightforward to highly complex.
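To make this concrete, the short sketch below (a minimal illustration with made-up inputs, weights, and bias, not taken from any model in this article) computes one neuron's output for each of the three activation functions just mentioned:

Python3

# A single neuron: weighted sum of inputs plus bias, then activation.
# The inputs, weights, and bias here are arbitrary illustrative values.
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs from the previous layer
w = np.array([0.4, 0.1, -0.6])   # connection weights
b = 0.2                          # bias term

z = np.dot(w, x) + b             # weighted sum plus bias

sigmoid = 1 / (1 + np.exp(-z))   # squashes z into (0, 1)
tanh = np.tanh(z)                # squashes z into (-1, 1)
relu = max(0.0, z)               # zero for negative z, identity otherwise

print(f"z={z:.3f} sigmoid={sigmoid:.3f} tanh={tanh:.3f} relu={relu:.3f}")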

MLP Classifier with its Parameters

The MLP Classifier, short for Multi-Layer Perceptron Classifier, is a neural network-based classification algorithm provided by the Scikit-Learn library. It is a feedforward neural network, meaning information moves in only one direction: forward through the layers. Below is a detailed explanation of the main parameters, which collectively define the architecture and behavior of the MLP Classifier (a short illustrative constructor call follows the list):

  • Hidden Layer Sizes [Parameter: hidden_layer_sizes]: The hidden_layer_sizes parameter is the key structural element of an MLP. It takes a tuple in which each element gives the number of neurons in one hidden layer. For instance, if hidden_layer_sizes is set to (64, 32), the network contains two hidden layers, the first with 64 neurons and the second with 32. The choice of the number of layers and neurons strongly influences the network's ability to recognize complex patterns and correlations in the data: deeper, wider networks can model more complex data but may also be more susceptible to overfitting.
  • Activation Function [Parameter: activation]: The activation parameter determines the activation function applied by each neuron in the MLP's hidden layers. Activation functions introduce the non-linearity that lets the network model intricate input-to-output mappings. One popular option is 'relu' (Rectified Linear Unit), which is computationally efficient and effective at mitigating the vanishing gradient problem; 'tanh' (Hyperbolic Tangent) and 'logistic' (Logistic Sigmoid) are also frequently used, each with its own characteristics.
  • Solver for Weight Optimization [Parameter: solver]: The solver parameter selects the optimization algorithm used to update the network's weights during training; different solvers take different approaches to minimizing the network's loss function. 'adam', which combines ideas from RMSprop and momentum, works well on large datasets and complex models. 'lbfgs' uses the limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm and is best suited to smaller datasets. 'sgd' is stochastic gradient descent, which updates the weights using a random subset (mini-batch) of the training data at each iteration.
  • Learning Rate [Parameter: learning_rate]: The learning_rate parameter controls the schedule of the weight updates at each training iteration and so affects the stability and speed of convergence. It can be set to 'constant', 'invscaling', or 'adaptive', which determine how the effective learning rate changes over time (in scikit-learn this parameter is only used when solver='sgd'). Choosing an appropriate learning rate is crucial: a rate that is too high can cause divergence or oscillation, while a rate that is too low makes convergence very slow.
  • Maximum Iterations [Parameter: max_iter]: The max_iter parameter caps the number of iterations the solver may run while trying to converge during training. Convergence is the point at which further updates to the network's weights no longer appreciably lower the loss. If the solver cannot converge within this limit, it stops and returns the current weights, which may not be optimal. Choosing max_iter appropriately ensures the training process is neither cut short nor needlessly prolonged, giving the model a chance to converge properly.
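As a concrete illustration of how these parameters fit together, here is a hedged example of constructing an MLPClassifier that sets each one explicitly; the specific values are arbitrary choices for illustration and would need tuning for a real dataset:

Python3

from sklearn.neural_network import MLPClassifier

# Illustrative values only, to be tuned for your own data
clf = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # two hidden layers: 64 and 32 neurons
    activation='relu',            # ReLU activation in the hidden layers
    solver='sgd',                 # stochastic gradient descent
    learning_rate='adaptive',     # schedule; only consulted by the 'sgd' solver
    max_iter=500,                 # upper bound on training iterations
    random_state=42,              # reproducible weight initialization
)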

Implementation using Iris Dataset

Let’s apply the steps explained above to the famous Iris dataset (the same workflow works for a custom dataset). Below, we build and train a neural network to classify iris flowers.

Importing Libraries

Python3
# Importing required libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score


This code imports the libraries needed for a neural network-based classifier: NumPy for mathematical operations, train_test_split for dividing the data, StandardScaler for feature scaling, MLPClassifier for building the Multi-Layer Perceptron, load_iris for the Iris dataset, and accuracy_score for evaluating the model's accuracy.

Loading Dataset

Python3
# Loading dataset
iris = load_iris() 
X, y = iris.data, iris.target


This code loads the Iris dataset using scikit-learn's load_iris() function, assigning the feature data to X and the target labels to y. The Iris dataset (150 samples, four features, three classes) is a popular dataset for classification problems in machine learning.
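As an optional sanity check (not part of the original walkthrough), you can inspect what was loaded:

Python3

# Optional: inspect the loaded data
print(X.shape)            # (150, 4): 150 samples, 4 features
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(np.unique(y))       # [0 1 2]: three class labels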

Splitting Data into Train and Test Sets

Python3
# Splitting data set into train & test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


Using train_test_split() from scikit-learn, this code divides the loaded dataset (X and y) into training and testing sets. The test_size parameter determines the fraction of data allotted to the test set (20% in this case), and the random_state option fixes the random seed so the split is reproducible.

Feature Scaling

Python3
# Creating Object
scaler = StandardScaler() 
# Standardizing the features
X_train = scaler.fit_transform(X_train) 
X_test = scaler.transform(X_test)


This code creates a StandardScaler object named scaler to standardize the feature data. The fit_transform() method computes the mean and standard deviation of each feature on the training set (X_train) and standardizes it; the transform() method then applies the same transformation to the test set (X_test), ensuring both sets are scaled using the statistics of the training set. This standardization step is important because many machine learning algorithms work best when features are on similar scales.
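To see the effect of the standardization, the optional check below (assuming the variables from the snippets above) confirms that each training feature now has roughly zero mean and unit variance:

Python3

# Optional: confirm the training features are standardized
print(X_train.mean(axis=0).round(3))  # approximately [0. 0. 0. 0.]
print(X_train.std(axis=0).round(3))   # approximately [1. 1. 1. 1.]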

Model Development

Python3
# Creating (MLP) classifier
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000,
                    random_state=42)


This code creates an instance of the Multi-Layer Perceptron (MLP) classifier using scikit-learn's MLPClassifier class. The hidden_layer_sizes argument, set to the tuple (64, 32), specifies two hidden layers with 64 and 32 neurons respectively. The max_iter parameter, set to 1000, caps the solver's number of iterations, and random_state is set to 42 for reproducibility.

Training the Model and Prediction

Python3
# Training the model
clf.fit(X_train, y_train)
# Making prediction
y_pred = clf.predict(X_test) 


This code trains the MLP classifier (clf) on the standardized training data (X_train) and labels (y_train) using the fit method. The trained model then makes predictions on the test data (X_test), and the predicted labels are stored in the variable y_pred.
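Since max_iter caps training, it can be worth checking how many iterations the solver actually ran and the final loss; the fitted estimator exposes these through its n_iter_ and loss_ attributes (an optional check, not part of the original walkthrough):

Python3

# Optional: inspect training convergence
print(clf.n_iter_)  # iterations actually performed (at most max_iter)
print(clf.loss_)    # final value of the training loss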

Evaluation of the Model

Python3
# Determining Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


Accuracy: 0.97

This code uses scikit-learn's accuracy_score function to compute the accuracy of the MLP classifier's predictions (y_pred) against the true test labels (y_test). The resulting accuracy, formatted to two decimal places, is then printed to the console.
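Accuracy is only one metric; since the conclusion below also mentions F1-score, here is a short sketch using standard scikit-learn utilities (an addition, not code from the original walkthrough) that reports per-class precision, recall, and F1-score:

Python3

from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision, recall, and F1-score
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Confusion matrix: rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))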

Conclusion

In conclusion, the supervised neural network model built with Scikit-Learn's MLPClassifier is a powerful tool for a variety of machine learning applications. This adaptable model offers flexibility in network architecture design and hyperparameter tuning, enabling it to accommodate different kinds of datasets and challenging problems. The procedure starts with loading and preprocessing the data: dividing the dataset into training and testing sets and standardizing the features to guarantee uniform scaling. Through parameters such as hidden_layer_sizes, activation, solver, learning_rate, and max_iter, the MLPClassifier allows extensive customization; these parameters affect the network's capacity, training speed, and convergence behavior. Once fitted to the training data, the model can make predictions on fresh, unseen data, and its performance can be evaluated with metrics such as accuracy and F1-score. With careful parameter tuning, this versatile model performs well on a wide range of classification tasks, including challenging, real-world problems.


