ML – Neural Network Implementation in C++ From Scratch
What are we going to do here?
This article explains how to build a fast Artificial Neural Network that can crunch millions of data points within seconds, even milliseconds. Artificial Intelligence and Machine Learning are among the most trending topics nowadays, and tech giants are hiring data scientists for their excellence in these fields.
Why use C++
Now, if you have already implemented a neural network model in some other programming language, you might have noticed (especially on a low-end PC) that your models run quite slowly even on small datasets. When you began learning about neural networks, you probably googled "Which language is best for machine learning?", and the usual answer is that Python or R is best and that other languages are hard and not worth your time. But once you start programming, you face the problem of time and resource consumption. So, this article shows how to build a very fast neural network in C++.
Prerequisites
- Basic knowledge of classes and how they work
- The linear algebra library Eigen, which we will use for all matrix operations
- Some basic file read/write operations in C++
- Some basic knowledge of linear algebra (the library does the heavy lifting for us)
At its core, Eigen is a library for very fast linear algebra operations, and it is among the fastest and easiest to use out there. Several resources are available to learn the basics of Eigen.
While learning Eigen you will encounter one of the most powerful features of C++: template metaprogramming. It is recommended not to stray off track right now (if you are new to C++) and simply treat templates as ordinary parameters to a function. However, if you are really keen on learning new and powerful things, there are good articles and videos on the topic.
Writing the Neural Network class
Before going further, I assume that you know what a neural network is and how it learns. If not, I do recommend taking a look at an introductory resource on neural networks first.
Code: The Neural Network Class
Next, we move ahead by implementing each function one by one. But first, create two files (NeuralNetwork.cpp and NeuralNetwork.hpp): the NeuralNetwork class declaration goes in NeuralNetwork.hpp, and all of the function definitions must go in the NeuralNetwork.cpp file.
Code: Constructor for the Neural Network Class
Explanation of constructor function – Initializing the neurons, cache and deltas
The topology vector describes how many neurons we have in each layer, and the size of this vector equals the number of layers in the neural network. Each layer is an array of neurons; we store each layer as a vector such that each element holds the activation value of one neuron in that layer (note that the array of these layers is the neural network itself). In the constructor, we add an extra bias neuron to every layer except the output layer. The cacheLayers and deltas vectors have the same dimensions as the neuronLayers vector. We use vectors as layers here, rather than a 2D matrix, because we are doing stochastic gradient descent (SGD) and not batch or mini-batch gradient descent. Finally, a cache is just another name for the sum of weighted inputs from the previous layer.
A notation that we will use for dimensions of a matrix is: [m n] denotes a matrix having m rows and n columns.
Initializing Weights matrix
Initializing the weights matrix is a bit tricky (mathematically). Pay close attention to the next few lines, as they explain how the weights matrix is used throughout this article. I assume that you know how layers are interconnected in a neural network.
- CURRENT_LAYER represents the layer currently taking input, while PREV_LAYER and FWD_LAYER represent the layer one step behind and one step ahead of CURRENT_LAYER, respectively.
- The c-th column in the weights matrix represents the connections of the c-th neuron in CURRENT_LAYER to all the neurons in PREV_LAYER.
- The r-th element of the c-th column represents the connection of the c-th neuron in CURRENT_LAYER to the r-th neuron in PREV_LAYER.
- The r-th row in the weights matrix represents the connections of all the neurons in PREV_LAYER to the r-th neuron in CURRENT_LAYER.
- The c-th element of the r-th row represents the connection of the c-th neuron in PREV_LAYER to the r-th neuron in CURRENT_LAYER.
- The two column rules above will be used when we use the weights matrix in the normal sense, while the two row rules will be used when we use it in the transposed sense (a(i, j) = a(j, i)).
Now, remember that we have an extra bias neuron in the previous layer. If we take the matrix product of the neuronLayers vector of PREV_LAYER and the weights matrix of CURRENT_LAYER, we get the new neuronLayers vector of CURRENT_LAYER. What we have to do now is modify our weights matrix so that the bias neuron of CURRENT_LAYER remains unaffected by the matrix multiplication. For that, we set all the elements of the last column of the weights matrix to 0, except the last element, which is set to 1.
Code: Feed Forward Algorithm
Explanation of feed forward algorithm:
The c-th element (neuron) of CURRENT_LAYER takes its input from the dot product between the neuronLayers vector of PREV_LAYER and the c-th column of the weights matrix. This way, it receives the weighted inputs, and the bias term is added automatically. Because the last column of the weights matrix is initialized with all elements set to 0 except the last element (set to 1), the bias neuron of CURRENT_LAYER takes input from the bias neuron of PREV_LAYER only.
Code: Updating the weights
Code: Activation Function
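The exact activation is not pinned down here, so tanh is assumed below (a common choice): it squashes activations into (-1, 1) and has the convenient derivative 1 - tanh²(x), which back-propagation needs.

```cpp
#include <cmath>

typedef float Scalar;

// tanh squashes activations into (-1, 1)
Scalar activationFunction(Scalar x)
{
    return std::tanh(x);
}

// derivative of tanh(x) is 1 - tanh(x)^2, needed for back-propagation
Scalar activationFunctionDerivative(Scalar x)
{
    Scalar t = std::tanh(x);
    return 1 - t * t;
}
```

If you swap in a different activation (sigmoid, ReLU, and so on), remember to change both functions together.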
Code: Training neural network
Code: Loading data
You can read CSV files with a short helper function and place it alongside the neural network class, but be careful: the declarations and definitions must be kept in separate files (NeuralNetwork.hpp and NeuralNetwork.cpp). Save all the files and stay with me for a few more minutes!
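A minimal, Eigen-free CSV reader is sketched below (the function name and return type are illustrative). In practice, you would convert each row of floats into an Eigen RowVector before feeding it to the network.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Read a comma-separated file of numbers into a vector of rows.
// A hypothetical stand-in for the article's loader.
std::vector<std::vector<float>> readCSV(const std::string& filename)
{
    std::vector<std::vector<float>> data;
    std::ifstream file(filename);
    std::string line;
    while (std::getline(file, line)) {
        if (line.empty())
            continue;                        // skip blank lines
        std::vector<float> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ','))  // split each line on commas
            row.push_back(std::stof(cell));
        data.push_back(row);
    }
    return data;
}
```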
Code: Generate Some Noise i.e. training data
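A sketch of a genData helper: it writes random (x, y) pairs to "&lt;filename&gt;-in" and a matching target value to "&lt;filename&gt;-out". The target function x + y used here is an arbitrary choice for demonstration, not dictated by the article.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// Generate a toy dataset: random inputs in "<filename>-in" and a simple
// target (here, x + y -- an assumed demo function) in "<filename>-out".
void genData(const std::string& filename, int rows)
{
    std::ofstream in(filename + "-in");
    std::ofstream out(filename + "-out");
    for (int r = 0; r < rows; r++) {
        float x = rand() / float(RAND_MAX); // pseudo-random value in [0, 1]
        float y = rand() / float(RAND_MAX);
        in << x << "," << y << "\n";
        out << x + y << "\n";               // the value the network must learn
    }
}
```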
Code: Implementation of the neural network
To compile the program, open your Linux terminal and type:
g++ main.cpp NeuralNetwork.cpp -o main && ./main
Run this command. Try experimenting with the number of data points passed to the genData function.