# ML – Neural Network Implementation in C++ From Scratch

**What are we going to do here?**

This article explains how to build a fast artificial neural network in C++ from scratch, one that can work through large datasets quickly. Artificial intelligence and machine learning are among the most popular topics for programmers today, and tech companies actively hire data scientists with expertise in these fields.

**Why use C++**

If you have already implemented a neural network in another programming language, you may have noticed (especially on a low-end PC) that your models run quite slowly, even on small datasets. When you began learning about neural networks, you probably searched for *Which language is best for machine learning?*, and the usual answer is *Python or R is best for machine learning; other languages are hard, so don't waste your time on them!* Once you start training models, however, time and resource consumption quickly become a problem. This article shows how to build a fast neural network in C++ instead.

**Prerequisites:**

- Basic knowledge of classes and how they work
- Familiarity with the linear algebra library Eigen
- Basic file read/write operations in C++
- Basic knowledge of linear algebra (Eigen does the heavy lifting, but you should know what a matrix product is)

**Eigen 101:**

At its core, Eigen is a C++ template library for fast linear algebra operations, and it is one of the most widely used and easiest to learn. Some resources to learn the basics of Eigen.

While learning Eigen you will encounter one of the most powerful features of C++: template metaprogramming. If you are new to C++, don't get sidetracked by it right now; just treat the template arguments as ordinary parameters to a function. If you really want to dig deeper, here's a good article and a video on it.

**Writing the Neural Network class**

From here on, I assume that you know what a neural network is and how it learns. If not, I recommend taking a look at the following pages first.

**Code: The Neural Network Class**

```cpp
// NeuralNetwork.hpp
#ifndef NEURAL_NETWORK_HPP
#define NEURAL_NETWORK_HPP

#include <eigen3/Eigen/Eigen>
#include <cmath>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// use typedefs for future ease for changing data types like : float to double
typedef float Scalar;
typedef Eigen::MatrixXf Matrix;
typedef Eigen::RowVectorXf RowVector;
typedef Eigen::VectorXf ColVector;
// uint is not standard C++ (glibc happens to provide it); this typedef makes the code portable
typedef unsigned int uint;

// activation function and its derivative (defined in NeuralNetwork.cpp)
Scalar activationFunction(Scalar x);
Scalar activationFunctionDerivative(Scalar x);

// reads a comma-separated file into one RowVector per line (defined in NeuralNetwork.cpp)
void ReadCSV(std::string filename, std::vector<RowVector*>& data);

// neural network implementation class!
class NeuralNetwork {
public:
    // constructor
    NeuralNetwork(std::vector<uint> topology, Scalar learningRate = Scalar(0.005));

    // function for forward propagation of data
    void propagateForward(RowVector& input);

    // function for backward propagation of errors made by neurons
    void propagateBackward(RowVector& output);

    // function to calculate errors made by neurons in each layer
    void calcErrors(RowVector& output);

    // function to update the weights of connections
    void updateWeights();

    // function to train the neural network given matching arrays of inputs and expected outputs
    void train(std::vector<RowVector*> input_data, std::vector<RowVector*> output_data);

    // storage objects for working of neural network
    /*
    The layers are stored through raw pointers. Dynamic-size Eigen types can also be
    stored directly in a std::vector (or through smart pointers), which would free the
    memory automatically; as written, this class never deletes what it allocates.
    */
    std::vector<RowVector*> neuronLayers; // stores the different layers of our network
    std::vector<RowVector*> cacheLayers;  // stores the unactivated (activation fn not yet applied) values of layers
    std::vector<RowVector*> deltas;       // stores the error contribution of each neuron
    std::vector<Matrix*> weights;         // the connection weights itself
    std::vector<uint> topology;           // number of neurons per layer, bias neurons not counted
    Scalar learningRate;
};

#endif
```

Next, we implement each function one by one. But first, create two files, NeuralNetwork.hpp and NeuralNetwork.cpp, and write the NeuralNetwork class above into NeuralNetwork.hpp yourself. All of the following code goes into NeuralNetwork.cpp, which must include NeuralNetwork.hpp at the top.

**Code: Constructor for the Neural Network Class**

```cpp
// NeuralNetwork.cpp
#include "NeuralNetwork.hpp"

// constructor of neural network class
NeuralNetwork::NeuralNetwork(std::vector<uint> topology, Scalar learningRate)
{
    this->topology = topology;
    this->learningRate = learningRate;
    for (uint i = 0; i < topology.size(); i++) {
        // initialize neuron layers: every layer except the output layer gets an extra bias neuron
        if (i == topology.size() - 1)
            neuronLayers.push_back(new RowVector(topology[i]));
        else
            neuronLayers.push_back(new RowVector(topology[i] + 1));

        // initialize cache and delta vectors with the same size as the layer just added
        cacheLayers.push_back(new RowVector(neuronLayers.back()->size()));
        deltas.push_back(new RowVector(neuronLayers.back()->size()));

        // vector.back() gives the handle to recently added element
        // coeffRef gives the reference of value at that place
        // (using this as we are using pointers here)
        if (i != topology.size() - 1) {
            neuronLayers.back()->coeffRef(topology[i]) = 1.0;
            cacheLayers.back()->coeffRef(topology[i]) = 1.0;
        }

        // initialize weights matrix
        if (i > 0) {
            if (i != topology.size() - 1) {
                weights.push_back(new Matrix(topology[i - 1] + 1, topology[i] + 1));
                weights.back()->setRandom();
                // the bias neuron of this layer only copies the bias neuron of the previous layer
                weights.back()->col(topology[i]).setZero();
                weights.back()->coeffRef(topology[i - 1], topology[i]) = 1.0;
            }
            else {
                weights.push_back(new Matrix(topology[i - 1] + 1, topology[i]));
                weights.back()->setRandom();
            }
        }
    }
}
```

**Explanation of constructor function – Initializing the neurons, cache and deltas**

The topology vector describes how many neurons each layer has, so its size equals the number of layers in the neural network. Each layer is an array of neurons, which we store as a vector whose elements hold the activation values of the neurons in that layer (an array of these layers is the neural network itself). The `else` branch of the first `if` adds an extra bias neuron to every layer except the output layer, and the `coeffRef(topology[i]) = 1.0` lines set that bias neuron to 1. Each layer's cache and delta vectors have the same size as its neuronLayers vector. We use a row vector per layer rather than a 2D matrix because we train with stochastic gradient descent (SGD), one data point at a time, rather than batch or mini-batch gradient descent. Finally, a layer's cache is the weighted sum of the inputs it receives from the previous layer, before the activation function is applied.
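To make these sizes concrete, here is a small dependency-free sketch (plain `std::vector`, no Eigen) that derives the size of each neuronLayers vector and the rows × columns shape of each weights matrix from a topology, following the same rules as the constructor: one extra bias neuron on every layer except the output layer. `LayerShapes` and `shapesFor` are hypothetical helpers used only for illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: sizes implied by the constructor's rules.
struct LayerShapes {
    std::vector<std::size_t> layerSizes;                           // neuronLayers[i]->size()
    std::vector<std::pair<std::size_t, std::size_t>> weightShapes; // (rows, cols) of weights[i]
};

LayerShapes shapesFor(const std::vector<std::size_t>& topology)
{
    LayerShapes s;
    const std::size_t last = topology.size() - 1;
    for (std::size_t i = 0; i < topology.size(); i++) {
        // every layer except the output layer carries one extra bias neuron
        s.layerSizes.push_back(i == last ? topology[i] : topology[i] + 1);
        if (i > 0) {
            // rows: previous layer including its bias neuron
            // cols: this layer's neurons, plus its bias neuron unless it is the output layer
            s.weightShapes.push_back({ s.layerSizes[i - 1], s.layerSizes[i] });
        }
    }
    return s;
}
```

For the `{ 2, 3, 1 }` network used later in the article, this gives layer sizes 3, 4 and 1, and weight matrices of 3 × 4 and 4 × 1.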

We will use the notation **[m n]** for a matrix with m rows and n columns.

**Initializing Weights matrix**

Initializing the weights matrix is a bit tricky (mathematically). Read the next few lines carefully, because they explain how the weights matrix is used throughout this article. I assume that you know how layers are interconnected in a neural network.

Let CURRENT_LAYER be the layer that takes the input, and let PREV_LAYER and FWD_LAYER be the layers immediately before and after it. The weights matrix of CURRENT_LAYER connects PREV_LAYER to CURRENT_LAYER:

1. The c-th column of the weights matrix holds the connections from all the neurons in PREV_LAYER to the c-th neuron in CURRENT_LAYER.
2. The r-th element of the c-th column holds the connection from the r-th neuron in PREV_LAYER to the c-th neuron in CURRENT_LAYER.
3. In the transposed matrix, the r-th row holds the connections from all the neurons in PREV_LAYER to the r-th neuron in CURRENT_LAYER.
4. In the transposed matrix, the c-th element of the r-th row holds the connection from the c-th neuron in PREV_LAYER to the r-th neuron in CURRENT_LAYER.

Points 1 and 2 apply when we use the weights matrix as is; points 3 and 4 apply when we use its transpose, where `a^T(i, j) = a(j, i)`.
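The layout above can be checked with a dependency-free sketch that uses nested `std::vector`s instead of Eigen (`forwardLinear`, `Vec` and `Mat` are hypothetical names): `W[r][c]` is the weight from PREV_LAYER neuron r to CURRENT_LAYER neuron c, and neuron c of CURRENT_LAYER is the dot product of the PREV_LAYER vector with column c.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>; // Mat[r][c]: weight from PREV neuron r to CURRENT neuron c

// Row vector times matrix: out[c] = sum over r of prev[r] * W[r][c],
// i.e. neuron c of CURRENT_LAYER is the dot product of PREV_LAYER with column c.
Vec forwardLinear(const Vec& prev, const Mat& W)
{
    Vec out(W.empty() ? 0 : W[0].size(), 0.0);
    for (std::size_t r = 0; r < W.size(); r++)
        for (std::size_t c = 0; c < out.size(); c++)
            out[c] += prev[r] * W[r][c];
    return out;
}
```

For example, with `prev = {0.5, -1.0, 1.0}` (two neurons plus a bias neuron) and a 3 × 3 matrix whose last column is `{0, 0, 1}`, the last element of the result is exactly 1 whatever the other weights are, which is the bias-preserving trick used when the weights are initialized.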

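Remember that PREV_LAYER has an extra bias neuron. If we take the matrix product of the neuronLayers vector of PREV_LAYER and the weights matrix of CURRENT_LAYER, we get the new (not yet activated) neuronLayers vector of CURRENT_LAYER. For a hidden CURRENT_LAYER with n neurons after a PREV_LAYER with p neurons, this is a [1 p+1] vector times a [p+1 n+1] matrix, giving a [1 n+1] vector; the output layer has no bias neuron, so its weights matrix is [p+1 n]. We now have to modify the weights matrix so that the bias neuron of CURRENT_LAYER is unaffected by the matrix multiplication. For that, the constructor sets every element of the last column of the weights matrix to 0 (`col(topology[i]).setZero()`) except the last element, which is set to 1 (`coeffRef(topology[i - 1], topology[i]) = 1.0`). The bias neuron of CURRENT_LAYER then simply copies the bias neuron of PREV_LAYER, whose value is 1.

**Code: Feed Forward Algorithm**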

```cpp
void NeuralNetwork::propagateForward(RowVector& input)
{
    // set the input to input layer (its last element is the bias neuron and stays 1)
    // block returns a part of the given vector or matrix
    // block takes 4 arguments : startRow, startCol, blockRows, blockCols
    neuronLayers.front()->block(0, 0, 1, neuronLayers.front()->size() - 1) = input;

    // propagate the data forward, one layer at a time
    for (uint i = 1; i < topology.size(); i++) {
        // weighted sum of the previous (already activated) layer, kept for backpropagation
        (*cacheLayers[i]) = (*neuronLayers[i - 1]) * (*weights[i - 1]);
        (*neuronLayers[i]) = (*cacheLayers[i]);

        // apply the activation function to the real neurons of hidden layers only:
        // the bias neuron stays at 1 and the output layer stays linear.
        // unaryExpr returns a new expression, so its result must be assigned back
        if (i != topology.size() - 1) {
            neuronLayers[i]->head(topology[i]) = cacheLayers[i]->head(topology[i]).unaryExpr(
                [](Scalar x) { return activationFunction(x); });
        }
    }
}
```
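To see the whole forward pass without any library, here is a dependency-free sketch of the same computation for one data point. It assumes tanh hidden layers, a linear output layer, and a bias neuron on every non-output layer, with hidden-layer weight matrices whose last column is `{0, ..., 0, 1}`; `forwardPass` and the nested-vector weight layout are illustrative assumptions, not the article's API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>; // W[r][c]: PREV neuron r -> CURRENT neuron c

// Hypothetical sketch of a forward pass for one sample.
// weights[i] maps layer i (including its bias neuron) to layer i + 1.
// Hidden layers: tanh on real neurons, bias neuron kept at 1. Output layer: linear.
Vec forwardPass(const Vec& input, const std::vector<Mat>& weights)
{
    Vec layer = input;
    layer.push_back(1.0); // bias neuron of the input layer
    for (std::size_t i = 0; i < weights.size(); i++) {
        const Mat& W = weights[i];
        Vec cache(W[0].size(), 0.0); // weighted sums (unactivated values)
        for (std::size_t r = 0; r < W.size(); r++)
            for (std::size_t c = 0; c < cache.size(); c++)
                cache[c] += layer[r] * W[r][c];
        bool isOutput = (i + 1 == weights.size());
        if (!isOutput) {
            // activate every neuron except the last one, which is the bias neuron
            for (std::size_t c = 0; c + 1 < cache.size(); c++)
                cache[c] = std::tanh(cache[c]);
        }
        layer = cache; // this layer is fully activated before it feeds the next one
    }
    return layer;
}
```

Activating each layer before multiplying it into the next is what makes the stacked layers non-linear; computing every matrix product first and activating afterwards would collapse the hidden layers into one linear map.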

**Explanation of feed forward algorithm:**

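The c-th element (neuron) of CURRENT_LAYER gets its input from the dot product of the PREV_LAYER neuronLayers vector and the c-th column of the weights matrix. This multiplies each input by its weight, adds them up, and automatically includes the bias term, because the last element of PREV_LAYER is the bias neuron (value 1) and the last row of the weights matrix holds the bias weights. Since the last column of the weights matrix is all 0 except its last element (1), the bias neuron of CURRENT_LAYER takes input from the bias neuron of PREV_LAYER only. The activation function is then applied to the real neurons of each hidden layer, and each hidden layer must be activated before it feeds the next layer; the bias neuron stays at 1, and the output layer is left linear.

**Calculating Errors:**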

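```cpp
void NeuralNetwork::calcErrors(RowVector& output)
{
    // calculate the errors made by neurons of last layer
    (*deltas.back()) = output - (*neuronLayers.back());

    // error calculation of hidden layers is different
    // we will begin by the last hidden layer
    // and we will continue till the first hidden layer
    for (uint i = topology.size() - 2; i > 0; i--) {
        RowVector error = (*deltas[i + 1]);
        // an error coming back through a hidden layer is scaled by that layer's
        // activation derivative; the output layer is linear, so its derivative is 1
        if (i + 1 != topology.size() - 1)
            error = error.cwiseProduct(cacheLayers[i + 1]->unaryExpr(
                [](Scalar x) { return activationFunctionDerivative(x); }));
        (*deltas[i]) = error * (weights[i]->transpose());
    }
}
```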
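For reference, the backward pass follows the standard delta rule. The notation here is introduced only for this summary and assumes a squared-error loss $E = \tfrac{1}{2}\lVert y - a^{(L)} \rVert^2$, tanh hidden layers and a linear output layer: $a^{(l)}$ is the neuronLayers vector of layer $l$, $z^{(l)}$ its cacheLayers vector, $W^{(l)}$ is weights[l] (mapping layer $l$ to layer $l+1$), $\delta^{(l)}$ is the deltas vector, $\eta$ the learning rate, and layer $L$ the output layer.

$$
\delta^{(L)} = y - a^{(L)}, \qquad
\delta^{(l)} = \left(\delta^{(l+1)} \odot f'\!\left(z^{(l+1)}\right)\right)\left(W^{(l)}\right)^{\mathsf T}, \quad l = L-1, \dots, 1
$$

Here $f'(z) = 1 - \tanh^2(z)$ for hidden layers and $f' \equiv 1$ for the linear output layer. Because $\partial E / \partial W^{(l)}_{rc} = -\,\delta^{(l+1)}_c\, f'\!\left(z^{(l+1)}_c\right) a^{(l)}_r$, gradient descent gives the weight update

$$
W^{(l)}_{rc} \leftarrow W^{(l)}_{rc} + \eta\, \delta^{(l+1)}_c\, f'\!\left(z^{(l+1)}_c\right) a^{(l)}_r .
$$

The bias columns of the hidden-layer weight matrices are excluded from the update, so they keep passing the bias value through unchanged.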

**Code: Updating the weights**

```cpp
void NeuralNetwork::updateWeights()
{
    // topology.size()-1 = weights.size()
    // weights[i] holds the connections from layer i to layer i + 1
    for (uint i = 0; i < topology.size() - 1; i++) {
        // the output layer has no bias neuron and is linear, so its derivative is 1
        bool toOutput = (i == topology.size() - 2);
        // only the first topology[i + 1] columns belong to real neurons; the bias column
        // of a hidden layer is left untouched so that it keeps passing the bias through
        for (uint c = 0; c < topology[i + 1]; c++) {
            Scalar grad = deltas[i + 1]->coeff(c)
                * (toOutput ? Scalar(1) : activationFunctionDerivative(cacheLayers[i + 1]->coeff(c)));
            for (Eigen::Index r = 0; r < weights[i]->rows(); r++) {
                weights[i]->coeffRef(r, c) += learningRate * grad * neuronLayers[i]->coeff(r);
            }
        }
    }
}
```
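One way to convince yourself that an update of the form "error × activation derivative × incoming activation" has the right sign and scale is a numerical gradient check. The sketch below is dependency-free and uses a hypothetical tiny network (one input, one tanh hidden neuron, one linear output, no bias neurons for brevity); `lossFor`, `analyticGrads` and `numericGrads` are illustrative names, not part of the article's code.

```cpp
#include <cmath>

// Hypothetical 1-1-1 network: h = tanh(w1 * x), out = w2 * h (linear output, no bias).
double lossFor(double w1, double w2, double x, double y)
{
    double out = w2 * std::tanh(w1 * x);
    return 0.5 * (y - out) * (y - out); // squared-error loss E
}

// Analytic gradients in delta-rule form:
// dE/dW = -(delta of receiving neuron) * f'(its cache) * (activation of sending neuron).
void analyticGrads(double w1, double w2, double x, double y, double& g1, double& g2)
{
    double z1 = w1 * x;                  // cache of the hidden neuron
    double h = std::tanh(z1);            // activation of the hidden neuron
    double deltaOut = y - w2 * h;        // output error (target minus prediction)
    double deltaHidden = deltaOut * w2;  // propagated back through w2 (output f' = 1)
    g2 = -deltaOut * 1.0 * h;            // linear output: derivative is 1
    g1 = -deltaHidden * (1.0 - h * h) * x;
}

// Central finite difference of the loss with respect to w1 and w2.
void numericGrads(double w1, double w2, double x, double y, double& g1, double& g2)
{
    const double eps = 1e-6;
    g1 = (lossFor(w1 + eps, w2, x, y) - lossFor(w1 - eps, w2, x, y)) / (2 * eps);
    g2 = (lossFor(w1, w2 + eps, x, y) - lossFor(w1, w2 - eps, x, y)) / (2 * eps);
}
```

If the two gradients agree, then adding `learningRate * delta * f'(cache) * activation` to a weight moves it against the gradient, which is exactly what gradient descent requires.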

**Backpropagation Algorithm:**

```cpp
void NeuralNetwork::propagateBackward(RowVector& output)
{
    // the errors must be computed with the current weights, before they are updated
    calcErrors(output);
    updateWeights();
}
```

**Code: Activation Function**

```cpp
Scalar activationFunction(Scalar x)
{
    return std::tanh(x);
}

// derivative of tanh with respect to its input x (the cached, unactivated value)
Scalar activationFunctionDerivative(Scalar x)
{
    Scalar t = std::tanh(x);
    return 1 - t * t;
}
// you can use your own code here!
```
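If you want to try your own activation, the pair only has to stay consistent: the derivative must be taken with respect to the same unactivated (cached) value that the function receives. A minimal sketch, assuming you swap in the logistic sigmoid (`sigmoid` and `sigmoidDerivative` are illustrative names, not from the article), plus a hypothetical `derivativeMatches` helper that checks any such pair against a central finite difference:

```cpp
#include <cmath>

// Logistic sigmoid, an alternative to tanh: output in (0, 1).
float sigmoid(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Derivative with respect to the unactivated input x: s(x) * (1 - s(x)).
float sigmoidDerivative(float x)
{
    float s = sigmoid(x);
    return s * (1.0f - s);
}

// Hypothetical helper: compares an analytic derivative with a central finite difference at x.
bool derivativeMatches(float (*f)(float), float (*df)(float), float x)
{
    const double h = 1e-3;
    double numeric = (double(f(float(x + h))) - double(f(float(x - h)))) / (2 * h);
    return std::abs(numeric - double(df(x))) < 1e-3;
}
```

Because `Scalar` is `float` in this article, the same helper can also be called as `derivativeMatches(activationFunction, activationFunctionDerivative, 0.5f)` inside NeuralNetwork.cpp.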

**Code: Training neural network**

```cpp
void NeuralNetwork::train(std::vector<RowVector*> input_data, std::vector<RowVector*> output_data)
{
    // one pass of stochastic gradient descent: one weight update per data point
    for (uint i = 0; i < input_data.size(); i++) {
        std::cout << "Input to neural network is : " << *input_data[i] << std::endl;
        propagateForward(*input_data[i]);
        std::cout << "Expected output is : " << *output_data[i] << std::endl;
        std::cout << "Output produced is : " << *neuronLayers.back() << std::endl;
        propagateBackward(*output_data[i]);
        // root of the mean squared error of this sample (measured before the weight update)
        std::cout << "RMSE : " << std::sqrt((*deltas.back()).dot((*deltas.back())) / deltas.back()->size()) << std::endl;
    }
}
```
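Each call to train() makes a single stochastic-gradient-descent pass: one forward pass, one error computation and one weight update per data point. The same per-sample rule can be seen in isolation on a model with no hidden layer. The sketch below is a hypothetical, dependency-free illustration (not part of the article's class): it fits `out = w[0]*x + w[1]*y + w[2]` to the target `2x + 10 + y` produced by the data generator, drawing fresh points from a fixed-seed generator so that runs are repeatable.

```cpp
#include <array>
#include <random>

// Hypothetical sketch: per-sample SGD on a linear model out = w[0]*x + w[1]*y + w[2],
// where w[2] is the bias. Each step applies the same rule as a linear output layer:
// w += learningRate * (target - out) * input.
std::array<double, 3> fitLinearSGD(int steps, double learningRate)
{
    std::mt19937 gen(42); // fixed seed for repeatable runs
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::array<double, 3> w = { 0.0, 0.0, 0.0 };
    for (int s = 0; s < steps; s++) {
        double x = dist(gen), y = dist(gen);
        double target = 2 * x + 10 + y;   // same target as the data generator
        double input[3] = { x, y, 1.0 };  // 1.0 plays the role of the bias neuron
        double out = w[0] * x + w[1] * y + w[2];
        double delta = target - out;      // same sign convention as calcErrors
        for (int k = 0; k < 3; k++)
            w[k] += learningRate * delta * input[k];
    }
    return w;
}
```

With `fitLinearSGD(20000, 0.05)` the weights settle very close to 2, 1 and 10. A single pass with a small learning rate, as train() does, may stop well short of that, so calling train() several times is a simple way to keep improving the fit.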

**Code: Loading data**

```cpp
void ReadCSV(std::string filename, std::vector<RowVector*>& data)
{
    data.clear();
    std::ifstream file(filename);
    if (!file.is_open()) {
        std::cerr << "Could not open " << filename << std::endl;
        return;
    }
    std::string line, word;
    // read the file, one data point per line
    while (std::getline(file, line, '\n')) {
        if (line.empty())
            continue;
        // split the line on commas; std::stof skips the space that follows each comma
        std::vector<Scalar> parsed_vec;
        std::stringstream ss(line);
        while (std::getline(ss, word, ',')) {
            parsed_vec.push_back(Scalar(std::stof(word)));
        }
        // copy the parsed values into a new row vector
        data.push_back(new RowVector(parsed_vec.size()));
        for (uint i = 0; i < parsed_vec.size(); i++) {
            data.back()->coeffRef(i) = parsed_vec[i];
        }
    }
}
```
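The parsing step is where the comma handling matters: `std::getline` takes a single `char` delimiter, so a line such as `0.25, 0.5` is split on `','`, and `std::stof` skips the space that follows each comma. A dependency-free sketch of that step on its own (`parseCsvLine` is a hypothetical helper, and it returns `std::vector<float>` rather than an Eigen RowVector):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line such as "0.25, 0.5" into floats.
std::vector<float> parseCsvLine(const std::string& line)
{
    std::vector<float> values;
    std::stringstream ss(line);
    std::string word;
    while (std::getline(ss, word, ',')) {
        if (word.find_first_not_of(" \t\r") == std::string::npos)
            continue; // skip fields that are empty or only whitespace
        values.push_back(std::stof(word)); // stof ignores leading whitespace
    }
    return values;
}
```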

ReadCSV is a free function, not a member of the class. Put its definition in NeuralNetwork.cpp and make sure its declaration is in NeuralNetwork.hpp, so that declarations and definitions stay in separate files. Save all the files and stay with me for a few more minutes!

**Code: Generate Some Noise, i.e. Training Data**

```cpp
#include <cstdlib>
#include <fstream>
#include <string>

// writes 1000 random points: "<filename>-in" gets "x, y" and "<filename>-out" gets 2x + 10 + y
void genData(std::string filename)
{
    std::ofstream file1(filename + "-in");
    std::ofstream file2(filename + "-out");
    for (uint r = 0; r < 1000; r++) {
        Scalar x = rand() / Scalar(RAND_MAX);
        Scalar y = rand() / Scalar(RAND_MAX);
        file1 << x << ", " << y << std::endl;
        file2 << 2 * x + 10 + y << std::endl;
    }
    file1.close();
    file2.close();
}
```
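`rand()` is never seeded here, so every run produces the same 1000 points, and `rand()` makes no promises about the quality of its numbers. If you want control over the seed, a sketch using the `<random>` header could look like this; `genDataTo` is a hypothetical variant that writes to any output stream (the two files, or a `std::ostringstream` for a quick test) instead of opening the files itself.

```cpp
#include <ostream>
#include <random>
#include <sstream> // std::ostringstream, handy for testing without touching files

// Hypothetical variant of genData: writes n points generated from the given seed.
// inputs receives "x, y" lines and outputs the matching target 2x + 10 + y.
void genDataTo(std::ostream& inputs, std::ostream& outputs, int n, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (int r = 0; r < n; r++) {
        float x = dist(gen);
        float y = dist(gen);
        inputs << x << ", " << y << '\n';
        outputs << 2 * x + 10 + y << '\n';
    }
}
```

To produce the same files as genData, open two `std::ofstream`s named `test-in` and `test-out` and pass them in, for example `genDataTo(in, out, 1000, 42)`.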

**Code: Implementation of the neural network.**

```cpp
// main.cpp
// don't forget to include our neural network
#include "NeuralNetwork.hpp"

//... data generator code here

typedef std::vector<RowVector*> data;

int main()
{
    NeuralNetwork n({ 2, 3, 1 });
    data in_dat, out_dat;
    genData("test");
    ReadCSV("test-in", in_dat);
    ReadCSV("test-out", out_dat);
    n.train(in_dat, out_dat);
    return 0;
}
```

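To compile and run the program, open a Linux terminal and type: **`g++ -O2 main.cpp NeuralNetwork.cpp -o main && ./main`**

The `#include <eigen3/Eigen/Eigen>` line assumes Eigen is installed system-wide (for example under /usr/include/eigen3); otherwise, pass the directory that contains `eigen3/` to the compiler with `-I`. The `-O2` flag turns on optimizations, which Eigen code relies on heavily for its speed. Try experimenting with the number of data points generated in the **genData** function.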