ML – Neural Network Implementation in C++ From Scratch

What are we going to do here?
This article explains how to create a super-fast Artificial Neural Network that can crunch millions of data points withing seconds! even milliseconds. Artificial Intelligence and Machine Learning are nowadays one of the most trending topics among computer geeks. Data scientists are being hired by tech giants for their excellence in these fields.

Why use C++
Now, if you have already implemented a neural network model in some other programming language then you might have noticed ( If you have a low-end PC ) that your models work pretty slow on even small datasets. When you began learning about Neural Networks you might have googled Which language is best for machine learning? and the obvious answer you get is Python or R is best for machine learning, other languages are hard so you must not waste your time on them!. Now, if the user starts programming, they face the problem of time and resource consumption. So, this article shows how to a super fast neural network.


Prerequisites:

  • Basic knowledge about what are classes and how they work.
  • Use a linear algebra library called Eigen
  • Some basic read/write operations in C++
  • Some basic knowledge about linear algebra as we are using a library for that

Eigen 101:
Eigen by its core is a library for super fast linear algebra operations and it’s the fastest and easiest one out there. Some resources to learn the basics of Eigen.

While learning Eigen you will encounter one of the most powerful feature of C++ – Template Metaprogramming. It is recommended not to get deviated from the track right now ( if you are new to C++ ) and assume those as basic parameters to a function! However if you are really obsessed with learning new and powerful things then here’s a good article and a video for it.
Writing the Neural Network class
Before going further I assume that you know what a Neural Network is and how does it learn. If not, then I do recommend you the following pages to take a look at!



Code : The Neural Network Class

filter_none

edit
close

play_arrow

link
brightness_4
code

// NeuralNetwork.hpp
#include <eigen3/Eigen/Eigen>
#include <iostream>
#include <vector>
  
// use typedefs for future ease for changing data types like : float to double
typedef float Scalar;
typedef Eigen::MatrixXf Matrix;
typedef Eigen::RowVectorXf RowVector;
typedef Eigen::VectorXf ColVector;
  
// neural network implementation class!
class NeuralNetwork {
public:
    // constructor
    NeuralNetwork(std::vector<uint> topology, Scalar learningRate = Scalar(0.005));
  
    // function for forward propagation of data
    void propagateForward(RowVector& input);
  
    // function for backward propagation of errors made by neurons
    void propagateBackward(RowVector& output);
  
    // function to calculate errors made by neurons in each layer
    void calcErrors(RowVector& output);
  
    // function to update the weights of connections
    void updateWeights();
  
    // function to train the neural network give an array of data points
    void train(std::vector<RowVector*> data);
  
    // storage objects for working of neural network
    /*
          use pointers when using std::vector<Class> as std::vector<Class> calls destructor of 
          Class as soon as it is pushed back! when we use pointers it can't do that, besides
          it also makes our neural network class less heavy!! It would be nice if you can use
          smart pointers instead of usual ones like this
        */
    std::vector<RowVector*> neuronLayers; // stores the different layers of out network
    std::vector<RowVector*> cacheLayers; // stores the unactivated (activation fn not yet applied) values of layers
    std::vector<RowVector*> deltas; // stores the error contribution of each neurons
    std::vector<Matrix*> weights; // the connection weights itself
    Scalar learningRate;
};

chevron_right


Next, we move ahead by implementing each function one by one… But first, create two files (NeuralNetwork.cpp and NeuralNetwork.hpp) and write the above NeuralNetwork class code yourself in the “NeuralNetwork.hpp”. The following line of code must be copied in the “NeuralNetwork.cpp” file.

Code: Constructor for the Neural Network Class

filter_none

edit
close

play_arrow

link
brightness_4
code

// constructor of neural network class
NeuralNetwork::NeuralNetwork(std::vector<uint> topology, Scalar learningRate)
{
    this->topology = topology;
    this->learningRate = learningRate;
    for (uint i = 0; i < topology.size(); i++) {
        // initialze neuron layers
        if (i == topology.size() - 1)
            neuronLayers.push_back(new RowVector(topology[i]));
        else
            neuronLayers.push_back(new RowVector(topology[i] + 1));
  
        // initialize cache and delta vectors
        cacheLayers.push_back(new RowVector(neuronLayers.size()));
        deltas.push_back(new RowVector(neuronLayers.size()));
  
        // vector.back() gives the handle to recently added element
        // coeffRef gives the reference of value at that place 
        // (using this as we are using pointers here)
        if (i != topology.size() - 1) {
            neuronLayers.back()->coeffRef(topology[i]) = 1.0;
            cacheLayers.back()->coeffRef(topology[i]) = 1.0;
        }
  
        // initialze weights matrix
        if (i > 0) {
            if (i != topology.size() - 1) {
                weights.push_back(new Matrix(topology[i - 1] + 1, topology[i] + 1));
                weights.back()->setRandom();
                weights.back()->col(topology[i]).setZero();
                weights.back()->coeffRef(topology[i - 1], topology[i]) = 1.0;
            }
            else {
                weights.push_back(new Matrix(topology[i - 1] + 1, topology[i]));
                weights.back()->setRandom();
            }
        }
    }
};

chevron_right


Explanation of constructor function – Initializing the neurons, cache and deltas
The topology vector describes how many neurons we have in each layer, and the size of this vector is equal to a number of layers in the neural network. Each layer in the neural network is an array of neurons, we store each of these layers as a vector such that each element in this vector stores the activation value of neuron in that layer (note that an array of these layers is the neural network itself. Now in line 8, we add an extra bias neuron to each layer except in the output layer (line 7). The cache and delta vector is of the same dimensions as that of the neuronLayer vector. We are using vectors here as layers and not a 2D matrix as we are doing SGD and not batch or mini-batch gradient descent. Now, a cache is just another name of the sum of weighted inputs from the previous layer.
A notation that we will use for dimensions of a matrix is: [m n] denotes a matrix having m rows and n columns.

Initializing Weights matrix
Initializing weights matrix is a bit tricky! (mathematically). Pay very serious attention to whatever you read for the next few lines as this will explain how we want to use the weights matrix throught this article. I assume that you know how layers are interconnected in a neural network.

  • CURRENT_LAYER represents the layer which is taking input and PREV_LAYER and FWD_LAYER represents a layer back and a layer front of the CURRENT_LAYER.
  • c-th column in the weights matrix represents the connection of c-th neuron in CURRENT_LAYER to all the neurons in the PREV_LAYER.
  • r-th element of c-th column in the weights matrix represents the connection of c-th neuron in CURRENT_LAYER to r-th neuron in the PREV_LAYER.
  • r-th row in the weights matrix represents the connection of all the neurons in the PREV_LAYER to r-th neuron in CURRENT_LAYER.
  • c-th element of r-th row in the weights matrix represents connection of c-th neuron in PREV_LAYER to r-th neuron in CURRENT_LAYER.
  • Points 1 and 2 will be used when we use weights matrix in normal sense, but points 3 and 4 will be used when we use weights matrix in transposed sense (a(i, j)=a(j, I))

Now, remember we have an extra bias neuron in the previous layer. If we do a simple matrix product of neuronsLayer vector of PREV_LAYER and weights matrix of CURRENT_LAYER, we will get the new neuronsLayer vector of CURRENT_LAYER. What we have to do now is modify our weights matrix in a manner so that the bias neuron of CURRENT_LAYER remains unaffected by matrix multiplication! For that we set all the element of last column of weights matrix to 0 (line 26) except that last element (line 27).

Code: Feed Forward Algorithm

filter_none

edit
close

play_arrow

link
brightness_4
code

void NeuralNetwork::propagateForward(RowVector& input)
{
    // set the input to input layer
    // block returns a part of the given vector or matrix
    // block takes 4 arguments : startRow, startCol, blockRows, blockCols
    neuronLayers.front()->block(0, 0, 1, neuronLayers.front()->size() - 1) = input;
  
    // propagate the data forawrd
    for (uint i = 1; i < topology.size(); i++) {
        // already explained above
        (*neuronLayers[i]) = (*neuronLayers[i - 1]) * (*weights[i - 1]);
    }
  
    // apply the activation function to your network
    // unaryExpr applies the given function to all elements of CURRENT_LAYER
    for (uint i = 1; i < topology.size() - 1; i++) {
        neuronLayers[i]->block(0, 0, 1, topology[i]).unaryExpr(std::ptr_fun(activationFunction));
    }
}

chevron_right


Explanation of feed forward algorithm:
C-th element (neuron) of the CURRENT_LAYER takes it’s input by taking a dot product between neuronLayers vector of PREV_LAYER and the C-th column. This way, it takes the inputs multiplied by weight and this also automatically adds up the bias term. The last column of weights matrix is initialized by setting all elements to 0 except the last element (set to 1), what this means is that the bias neuron of CURRENT_LAYER takes input from bias neuron of PREV_LAYER only.



Calculating Errors:

filter_none

edit
close

play_arrow

link
brightness_4
code

void NeuralNetwork::calcErrors(RowVector& output)
{
    // calculate the errors made by neurons of last layer
    (*deltas.back()) = output - (*neuronLayers.back());
  
    // error calculation of hidden layers is different
    // we will begin by the last hidden layer
    // and we will continue till the first hidden layer
    for (uint i = topology.size() - 2; i > 0; i--) {
        (*deltas[i]) = (*deltas[i + 1]) * (weights[i]->transpose());
    }
}

chevron_right


Code: Updating the weights

filter_none

edit
close

play_arrow

link
brightness_4
code

void NeuralNetwork::updateWeights()
{
    // topology.size()-1 = weights.size()
    for (uint i = 0; i < topology.size() - 1; i++) {
        // in this loop we are iterating over the different layers (from first hidden to output layer)
        // if this layer is the output layer, there is no bias neuron there, number of neurons specified = number of cols
        // if this layer not the output layer, there is a bias neuron and number of neurons specified = number of cols -1
        if (i != topology.size() - 2) {
            for (uint c = 0; c < weights[i]->cols() - 1; c++) {
                for (uint r = 0; r < weights[i]->rows(); r++) {
                    weights[i]->coeffRef(r, c) += learningRate * deltas[i + 1]->coeffRef(c) * activationFunctionDerivative(cacheLayers[i + 1]->coeffRef(c)) * neuronLayers[i]->coeffRef(r);
                }
            }
        }
        else {
            for (uint c = 0; c < weights[i]->cols(); c++) {
                for (uint r = 0; r < weights[i]->rows(); r++) {
                    weights[i]->coeffRef(r, c) += learningRate * deltas[i + 1]->coeffRef(c) * activationFunctionDerivative(cacheLayers[i + 1]->coeffRef(c)) * neuronLayers[i]->coeffRef(r);
                }
            }
        }
    }
}

chevron_right


Backpropagation Algorithm:

filter_none

edit
close

play_arrow

link
brightness_4
code

void NeuralNetwork::propagateBackward(RowVector& output)
{
    calcErrors(output);
    updateWeights();
}

chevron_right


Code: Activation Function

filter_none

edit
close

play_arrow

link
brightness_4
code

Scalar activationFunction(Scalar x)
{
    return tanhf(x);
}
  
Scalar activationFunctionDerivative(Scalar x)
{
    return 1 - tanhf(x) * tanhf(x);
}
// you can use your own code here!

chevron_right


Code: Training neural network

filter_none

edit
close

play_arrow

link
brightness_4
code

void NeuralNetwork::train(std::vector<RowVector*> input_data, std::vector<RowVector*> output_data)
{
    for (uint i = 0; i < input_data.size(); i++) {
        std::cout << "Input to neural network is : " << *input_data[i] << std::endl;
        propagateForward(*input_data[i]);
        std::cout << "Expected output is : " << *output_data[i] << std::endl;
        std::cout << "Output produced is : " << *neuronLayers.back() << std::endl;
        propagateBackward(*output_data[i]);
        std::cout << "MSE : " << std::sqrt((*deltas.back()).dot((*deltas.back())) / deltas.back()->size()) << std::endl;
    }
}

chevron_right


Code: Loading data

filter_none

edit
close

play_arrow

link
brightness_4
code

void ReadCSV(std::string filename, std::vector<RowVector*>& data)
{
    data.clear();
    std::ifstream file(filename);
    std::string line, word;
    // determine number of columns in file
    getline(file, line, '\n');
    std::stringstream ss(line);
    std::vector<Scalar> parsed_vec;
    while (getline(ss, word, ', ')) {
        parsed_vec.push_back(Scalar(std::stof(&word[0])));
    }
    uint cols = parsed_vec.size();
    data.push_back(new RowVector(cols));
    for (uint i = 0; i < cols; i++) {
        data.back()->coeffRef(1, i) = parsed_vec[i];
    }
  
    // read the file
    if (file.is_open()) {
        while (getline(file, line, '\n')) {
            std::stringstream ss(line);
            data.push_back(new RowVector(1, cols));
            uint i = 0;
            while (getline(ss, word, ', ')) {
                data.back()->coeffRef(i) = Scalar(std::stof(&word[0]));
                i++;
            }
        }
    }
}

chevron_right


The user can read csv files using this code and paste this in the neural network class but be careful, the declarations and definitions must be kept in separate files (NeuralNetwork.cpp and NeuralNetwork.h). Save all the files and be with me for a few minutes!

Code: Generate Some Noise i.e. training data

filter_none

edit
close

play_arrow

link
brightness_4
code

void genData(std::string filename)
{
    std::ofstream file1(filename + "-in");
    std::ofstream file2(filename + "-out");
    for (uint r = 0; r < 1000; r++) {
        Scalar x = rand() / Scalar(RAND_MAX);
        Scalar y = rand() / Scalar(RAND_MAX);
        file1 << x << ", " << y << std::endl;
        file2 << 2 * x + 10 + y << std::endl;
    }
    file1.close();
    file2.close();
}

chevron_right


Code: Implementation of the neural network.

filter_none

edit
close

play_arrow

link
brightness_4
code

// main.cpp
  
// don't forget to include out neural network
#include "NeuralNetwork.hpp"
  
//... data generator code here
  
typedef std::vector<RowVector*> data;
int main()
{
    NeuralNetwork n({ 2, 3, 1 });
    data in_dat, out_dat;
    genData("test");
    ReadCSV("test-in", in_dat);
    ReadCSV("test-out", out_dat);
    n.train(in_dat, out_dat);
    return 0;
}

chevron_right


To compile the program, open your linux terminal and type :
g++ main.cpp NeuralNetwork.cpp -o main && ./main

Run this command. Try experimenting with the number of data points in that genData function.




My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.