Machine Learning in C++
Many of us learn C++ as our first language, but for tasks such as data analysis and machine learning, Python becomes the go-to language because of its simplicity and its wealth of ready-made libraries.
But can C++ be used for machine learning too? And if so, how?
Pre-requisites:
- C++ Boost Library: a powerful C++ library used for a variety of purposes, such as large mathematical operations. See the library's installation guide to set it up.
- mlpack C++ Library: a small and scalable C++ machine learning library. See the library's installation guide to set it up.
  Note: set USE_OPENMP=OFF when installing mlpack. Don't sweat it; the installation guide explains how to do that.
- Sample CSV Data File: mlpack does not ship with a built-in sample dataset, so we have to use our own.
Our Model
The code we are writing takes a simple dataset of vectors and finds the nearest neighbour of each data point.
The training step is marked with a comment in the code.
Input: Our input is a file named data.csv containing a dataset of vectors. The file contains the following data:
3, 3, 3, 3, 0
3, 4, 4, 3, 0
3, 4, 4, 3, 0
3, 3, 4, 3, 0
3, 6, 4, 3, 0
2, 4, 4, 3, 0
2, 4, 4, 1, 0
3, 3, 3, 2, 0
3, 4, 4, 2, 0
3, 4, 4, 2, 0
3, 3, 4, 2, 0
3, 6, 4, 2, 0
2, 4, 4, 2, 0
Code:
#include <mlpack/core.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>

using namespace std;
using namespace mlpack;
// This example targets the mlpack 3.x API, which uses these sub-namespaces.
// NeighborSearch and NearestNeighborSort
using namespace mlpack::neighbor;
// ManhattanDistance
using namespace mlpack::metric;

void mlModel()
{
    // Armadillo is a C++ linear algebra library;
    // mlpack uses its matrix data type.
    arma::mat data;

    /*
    data::Load is used to import data into mlpack.
    It takes 3 parameters:
        1. Filename = name of the file to be used
        2. Matrix = matrix to hold the data in the file
        3. fatal = true if you want it to throw an exception
           if there is an issue
    */
    data::Load("data.csv", data, true);

    /*
    Create a NeighborSearch model. The parameters of the
    model are specified with templates:
        1. Sorting method: "NearestNeighborSort" - this
           class sorts by increasing distance.
        2. Distance metric: "ManhattanDistance" - the L1
           distance, the sum of absolute distances.
        3. Pass the reference dataset (the vectors to be
           searched through) to the constructor.
    */
    NeighborSearch<NearestNeighborSort, ManhattanDistance> nn(data);
    // In the above line we trained our model,
    // i.e. fitted the data to the model.
    // Now we will predict.

    arma::Mat<size_t> neighbors; // Matrices to hold
    arma::mat distances;         // the results

    /*
    Find the nearest neighbors. Arguments are:
        1. k = 1, the number of neighbors to find
        2. Matrices to hold the result, in this case
           neighbors and distances
    */
    nn.Search(1, neighbors, distances);
    // In the above line we find the nearest neighbor.

    // Print out each neighbor and its distance.
    for (size_t i = 0; i < neighbors.n_elem; ++i)
    {
        std::cout << "Nearest neighbor of point " << i << " is point "
                  << neighbors[i] << " and the distance is "
                  << distances[i] << ".\n";
    }
}

int main()
{
    mlModel();
    return 0;
}
Save the above code as knn_example.cpp, then compile it in Terminal/CMD using
g++ knn_example.cpp -o knn_example -std=c++11 -larmadillo -lmlpack -lboost_serialization
followed by
./knn_example
Output:
Nearest neighbor of point 0 is point 7 and the distance is 1.
Nearest neighbor of point 1 is point 2 and the distance is 0.
Nearest neighbor of point 2 is point 1 and the distance is 0.
Nearest neighbor of point 3 is point 10 and the distance is 1.
Nearest neighbor of point 4 is point 11 and the distance is 1.
Nearest neighbor of point 5 is point 12 and the distance is 1.
Nearest neighbor of point 6 is point 12 and the distance is 1.
Nearest neighbor of point 7 is point 10 and the distance is 1.
Nearest neighbor of point 8 is point 9 and the distance is 0.
Nearest neighbor of point 9 is point 8 and the distance is 0.
Nearest neighbor of point 10 is point 9 and the distance is 1.
Nearest neighbor of point 11 is point 4 and the distance is 1.
Nearest neighbor of point 12 is point 9 and the distance is 1.
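As a quick sanity check on the first line of the output, the Manhattan distance between point 0 (3, 3, 3, 3, 0) and point 7 (3, 3, 3, 2, 0) can be worked out by hand. The symbols d_1, x_0 and x_7 are notation introduced here for the L1 distance and the two points:

```latex
d_1(x_0, x_7) = |3-3| + |3-3| + |3-3| + |3-2| + |0-0| = 1
```

Point 3 (3, 3, 4, 3, 0) is also at distance 1 from point 0, so more than one point ties for nearest. The program reports only one of them, and which one appears presumably depends on the order in which the search visits the points.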