MPI – Distributed Computing made easy

 

The Underlying Problem

To make things easier, let’s directly jump to some statistics:



  • Facebook currently has 1.5 billion monthly active users.
  • Google performs at least 1 trillion searches per year.
  • About 48 hours of video are uploaded to YouTube every minute.

With such high demand, a single system would be unable to handle the processing. Hence the need for distributed systems.

What is Distributed Computing?

A distributed system consists of a collection of autonomous computers, connected through a network and distribution middleware, which enables computers to coordinate their activities and to share the resources of the system, so that users perceive the system as a single, integrated computing facility.

Consider Google's web servers. From the user's perspective, submitting a search query looks like talking to a single system. Behind the scenes, however, Google runs many servers, distributed both geographically and computationally, to return results within a few seconds.

Advantages of Distributed Computing?

  • Highly efficient
  • Scalability
  • More tolerant to failures
  • High Availability

Let us look at an example where we save the computational time by using distributed computing.

For example, suppose we have an array a with n elements, such as a = [1, 2, 3, 4, 5, 6].

We want to sum all the elements of the array and output it. Now, let us assume that there are 1020 elements in the array and the time to compute the sum is x.

If we now divide the array into 3 parts, a1, a2 and a3, where

a1 = { Set of elements where (element from a) modulo 3 == 0 }

a2 = { Set of elements where (element from a) modulo 3 == 1 }

a3 = { Set of elements where (element from a) modulo 3 == 2 }

We send these 3 arrays to 3 different processes, each of which computes the sum of its own array. Let's assume that, on average, each array has n/3 elements, so the time taken by each process reduces to about x/3. Since these processes run in parallel, the three partial sums are computed simultaneously and each one is returned to the main process. Finally, we compute the sum of a by adding up the individual sums of a1, a2 and a3.

Thus, we are able to reduce the time from x to roughly x/3 when the processes run simultaneously (ignoring the cost of sending the data and combining the results).
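The split-and-combine idea above can be sketched in plain Python. This is only an illustration, not a distributed program: the three partial sums are computed one after another here, whereas in a real distributed setup each would run in its own process. The function names `split_by_remainder` and `combined_sum` are hypothetical, chosen only for this sketch.

```python
def split_by_remainder(values, parts=3):
    """Group values by (value % parts), mirroring a1, a2 and a3 in the text."""
    groups = [[] for _ in range(parts)]
    for value in values:
        groups[value % parts].append(value)
    return groups


def combined_sum(values, parts=3):
    """Sum each group separately, then add the partial sums together."""
    groups = split_by_remainder(values, parts)
    # Each of these partial sums is the work one process would do.
    partial_sums = [sum(group) for group in groups]
    return sum(partial_sums)


a = [1, 2, 3, 4, 5, 6]
print(split_by_remainder(a))  # [[3, 6], [1, 4], [2, 5]]
print(combined_sum(a))        # 21
```

The final result is the same as summing the whole array directly; only the order in which the additions happen changes, which is what makes the work safe to hand out to separate processes.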

What is MPI?

Message Passing Interface (MPI) is a standardized and portable message-passing system developed for distributed and parallel computing. MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard low-level routines to create higher-level routines for the distributed-memory communication environment supplied with their parallel machines.

MPI routines can be called directly from C, C++ and Fortran, and from languages such as C#, Java or Python through language bindings. The advantages of MPI over older message-passing libraries are portability (because MPI has been implemented for almost every distributed-memory architecture) and speed (because each implementation is, in principle, optimized for the hardware on which it runs).

Even though several languages are available, Python is a popular choice because of its simplicity and the ease of writing code. We will now look at how to install MPI on Ubuntu 14.10.

Install MPI on Ubuntu


1) Run the following command in your terminal to install NumPy, a package for scientific computing in Python.

sudo apt-get install python-numpy

2) After the above step completes successfully, run the following commands to update the package lists and install pip.

sudo apt-get update
sudo apt-get -y install python-pip

3) Now, install MPICH2 (an implementation of MPI) together with its documentation.

sudo apt-get install libcr-dev mpich2 mpich2-doc

4) Install mpi4py, the Python bindings for MPI, using pip.

sudo pip install mpi4py

MPI is successfully installed now.
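To check that the installation works, a short script like the following can be used. It is a sketch that assumes the file is saved as `hello_mpi.py` (a hypothetical name) and that the `mpiexec` launcher from the MPICH package installed above is on your PATH. The helper `greeting` is made up for this example.

```python
def greeting(rank, size):
    """Build the message that each process prints."""
    return "Hello from process {} of {}".format(rank, size)


if __name__ == "__main__":
    try:
        from mpi4py import MPI
    except ImportError:
        # mpi4py is missing or was not built correctly; revisit the steps above.
        print("mpi4py could not be imported")
    else:
        comm = MPI.COMM_WORLD  # communicator containing every launched process
        print(greeting(comm.Get_rank(), comm.Get_size()))
```

Launching it with `mpiexec -n 4 python hello_mpi.py` should print one line per process, in no guaranteed order.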

Sometimes a problem appears during the final package setup after MPI has been installed, because the Python development tools are missing. You can install them using the following command:

sudo apt-get install python-dev

 

MPI on Windows/Mac

Windows and Mac users can visit the following link, download the .zip file, unzip it and run it:

MPI framework


Tutorials

After installation, you can refer to the following documentation for using MPI with Python.

https://mpi4py.scipy.org/docs/usrman/tutorial.html
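As a starting point before working through the tutorial, the array-sum example from earlier in this article might look roughly like this with mpi4py. This is a minimal sketch under a few assumptions: the helper `split_into_chunks` and the file name `sum_mpi.py` are hypothetical, and the array is split by position (round-robin) rather than by value modulo 3, so that it works for any number of processes.

```python
def split_into_chunks(values, parts):
    """Deal values out round-robin so every process gets a similar share."""
    return [values[i::parts] for i in range(parts)]


def main():
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Only the root process (rank 0) holds the full array and splits it.
    chunks = split_into_chunks([1, 2, 3, 4, 5, 6], size) if rank == 0 else None

    # scatter hands chunk i to process i; every process gets one list back.
    local_values = comm.scatter(chunks, root=0)
    local_sum = sum(local_values)

    # reduce adds the partial sums and delivers the result to rank 0 only.
    total = comm.reduce(local_sum, op=MPI.SUM, root=0)
    if rank == 0:
        print("Total: {}".format(total))


if __name__ == "__main__":
    try:
        main()
    except ImportError:
        print("mpi4py could not be imported")
```

Run with `mpiexec -n 3 python sum_mpi.py`, rank 0 should report a total of 21. The lowercase `scatter` and `reduce` methods send ordinary Python objects, which keeps the sketch short; the tutorial linked above also covers the buffer-based variants intended for NumPy arrays.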

References

https://www.open-mpi.org/

https://en.wikipedia.org/wiki/Message_Passing_Interface

About the Author: Anurag Mishra, currently a 3rd-year B.Tech student, is an avid software follower and a full stack web developer. His keen interest lies in web development, NLP and networking.
