
Introduction to Parallel Programming with OpenMP in C++

Last Updated : 19 Mar, 2023

Parallel programming is the process of breaking down a large task into smaller sub-tasks that can be executed simultaneously, thus utilizing the available computing resources more efficiently. OpenMP is a widely used API for parallel programming in C++. It allows developers to write parallel code easily and efficiently by adding simple compiler directives to their existing code.

Syntax of OpenMP

OpenMP uses compiler directives to mark the parallel sections of the code. Each directive begins with “#pragma omp” and takes the form:

#pragma omp <directive> [clause[,clause]...]

Parameters

1. directive: The following are some common OpenMP directives:

  • “parallel”: creates a team of threads that execute the enclosed code block in parallel.
  • “for”: splits a loop into smaller iterations that can be executed in parallel by different threads.
  • “sections”: splits the enclosed code block into different sections that can be executed in parallel.
  • “single”: specifies that a code block should be executed by only one thread.
  • “critical”: specifies that a code block should be executed by only one thread at a time.
  • “atomic”: specifies that a variable should be accessed atomically.

2. clause: The clauses provide additional information to the directives. For example, the “num_threads” clause specifies the number of threads to be used for a parallel section.

Steps for Parallel Programming

The following steps are needed to parallelize your program with OpenMP:

1. Include the OpenMP header file:

#include <omp.h>

2. Add the OpenMP directives to the relevant sections of your code:

#pragma omp parallel
{
   // Code block to be executed in parallel
}

3. Compile the program with OpenMP support enabled. With GCC or Clang this is the -fopenmp flag:

g++ -fopenmp program.cpp -o program

Without this flag the compiler ignores the OpenMP pragmas and the program runs serially.

Examples of Parallel Programming

Example 1: In this example, we define two functions, “sum_serial” and “sum_parallel”, that calculate the sum of the first n natural numbers using a for loop. The “sum_serial” function uses a serial implementation, while the “sum_parallel” function uses OpenMP to parallelize the loop. We then benchmark the two implementations by calling both functions with n = 100000000 and measuring the time each takes using the high_resolution_clock class from the chrono library.

Below is the implementation of the above example:

C++
// C++ program to calculate the sum of the first
// n natural numbers both serially and in parallel
#include <chrono>
#include <iostream>
#include <omp.h>

// Serial implementation. The accumulator is long long because
// the sum of the first 10^8 natural numbers far exceeds INT_MAX.
long long sum_serial(long long n)
{
    long long sum = 0;
    for (long long i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Parallel implementation: the reduction clause gives each thread
// a private copy of sum and combines the copies at the end
long long sum_parallel(long long n)
{
    long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (long long i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

// Driver function
int main()
{
    const long long n = 100000000;

    auto start_time
        = std::chrono::high_resolution_clock::now();

    long long result_serial = sum_serial(n);

    auto end_time
        = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> serial_duration
        = end_time - start_time;

    start_time = std::chrono::high_resolution_clock::now();

    long long result_parallel = sum_parallel(n);
    end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> parallel_duration
        = end_time - start_time;

    std::cout << "Serial result: " << result_serial
              << std::endl;
    std::cout << "Parallel result: " << result_parallel
              << std::endl;
    std::cout << "Serial duration: "
              << serial_duration.count() << " seconds"
              << std::endl;
    std::cout << "Parallel duration: "
              << parallel_duration.count() << " seconds"
              << std::endl;
    std::cout << "Speedup: "
              << serial_duration.count()
                     / parallel_duration.count()
              << std::endl;

    return 0;
}


Output

Serial result: 5000000050000000
Parallel result: 5000000050000000
Serial duration: 0.0942459 seconds
Parallel duration: 0.0658899 seconds
Speedup: 1.43035

Example 2: In this example, we approximate pi by numerically integrating the function 4 / (1 + x^2) over the interval [0, 1], whose exact value is pi:

pi = integral from 0 to 1 of 4 / (1 + x^2) dx

(Expanding the integrand as a series and integrating term by term recovers the familiar Leibniz formula pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...) The code divides the interval into a large number of steps, evaluates the function at the midpoint of each step, and sums the results (the midpoint rectangle rule); OpenMP parallelizes the for loop that performs this summation.

The compute_pi_serial function implements the computation serially, using a simple for loop to compute the sum. The compute_pi_parallel function parallelizes the same loop using OpenMP, with the #pragma omp parallel for reduction(+:sum) directive.

The main function runs both the serial and parallel versions of the code and measures the execution time of each version using the high_resolution_clock class from the chrono library. It also calculates the speedup achieved by the parallel version.

Below is the implementation of the above example:

C++
// C++ Program to implement
// Parallel Programming
#include <chrono>
#include <iostream>
#include <omp.h>
  
// Computes the value of pi using a serial computation.
double compute_pi_serial(long num_steps)
{
    double step = 1.0 / num_steps;
    double sum = 0.0;
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
  
// Computes the value of pi using a parallel computation.
double compute_pi_parallel(long num_steps)
{
    double step = 1.0 / num_steps;
    double sum = 0.0;
    // parallelize loop and reduce sum variable
#pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
  
// Driver function
int main()
{
    const long num_steps = 1000000000L;
  
    // Compute pi using serial computation and time it.
    auto start_time
        = std::chrono::high_resolution_clock::now();
    double pi_serial = compute_pi_serial(num_steps);
    auto end_time
        = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> serial_duration
        = end_time - start_time;
  
    // Compute pi using parallel computation and time it.
    start_time = std::chrono::high_resolution_clock::now();
    double pi_parallel = compute_pi_parallel(num_steps);
    end_time = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> parallel_duration
        = end_time - start_time;
  
    std::cout << "Serial result: " << pi_serial
              << std::endl;
    std::cout << "Parallel result: " << pi_parallel
              << std::endl;
    std::cout << "Serial duration: "
              << serial_duration.count() << " seconds"
              << std::endl;
    std::cout << "Parallel duration: "
              << parallel_duration.count() << " seconds"
              << std::endl;
    std::cout << "Speedup: "
              << serial_duration.count()
                     / parallel_duration.count()
              << std::endl;
  
    return 0;
}


Output

Serial result: 3.14159
Parallel result: 3.14159
Serial duration: 1.64776 seconds
Parallel duration: 1.51894 seconds
Speedup: 1.08481

