Speed up Code Execution with the Help of Pragmas in C/C++
When no optimization option is given, GCC's goal is to reduce the cost of compilation and to make debugging produce the expected results, so most optimizations are turned off by default. Optimizations are normally requested with command-line flags such as -O2, although not every optimization is controlled directly by a flag. When the compiler flags cannot be changed, the GCC-specific directive #pragma GCC optimize can request an optimization level from inside the source file.
Example of an unoptimized program: Consider a program that finds the prime numbers up to 10000000 with the Sieve of Eratosthenes.
Below is the code with no optimization:
C++
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs";
    return 0;
}
Output:
Execution time: 0.592183 secs
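A single clock() reading like the one above varies from run to run, so small differences between optimization levels are hard to interpret. As a hedged sketch (the helper names countPrimesBelow and bestOfSeconds are illustrative and not part of the original program), the sieve can be wrapped in a function that returns a checkable prime count, and timed several times while keeping the fastest run:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Sieve of Eratosthenes: returns how many primes are strictly below `limit`.
std::size_t countPrimesBelow(std::size_t limit)
{
    if (limit < 3) return 0; // there are no primes below 2
    std::vector<bool> prime(limit, true);
    prime[0] = false;
    prime[1] = false;
    for (std::size_t i = 2; i * i < limit; ++i) {
        if (prime[i]) {
            for (std::size_t j = i * i; j < limit; j += i) {
                prime[j] = false;
            }
        }
    }
    return static_cast<std::size_t>(std::count(prime.begin(), prime.end(), true));
}

// Calls `fn` `runs` times and returns the fastest wall-clock time in seconds.
template <typename Fn>
double bestOfSeconds(Fn fn, int runs)
{
    assert(runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; ++r) {
        auto start = std::chrono::steady_clock::now();
        fn();
        auto stop = std::chrono::steady_clock::now();
        double secs = std::chrono::duration<double>(stop - start).count();
        if (best < 0.0 || secs < best) best = secs;
    }
    return best;
}
```

For example, bestOfSeconds([] { volatile std::size_t sink = countPrimesBelow(10000005); (void)sink; }, 5) would report the fastest of five runs. Storing the count in a volatile variable is one way to keep the compiler from discarding the work at higher optimization levels.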
The following optimization levels are available:
1. O1: Compiling at O1 takes somewhat more time, and considerably more memory for large functions, than compiling without optimization. The compiler tries to reduce both code size and execution time, but avoids optimizations that would take a great deal of compilation time. O1 is therefore the baseline on which the higher levels build.
Below is the implementation of the previous program with O1 optimization:
C++
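#pragma GCC optimize("O1")
// The pragma above asks GCC to compile the functions that follow at O1
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}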
Output:
Execution time: 0.384945 secs.
2. O2: O2 optimizes more aggressively. It turns on every optimization flag enabled by O1 and adds nearly all supported optimizations that do not involve a space-speed tradeoff. Compared with O1, this level increases both compilation time and the performance of the generated code.
Below is the implementation of the previous program with O2 optimization:
C++
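#pragma GCC optimize("O2")
// The pragma above asks GCC to compile the functions that follow at O2
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}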
Output:
Execution time: 0.288337 secs.
3. O3: O3 turns on all the optimizations specified by O2 and also enables a number of additional flags, among them -floop-interchange, -floop-unroll-and-jam and -fpeel-loops.
Below is the implementation of the previous program with O3 optimization:
C++
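#pragma GCC optimize("O3")
// The pragma above asks GCC to compile the functions that follow at O3
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}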
Output:
Execution time: 0.580154 secs.
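To give a feel for what a loop flag such as -floop-interchange is aiming at, here is a hand-written sketch of a loop interchange. The functions sumColumnFirst and sumRowFirst are illustrative names, and whether GCC actually performs this transformation on a given loop depends on the compiler version and its cost model, so treat this as a picture of the idea rather than of the compiler's output:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix is stored row by row in one flat vector:
// element (r, c) lives at data[r * cols + c].

// Column-first traversal: consecutive accesses are `cols` elements apart,
// which tends to use the cache poorly for large matrices.
long long sumColumnFirst(const std::vector<int>& data, std::size_t rows, std::size_t cols)
{
    assert(data.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            total += data[r * cols + c];
        }
    }
    return total;
}

// The same loop nest with the two loops interchanged: consecutive accesses
// now touch neighbouring elements, and the result is unchanged.
long long sumRowFirst(const std::vector<int>& data, std::size_t rows, std::size_t cols)
{
    assert(data.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            total += data[r * cols + c];
        }
    }
    return total;
}
```

Both functions return the same sum; only the memory access order differs, which is the kind of change an interchange pass tries to make automatically.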
4. Os: Os optimizes for size. It enables all O2 optimizations except those that often increase code size. It also enables -finline-functions, makes the compiler tune for code size rather than execution speed, and performs further optimizations designed to reduce code size.
Below is the implementation of the previous program with Os optimization:
C++
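#pragma GCC optimize("Os")
// The pragma above asks GCC to compile the functions that follow at Os
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}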
Output:
Execution time: 0.317845 secs.
5. Ofast: Ofast enables all O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs, such as -ffast-math, so floating-point results can differ from those of strictly conforming code. This level is popular among competitive programmers. If more than one optimization level is declared, the last one declared takes effect.
Below is the implementation of the previous program with Ofast optimization:
C++
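#pragma GCC optimize("Ofast")
// The pragma above asks GCC to compile the functions that follow at Ofast
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}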
Output:
Execution time: 0.303287 secs.
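The programs above place the pragma at the top of the file, so it covers every function that follows. GCC also provides #pragma GCC push_options and #pragma GCC pop_options, which can limit an optimization level to a few functions. The sketch below assumes GCC (other compilers may ignore these pragmas with a warning), and the function names sumOfSquaresHot and sumOfSquaresPlain are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

#pragma GCC push_options   // remember the options currently in effect
#pragma GCC optimize("O3") // applies to functions defined from here on

// Compiled at O3 (on GCC) regardless of the command-line level.
std::uint64_t sumOfSquaresHot(std::uint32_t n)
{
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}

#pragma GCC pop_options    // restore the options saved by push_options

// Compiled with whatever optimization level the command line requested.
std::uint64_t sumOfSquaresPlain(std::uint32_t n)
{
    assert(n <= 1000000u); // keep the sum comfortably inside 64 bits
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i * i;
    }
    return total;
}
```

Both functions compute the same value; the only intended difference is the optimization level GCC uses when generating their code.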
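To optimize further at the architecture level, we can combine the optimize pragma with #pragma GCC target, which lets the compiler use instruction-set extensions such as AVX, AVX2 and FMA. The gain depends on the code and on the CPU, and the resulting program only runs correctly on processors that support the listed extensions. A target pragma is best used together with one of the optimization levels described above.
Below is the implementation of the previous program with a target: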
C++14
#pragma GCC optimize("Ofast")
#pragma GCC target("avx,avx2,fma")
// The pragmas above request Ofast and allow AVX, AVX2 and FMA instructions
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
#define N 10000005
using namespace std;
// prime[i] stays true while i is still a prime candidate (valid indices: 0..N-1)
vector<bool> prime(N, true);
void sieveOfEratosthenes()
{
    for (int i = 2; i <= sqrt(N); ++i) {
        if (prime[i]) {
            // j < N: prime has N elements, so index N would be out of bounds
            for (int j = i * i; j < N; j += i) {
                prime[j] = false;
            }
        }
    }
}
int main()
{
    clock_t start, end;
    start = clock(); // CPU time before the sieve
    sieveOfEratosthenes();
    end = clock(); // CPU time after the sieve
    double time_taken
        = double(end - start)
          / double(CLOCKS_PER_SEC);
    cout << "Execution time: " << time_taken
         << " secs.";
    return 0;
}
Output:
Execution time: 0.292147 secs.
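Because code compiled for AVX, AVX2 or FMA can crash with an illegal-instruction error on a CPU that lacks them, it can be useful to check the processor at run time. The following sketch uses the GCC/Clang builtin __builtin_cpu_supports, which is available on x86 targets; the function name cpuHasAvx2AndFma is an illustrative choice, and on other compilers or architectures the sketch simply reports false:

```cpp
#include <cassert>

// Returns true when the CPU running this program reports AVX2 and FMA support.
// Outside GCC/Clang on x86 the check is not available, so we assume "no".
bool cpuHasAvx2AndFma()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); // initialise the CPU feature data before querying it
    return __builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma");
#else
    return false;
#endif
}
```

A program could call cpuHasAvx2AndFma() once at startup and fall back to a build or code path without the target pragma when it returns false.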
Last Updated: 21 Sep, 2021