How To Run CUDA C/C++ on Jupyter notebook in Google Colaboratory


What is CUDA?
CUDA (Compute Unified Device Architecture) is a parallel computing platform and application programming interface created by NVIDIA. It allows for dramatic increases in computing performance by harnessing the power of the GPU.

What is Google Colab?
Google Colab is a free cloud service, and the most important feature distinguishing it from other free cloud services is that Colab offers a GPU at no cost. With Colab you can work on the GPU with CUDA C/C++ for free!

CUDA code will not run on an AMD GPU or Intel HD graphics; it requires NVIDIA hardware in your machine. On Colab you can take advantage of an NVIDIA GPU inside a fully functional Jupyter Notebook, with TensorFlow and other ML/DL tools pre-installed.

That’s it for the introduction. Let’s configure our learning environment.

Step 1: Go to https://colab.research.google.com in your browser and click on New Notebook.

Create new notebook

Step 2: We need to switch our runtime from CPU to GPU. Click on Runtime > Change runtime type > Hardware Accelerator > GPU > Save.
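Once the runtime has switched, you can confirm that an NVIDIA GPU is actually attached by running the command below in a code cell (with the leading `!`). This is a quick sanity check, not part of the installation; the fallback message is just an illustrative string:

```shell
# nvidia-smi lists the attached NVIDIA GPU, driver version and memory usage.
# If the tool is missing, no GPU is attached - re-check the runtime type.
command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi || echo "No NVIDIA driver found - check the runtime type"
```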

change runtime



gpu

Step 3: Completely uninstall any previous CUDA versions. We need to refresh the cloud instance of CUDA.

!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!apt-get update

Write the commands in a separate code block and run it. Every line that starts with ‘!’ is executed as a command-line command.
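For example, putting the line below in a cell (prefixed with ‘!’) hands it to the underlying shell rather than to Python; the echoed text is just an arbitrary example string:

```shell
# In a notebook this would be written as: !echo "Hello from the Colab shell"
# Outside a notebook, run it directly in a terminal without the '!'.
echo "Hello from the Colab shell"
```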

Step 4: Install CUDA version 9.2 (you can copy the commands into a separate code block).

!wget https://developer.nvidia.com/compute/cuda/9.2/Prod/local_installers/cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64 -O cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!dpkg -i cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64.deb
!apt-key add /var/cuda-repo-9-2-local/7fa2af80.pub
!apt-get update
!apt-get install cuda-9.2

Step 5: Now you can check your CUDA installation by running the command below:

!nvcc --version

Output will be something like this:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Wed_Apr_11_23:16:29_CDT_2018
Cuda compilation tools, release 9.2, V9.2.88

Step 6: Run the given command to install a small extension to run nvcc from the Notebook cells.

!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Step 7: Load the extension using the code given below:

%load_ext nvcc_plugin

Step 8: Execute the code given below to check whether CUDA is working.

Now we are ready to run CUDA C/C++ code right in the Notebook.

Important note: write the code in its own code block and run it, and re-run that block whenever you change the code.

To run the code in your notebook, add the %%cu extension at the beginning of your code.


%%cu
#include <iostream>

int main()
{
    std::cout << "Welcome To GeeksforGeeks\n";
    return 0;
}



Output:

Welcome To GeeksforGeeks

I suggest you try a program that finds the maximum element of a vector to check that everything works properly.


%%cu
#include <cstdio>
#include <iostream>

using namespace std;

// Each block serially scans its own 256-element slice of a[]
// and writes the slice maximum to b[blockIdx.x].
__global__ void maxi(int* a, int* b, int n)
{
    int block = 256 * blockIdx.x;
    int max = 0;

    for (int i = block; i < min(256 + block, n); i++) {
        if (max < a[i]) {
            max = a[i];
        }
    }
    b[blockIdx.x] = max;
}

int main()
{
    int n = 1 << 10; // 1024 elements
    int a[n];

    for (int i = 0; i < n; i++) {
        a[i] = rand() % n;
        cout << a[i] << "\t";
    }

    cudaEvent_t start, end;
    int *ad, *bd;
    int size = n * sizeof(int);
    cudaMalloc(&ad, size);
    cudaMemcpy(ad, a, size, cudaMemcpyHostToDevice);
    int grids = ceil(n * 1.0f / 256.0f);
    cudaMalloc(&bd, grids * sizeof(int));

    cudaEventCreate(&start);
    cudaEventCreate(&end);
    cudaEventRecord(start);

    // Reduce repeatedly until a single value remains,
    // recomputing the grid size for each pass.
    while (n > 1) {
        grids = ceil(n * 1.0f / 256.0f);
        maxi<<<grids, 1>>>(ad, bd, n);
        n = grids;
        cudaMemcpy(ad, bd, n * sizeof(int), cudaMemcpyDeviceToDevice);
    }

    cudaEventRecord(end);
    cudaEventSynchronize(end);

    float time = 0;
    cudaEventElapsedTime(&time, start, end);

    int ans;
    cudaMemcpy(&ans, ad, sizeof(int), cudaMemcpyDeviceToHost);

    cout << "The maximum element is : " << ans << endl;
    cout << "The time required : " << time << endl;

    cudaFree(ad);
    cudaFree(bd);
    return 0;
}



Output: the program first prints the 1024 generated values, then the maximum element and the elapsed kernel time in milliseconds. The exact values depend on the sequence produced by rand().

I hope this will be helpful to someone.
