Running Python script on GPU.

Last Updated : 08 Mar, 2024

GPUs have more cores than CPUs, so for data-parallel computation they can outperform CPUs by a wide margin, even though a GPU runs at a lower clock speed and lacks several of the core-management features found in a CPU.

Running a Python script on a GPU can therefore be considerably faster than running it on a CPU. Note, however, that to process a data set on the GPU, the data must first be transferred to the GPU’s memory, which takes additional time; if the data set is small, the CPU may perform better than the GPU.
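The trade-off between transfer time and compute time can be made concrete with a rough cost model. The sketch below is illustrative only: the function names, the assumed 10x speed-up, and the assumed 5 GB/s transfer bandwidth are hypothetical values chosen for the example, not measurements.

```python
def gpu_total_seconds(n_bytes, cpu_seconds, speedup, bandwidth_bytes_per_s):
    """Estimated GPU time: copy the data in and out, plus the accelerated compute."""
    transfer = 2 * n_bytes / bandwidth_bytes_per_s  # host -> device and back
    compute = cpu_seconds / speedup                 # assumed speed-up factor
    return transfer + compute


def gpu_is_worthwhile(n_bytes, cpu_seconds, speedup=10.0, bandwidth_bytes_per_s=5e9):
    """True when the estimated GPU time beats the CPU time."""
    total = gpu_total_seconds(n_bytes, cpu_seconds, speedup, bandwidth_bytes_per_s)
    return total < cpu_seconds


# Cheap job: 80 MB of data that the CPU handles in 10 ms (hypothetical numbers)
small = gpu_is_worthwhile(n_bytes=80e6, cpu_seconds=0.010)
# Expensive job: the same 80 MB, but the CPU needs 2 s (hypothetical numbers)
large = gpu_is_worthwhile(n_bytes=80e6, cpu_seconds=2.0)
print(small, large)
```

Under these assumptions the transfer alone costs about 32 ms, so the GPU only pays off once the computation itself is heavy enough to hide that overhead.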

Getting started:

Only NVIDIA GPUs are supported for now, and only those listed on this page. If your graphics card has CUDA cores, you can proceed with the setup.

Installation:

First, make sure that the NVIDIA drivers are up to date. You can also install cudatoolkit explicitly from here. Then install Anaconda, and add Anaconda to the environment (PATH) while installing.
After all the installations are complete, run the following commands in the command prompt.

conda install numba & conda install cudatoolkit

NOTE: If Anaconda was not added to the environment, navigate to the Anaconda installation, locate the Scripts directory, and open the command prompt there.
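After installation, it can help to confirm that Numba actually sees a CUDA device. The following is a minimal check, assuming Numba's numba.cuda.is_available() function; the helper name gpu_available is made up for this example.

```python
def gpu_available():
    """Return True if Numba is importable and reports a usable CUDA GPU."""
    try:
        from numba import cuda  # fails if Numba is not installed
        return bool(cuda.is_available())
    except Exception:
        # In this sketch, a missing package or a driver problem both count as "no GPU"
        return False


print("CUDA GPU available:", gpu_available())
```

If this reports False, recheck the driver and cudatoolkit installation before going further.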

Code:

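We will use the numba.jit decorator on the function we want to compute on the GPU. The decorator has several parameters, but we will work with only the target parameter (spelled target_backend in the code below). The target tells jit which device to compile the code for (“CPU” or “Cuda”); “Cuda” corresponds to the GPU. If “CPU” is passed as the argument instead, jit optimizes the code to run faster on the CPU, which also improves speed. Be aware that, depending on the Numba version, jit may compile for the CPU even when “Cuda” is requested; Numba's explicit interface for GPU kernels is numba.cuda.jit.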
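For comparison with the decorator described above, here is a minimal sketch of Numba's explicit kernel interface, numba.cuda.jit, performing the same add-one operation. The names add_one_kernel and add_one_on_gpu, the block size of 256, and the array length are choices made for this example, not part of the article's code. The sketch returns None when NumPy, Numba, or a CUDA device is missing.

```python
try:
    import numpy as np
    from numba import cuda
    HAVE_GPU = bool(cuda.is_available())
except Exception:  # NumPy/Numba missing, or no usable CUDA driver
    HAVE_GPU = False

if HAVE_GPU:
    @cuda.jit
    def add_one_kernel(a):
        i = cuda.grid(1)   # absolute index of this thread in a 1-D launch
        if i < a.size:     # guard threads that fall past the end of the array
            a[i] += 1


def add_one_on_gpu(n=1_000_000):
    """Add 1 to an array of n ones on the GPU; return None if no GPU is usable."""
    if not HAVE_GPU:
        return None
    a = np.ones(n, dtype=np.float64)
    d_a = cuda.to_device(a)                       # explicit host -> device copy
    threads_per_block = 256                       # example value, not tuned
    blocks = (n + threads_per_block - 1) // threads_per_block
    add_one_kernel[blocks, threads_per_block](d_a)
    return d_a.copy_to_host()                     # explicit device -> host copy


result = add_one_on_gpu()
print("no CUDA GPU found" if result is None else result[:5])
```

Unlike a plain loop, each GPU thread here handles a single array element, which is why the kernel contains an index check instead of a for loop.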

Python3

from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer


# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1


# function optimized to run on gpu
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1


if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)

Output: based on CPU = i3 6006u, GPU = 920M.

without GPU: 8.985259440999926
with GPU: 1.4247172560001218
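When reading timings like these, keep in mind that the first call to a jitted function also includes the time Numba spends compiling it. A minimal sketch of separating the first call from later calls follows; the helper time_calls and the function add_one are hypothetical names for this example, and the Numba part is skipped if Numba or NumPy is not installed.

```python
from timeit import default_timer as timer


def time_calls(fn, make_arg, repeats=3):
    """Call fn on a fresh argument `repeats` times; return the seconds per call."""
    times = []
    for _ in range(repeats):
        arg = make_arg()
        start = timer()
        fn(arg)
        times.append(timer() - start)
    return times


try:
    import numpy as np
    from numba import njit
except ImportError:
    njit = None  # Numba (or NumPy) not installed: skip the demo below

if njit is not None:
    @njit
    def add_one(a):
        for i in range(a.size):
            a[i] += 1

    times = time_calls(add_one, lambda: np.ones(1_000_000))
    print("first call (includes compilation):", times[0])
    print("later calls (compiled code only): ", min(times[1:]))
```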

Note, however, that the array is first copied from RAM to the GPU for processing, and if the function returns anything, the returned values are copied back from the GPU to the CPU. For small data sets the CPU is therefore comparatively faster, and its speed can be improved further, even for small data sets, by passing the target as “CPU”. Take special care when a function compiled with jit calls another function: that function should also be compiled with jit, otherwise jit may produce even slower code.
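The last point, jitting helper functions as well, can be sketched as follows. This is a CPU-only illustration using Numba's njit decorator (jit in nopython mode, where an un-jitted helper is rejected at compile time rather than silently slowing things down); the function names scale and apply_scale are made up for the example. If Numba is not installed, a no-op fallback decorator keeps the sketch runnable.

```python
try:
    import numpy as np
    from numba import njit
except ImportError:
    np = None

    def njit(fn):
        # Fallback so the sketch still runs without Numba: no compilation happens
        return fn


@njit
def scale(x):
    # The helper is jitted too, so the caller can invoke it as native code
    return 2.0 * x + 1.0


@njit
def apply_scale(a):
    # Jitted caller that uses the jitted helper on every element
    for i in range(len(a)):
        a[i] = scale(a[i])
    return a


data = np.ones(5) if np is not None else [1.0] * 5
out = apply_scale(data)
print([float(v) for v in out])
```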
