Parallel Programming with NumPy and SciPy

Last Updated : 05 Jun, 2023

Parallel computing is a type of computation in which many calculations or the execution of processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time.

Required Modules:

pip install scipy
pip install numpy
pip install cupy
pip install dask

Parallel Programming with NumPy

NumPy is a popular numeric computation library for Python known for its efficient array operations and support for vectorized operations. One way to further optimize NumPy code is to use parallel programming techniques, which take advantage of multiple CPU cores to perform calculations faster.

Parallel dot product calculation using NumPy

First, import NumPy with import numpy as np. Then create two random vectors a and b of length 100000 and compute their dot product with np.dot(a, b). NumPy hands this call to the BLAS library it was built against, which may use several threads depending on the build and the problem size. Finally, print the result.

Python3

import numpy as np 
  
# Create two random vectors of length 100000 
a = np.random.rand(100000) 
b = np.random.rand(100000) 
  
# Calculate the dot product; np.dot delegates to BLAS, which may be multithreaded
dot_product = np.dot(a, b) 
  
print(dot_product) 

Output (the value changes on every run because the vectors are random): 25016.0204799
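The idea of dividing a dot product into smaller pieces can also be made explicit. The following is a minimal sketch, not what np.dot does internally: it assumes four worker threads and a fixed random seed (both chosen only for illustration), computes one partial dot product per chunk, and adds the partial results.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Fixed seed so the run is reproducible (an assumption for illustration)
rng = np.random.default_rng(0)
a = rng.random(100_000)
b = rng.random(100_000)

# Split both vectors into the same 4 contiguous chunks
a_chunks = np.array_split(a, 4)
b_chunks = np.array_split(b, 4)

# Compute one partial dot product per chunk in a worker thread
with ThreadPoolExecutor(max_workers=4) as executor:
    partial_dots = list(executor.map(np.dot, a_chunks, b_chunks))

# The full dot product is the sum of the partial dot products
chunked_dot = sum(partial_dots)

# Compare against the single-call result
print(np.isclose(chunked_dot, np.dot(a, b)))
```

For vectors this small the thread overhead is likely to outweigh any gain, so the sketch is meant to show the decomposition, not to be faster than a single np.dot call.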

Parallel matrix multiplication using NumPy and Multiprocessing

First, import NumPy as np and Pool from multiprocessing. The worker function matrix_multiply(args) unpacks a pair of blocks and returns their product. We create two random matrices A and B of size 1000×1000, split A into four blocks of columns and B into the four matching blocks of rows, and create a multiprocessing pool with four workers. The pool maps matrix_multiply over the four block pairs. Each partial product is already a full 1000×1000 matrix, so the final result is the sum of the four partial products, not their concatenation.

Python3

import numpy as np
from multiprocessing import Pool

# Multiply one block of columns of A by the matching block of rows of B
def matrix_multiply(args):
    A_part, B_part = args
    return np.dot(A_part, B_part)

# The guard keeps worker processes from re-running this block on import
if __name__ == "__main__":
    # Create two random matrices of size 1000x1000
    A = np.random.rand(1000, 1000)
    B = np.random.rand(1000, 1000)

    # Split A into 4 column blocks and B into the 4 matching row blocks
    A_parts = np.array_split(A, 4, axis=1)
    B_parts = np.array_split(B, 4)

    # Create a pool with 4 workers and map the function over the block pairs
    with Pool(4) as pool:
        C_parts = pool.map(matrix_multiply, list(zip(A_parts, B_parts)))

    # Each partial product is 1000x1000; their sum is the full product
    C = sum(C_parts)

    print(C)

Output:

[[ 246.26109895  245.27979434  247.53272716 ...,  246.54602696   246.56427344  247.98649696]
 [ 250.3441429   249.08795621  250.72384067 ...,  250.04416057   250.39319075  251.28326167]
 [ 248.44163838  247.48820248  249.19031327 ...,  248.48692097   249.24465987  250.2703185 ]
 ...,
 [ 252.35223132  250.92852728  251.9176228  ...,  251.5751485   253.00980032  252.06391074]
 [ 251.8001927   249.67594552  250.62393445 ...,  249.82225854   252.16903134  251.53323254]
 [ 252.24630379  251.09158312  251.64857194 ...,  251.07993262   252.88783961  252.44037699]]
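A different way to divide a matrix product is to split only A into blocks of rows and share B with every worker. Each worker then produces a block of rows of the result, and the blocks are stacked vertically. The sketch below illustrates this under a few assumptions made for brevity: smaller matrices, a fixed seed, and threads instead of processes. The helper name multiply_block is hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Fixed seed and small shapes, chosen only for illustration
rng = np.random.default_rng(0)
A = rng.random((400, 300))
B = rng.random((300, 200))

# Split A into 4 blocks of rows; B is shared by every worker
A_blocks = np.array_split(A, 4, axis=0)

def multiply_block(A_block):
    # Each block of rows of A yields the matching rows of A @ B
    return A_block @ B

with ThreadPoolExecutor(max_workers=4) as executor:
    C_blocks = list(executor.map(multiply_block, A_blocks))

# Row blocks of the result are stacked vertically
C = np.concatenate(C_blocks, axis=0)

print(C.shape, np.allclose(C, A @ B))
```

With this decomposition the pieces are concatenated along axis 0, whereas splitting A by columns and B by rows requires summing the partial products.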

GPU Computing using NumPy and CuPy

NumPy and SciPy cannot run computations on a GPU by themselves. For that, we need an additional library such as CuPy or PyTorch. In this example, we import CuPy as cp and NumPy as np, create a random array on the GPU, and square it element-wise on the GPU. We then transfer the result back to the CPU as a NumPy array and carry out further computations on the CPU with NumPy.

Python3

import cupy as cp 
import numpy as np 
  
# Create a random array on the GPU 
a_gpu = cp.random.rand(3, 3) 
  
# Perform element-wise squaring on the GPU 
a_squared_gpu = cp.square(a_gpu) 
  
# Transfer the result back to the CPU as a NumPy array 
a_squared_cpu = cp.asnumpy(a_squared_gpu) 
  
# Perform further computations on the CPU using NumPy 
a_sum = np.sum(a_squared_cpu) 
  
print("Original array on GPU:") 
print(a_gpu) 
  
print("Squared array on GPU:") 
print(a_squared_gpu) 
  
print("Squared array on CPU (as NumPy array):") 
print(a_squared_cpu) 
  
print("Sum of squared array on CPU (computed using NumPy):") 
print(a_sum) 

Output:

Original array on GPU:
[[0.86840887 0.46334445 0.07575684]
 [0.95068822 0.27356767 0.04985629]
 [0.46676109 0.92671615 0.43278567]]
Squared array on GPU:
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Squared array on CPU (as NumPy array):
[[7.53992218e-01 2.14645385e-01 5.71954152e-03]
 [9.04904051e-01 7.48442324e-02 2.48564982e-03]
 [2.17650008e-01 8.58339857e-01 1.87155546e-01]]
Sum of squared array on CPU (computed using NumPy):
3.628021118080383
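Since CuPy mirrors much of the NumPy API, the same code can often run on either library. The sketch below is one possible pattern, not an official recipe: it assumes that falling back to NumPy is acceptable when CuPy is not installed or no CUDA device is usable, and the alias xp is purely illustrative. A small fixed input replaces the random array so the result is easy to check by hand.

```python
import numpy as np

# Use CuPy only if it is installed and a CUDA device is usable;
# otherwise fall back to NumPy (xp is an illustrative alias)
try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable device
    xp = cp
except Exception:
    xp = np

# The values 0..8 arranged as a 3x3 array, on the GPU if available
a = xp.arange(9.0).reshape(3, 3)
a_squared = xp.square(a)

# Bring the result to host memory as a NumPy array
if xp is np:
    a_squared_cpu = a_squared
else:
    a_squared_cpu = xp.asnumpy(a_squared)

# 0 + 1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 = 204
total = float(np.sum(a_squared_cpu))
print(total)
```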

Multi-threading using NumPy

We need to import the necessary modules: NumPy, and ThreadPoolExecutor from concurrent.futures. Next, we define a function func(x) that we want to execute in parallel. We create an input array arr of values to apply this function to, and we set the number of threads to use for parallel execution.

Now, we’ll create a ThreadPoolExecutor with the specified number of threads. This executor will allow us to run the function func on the input array arr in parallel. We’ll use the map method of the executor to apply func to each element of arr in parallel. The map method returns an iterator that contains the results of applying func to each element of arr. To get the actual results, we’ll need to convert the iterator to a NumPy array. We’ll do this by calling the list function on the iterator to get a list of the results and then converting that list to a NumPy array.

Python3

import numpy as np 
from concurrent.futures import ThreadPoolExecutor 
  
# Define a function to be executed in parallel 
def func(x): 
    return x**2
  
# Create an array of values 
arr = np.arange(10) 
  
# Define the number of threads to use 
num_threads = 4
  
# Create a ThreadPoolExecutor with the specified number of threads 
with ThreadPoolExecutor(max_workers=num_threads) as executor: 
    # Use the executor to map the function to the array in parallel 
    results = executor.map(func, arr) 
  
# Convert the results to a NumPy array 
results = np.array(list(results)) 
  
# Print the input array and the corresponding results 
print("Input Array: ", arr) 
print("Results: ", results) 

Finally, the script prints both the input array arr and the corresponding results array.

Output:

Input Array:  [0 1 2 3 4 5 6 7 8 9]
Results:  [ 0  1  4  9 16 25 36 49 64 81]
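In standard CPython, calling a small Python function once per element keeps every thread waiting on the global interpreter lock, so the example above is unlikely to run faster than a plain loop. A common alternative, sketched here under the assumption that the work can be expressed as a NumPy operation on a whole chunk, is to give each thread one chunk of the array. NumPy releases the lock during many array operations, which gives threads a chance to overlap.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

arr = np.arange(1_000_000)
num_threads = 4

# One chunk per thread instead of one task per element
chunks = np.array_split(arr, num_threads)

with ThreadPoolExecutor(max_workers=num_threads) as executor:
    # np.square processes a whole chunk in compiled code
    squared_chunks = list(executor.map(np.square, chunks))

# Reassemble the chunks in their original order
results = np.concatenate(squared_chunks)

print(np.array_equal(results, arr**2))
```

For an operation as cheap as squaring, the single vectorized expression arr**2 is usually the better choice; chunked threading tends to pay off only when each chunk involves substantial work.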

Parallel Programming with SciPy

SciPy is a popular Python library for scientific and mathematical calculations. It provides many powerful tools for data analysis, signal processing, and optimization. Many SciPy routines run on a single core by default, so for larger workloads you can combine SciPy with external libraries and tools that run computations concurrently.

Parallelizing a simple map or reduce operation using Dask

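Dask is a separate library, not a SciPy module. First, import NumPy as np and dask.array as da. Then create a 10000×10000 array x of normally distributed random numbers, divided into chunks of size 1000×1000. Next, define the expression y = (x + x.T) - x.mean(axis=0) and its sum. At this point Dask has only built a task graph; the work runs chunk by chunk when compute() is called. Finally, print the result.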

Python3

import numpy as np 
import dask.array as da 
  
x = da.random.normal(size=(10000, 10000), chunks=(1000, 1000)) 
y = (x + x.T) - x.mean(axis=0) 
  
result = y.sum() 
  
print(result.compute()) 

Output (varies between runs because the input is random): -10909.686111782875
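To see what a chunked reduction involves, the column mean from the expression above can be computed by hand with plain NumPy. This is only a conceptual sketch and not Dask's implementation: it assumes a small array split into four row blocks, has each block report its column sums and row count (the map step), and then combines those partial results (the reduce step).

```python
import numpy as np

# Small fixed-seed array, chosen only for illustration
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))

# Split the rows into blocks, a hand-rolled stand-in for chunks
row_blocks = np.array_split(x, 4, axis=0)

# Map: each block reports its column sums and its row count
partials = [(block.sum(axis=0), block.shape[0]) for block in row_blocks]

# Reduce: combine the partial results into the global column means
total_sum = sum(s for s, _ in partials)
total_rows = sum(n for _, n in partials)
column_means = total_sum / total_rows

print(np.allclose(column_means, x.mean(axis=0)))
```

Carrying the row count alongside each partial sum matters because blocks may differ in size; averaging the per-block means directly would be wrong in that case.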

Parallelizing a numerical integration using SciPy’s ‘quad‘ function and the ‘multiprocessing’ module

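First, import integrate from SciPy and Pool from multiprocessing. The function f(x) returns the square of x. A single call to quad runs in one process, so to make use of the pool we split the interval [0, 1] into four sub-intervals, let each worker integrate f over one of them with quad, and add up the partial results.

Python3

from multiprocessing import Pool
from scipy import integrate

def f(x):
    return x**2

# Integrate f over one sub-interval; quad returns (value, error estimate)
def integrate_piece(bounds):
    a, b = bounds
    return integrate.quad(f, a, b)[0]

if __name__ == "__main__":
    # Split [0, 1] into 4 equal sub-intervals, one per worker
    edges = [i / 4 for i in range(5)]
    pieces = list(zip(edges[:-1], edges[1:]))

    with Pool(processes=4) as pool:
        partial_results = pool.map(integrate_piece, pieces)

    # The integral over [0, 1] is the sum of the partial integrals
    print(sum(partial_results))

Output: approximately 0.3333333333333333 (the exact value is 1/3; the last digit may differ)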
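An integral over [0, 1] equals the sum of the integrals over its sub-intervals, which is what makes the work divisible into independent pieces. The sketch below checks this serially, without any worker pool, and also adds up the error estimates that quad returns for each piece. The choice of four equal sub-intervals is an assumption for illustration.

```python
from scipy import integrate

def f(x):
    return x**2

# Split [0, 1] into 4 equal sub-intervals
edges = [i / 4 for i in range(5)]
pieces = list(zip(edges[:-1], edges[1:]))

# quad returns (value, estimated absolute error) for each piece
piece_results = [integrate.quad(f, a, b) for a, b in pieces]

total_value = sum(value for value, _ in piece_results)
# Adding the per-piece estimates gives a simple combined error estimate
total_error = sum(error for _, error in piece_results)

# Reference: one quad call over the whole interval
whole_value, whole_error = integrate.quad(f, 0, 1)

print(abs(total_value - whole_value) < 1e-12)
print(abs(total_value - 1 / 3) < 1e-12)
```

Each entry of pieces is independent of the others, so the list comprehension is the part that a process pool would distribute.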
