NLP | Parallel list processing with execnet

Last Updated : 12 Jun, 2019

This article presents a pattern for using execnet to process a list in parallel. It is a map-style function: each element of the list is mapped to a new value, and execnet distributes that work across several workers.

In the code given below, integers are simply doubled, but any pure computation can be performed. The module shown is remote_double.py, which will be executed by execnet. It receives 2-tuples of (i, arg), assumes arg is a number, and sends back (i, arg * 2).

Code :

if __name__ == '__channelexec__': 
    for (i, arg) in channel: 
        channel.send((i, arg * 2)) 
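The (i, arg) protocol can be tried without starting a gateway by running the same loop against a stand-in channel. This is only a sketch: FakeChannel and double_loop are hypothetical names made up for illustration and are not part of execnet. The stand-in mimics just the two things the module relies on, iteration and send().

```python
class FakeChannel:
    """Hypothetical stand-in for an execnet channel (not part of execnet)."""

    def __init__(self, incoming):
        self.incoming = list(incoming)
        self.sent = []

    def __iter__(self):
        # Iterating a channel yields the items sent by the other side.
        return iter(self.incoming)

    def send(self, item):
        # Record what the module would send back to the caller.
        self.sent.append(item)


def double_loop(channel):
    # Same loop body as the remote module above.
    for (i, arg) in channel:
        channel.send((i, arg * 2))


channel = FakeChannel([(0, 5), (1, 7)])
double_loop(channel)
print(channel.sent)  # [(0, 10), (1, 14)]
```

Each reply carries the same index i that arrived with the argument, which is what lets the caller match results to inputs later.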

To use this module to double every element in a list, import the plists module and call plists.map() with the remote_double module, and a list of integers to double.

Code : Using plists

import plists, remote_double 
plists.map(remote_double, range(10)) 

Output :

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The map() function is defined in plists.py. It takes a pure module, a list of arguments, and an optional list of 2-tuples consisting of (spec, count). The default specs are [('popen', 2)], which means two local gateways are opened, each with one channel. Once these channels are open, they are put into an itertools cycle, which creates an infinite iterator that returns to the first channel after it reaches the last.
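The round-robin behaviour of itertools.cycle can be seen in isolation with a minimal sketch. The channel names below are plain strings standing in for real execnet channels, so the example runs without execnet installed.

```python
import itertools

# Plain strings stand in for execnet channels (an assumption for this sketch).
channels = ["channel-0", "channel-1"]
args = list(range(5))

assignments = {name: [] for name in channels}
cyc = itertools.cycle(channels)

for i, arg in enumerate(args):
    # next(cyc) wraps around to the first channel after the last one.
    assignments[next(cyc)].append((i, arg))

print(assignments)
# {'channel-0': [(0, 0), (2, 2), (4, 4)], 'channel-1': [(1, 1), (3, 3)]}
```

With five arguments and two channels, one channel receives three items and the other two, which is the "almost even" split the pattern relies on.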

Now each argument in args can be sent to a channel for processing, and since the channels are cycled, each channel gets an almost even share of the arguments. This is where i comes in: the order in which the results come back is unknown, so i, the index of each arg in the list, is passed to the channel and back so that the results can be combined in the original order. The function then waits for the results on a MultiChannel receive queue and inserts them into a prefilled list of the same length as the original args. Once all the expected results have arrived, it exits the gateways and returns the results, as shown in the code given below.

Code :

import itertools, execnet 
def map(mod, args, specs =[('popen', 2)]): 
    gateways = [] 
    channels = [] 
      
    for spec, count in specs: 
        for i in range(count): 
            gw = execnet.makegateway(spec) 
            gateways.append(gw) 
            channels.append(gw.remote_exec(mod)) 
              
    cyc = itertools.cycle(channels) 
      
    for i, arg in enumerate(args): 
        channel = next(cyc) 
        channel.send((i, arg)) 
    mch = execnet.MultiChannel(channels) 
    queue = mch.make_receive_queue() 
    n = len(args)
    # Create a list of length n where every element is None
    results = [None] * n

    for _ in range(n):
        # Each queue item is (channel, payload); the payload is (i, result)
        channel, (i, result) = queue.get()
        results[i] = result
          
    for gw in gateways: 
        gw.exit() 
    return results 
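The index-based reassembly step in map() can be tried on its own. The sketch below is a simplification under stated assumptions: a standard-library queue.Queue stands in for the MultiChannel receive queue, the replies are shuffled to imitate workers finishing in an arbitrary order, and each item is just (i, result) rather than the (channel, (i, result)) pair that the execnet queue delivers.

```python
import queue
import random

args = [10, 20, 30, 40]

# Simulate workers replying in an arbitrary order with (i, result) pairs.
replies = [(i, arg * 2) for i, arg in enumerate(args)]
random.shuffle(replies)

q = queue.Queue()
for reply in replies:
    q.put(reply)

# Prefill the results list, then slot each reply in by its index.
results = [None] * len(args)
for _ in range(len(args)):
    i, result = q.get()
    results[i] = result

print(results)  # [20, 40, 60, 80]
```

Whatever order the replies arrive in, the final list matches the order of the original arguments because every result is written to its own index.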

Code : Increasing the parallelization by modifying the specs

plists.map(remote_double, range(10), [('popen', 4)])

Output :

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

However, more parallelization does not necessarily mean faster processing. It depends on the available resources, and the more gateways and channels that are opened, the more overhead is incurred. Ideally, there should be one gateway and channel per CPU core to get maximum resource utilization. plists.map() can be used with any pure module, as long as the module receives and sends back 2-tuples in which i is the first element. This pattern is most useful when there is a large batch of numbers to crunch as quickly as possible.
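One way to follow the one-gateway-per-core guideline is to build the specs list from the machine's core count. This is a minimal sketch; popen_specs_per_core is a hypothetical helper name, not part of plists or execnet.

```python
import os


def popen_specs_per_core():
    """Hypothetical helper: one local 'popen' gateway per CPU core."""
    # os.cpu_count() can return None when the count cannot be determined.
    cores = os.cpu_count() or 1
    return [("popen", cores)]


specs = popen_specs_per_core()
print(specs)
```

The resulting list has the same (spec, count) shape that map() expects, so it could be passed as the third argument, for example plists.map(remote_double, range(10), specs). Whether this is actually faster still depends on the workload and on the overhead of starting the gateways.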
