NLP | Parallel list processing with execnet

This article presents a pattern for using execnet to process a list in parallel: a map-style function that transforms each element of a list into a new value, with execnet distributing the work across several gateways.

In the code below, integers are simply doubled, but any pure computation can be performed. The module shown is the one that execnet will execute (it is imported later as remote_double). It receives 2-tuples of (i, arg), assumes arg is a number, and sends back (i, arg*2).

Code :



if __name__ == '__channelexec__':
    for (i, arg) in channel:
        channel.send((i, arg * 2))
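
Any other pure computation fits the same shape. As a minimal sketch, the module below keeps the computation in a plain function so it can be checked locally, and only touches the channel when execnet runs it. The module name remote_square and the helper square() are hypothetical, chosen for illustration; they are not part of the original article.

```python
# remote_square.py (hypothetical module name, for illustration only)

def square(i, arg):
    """Pure computation: return the index unchanged with arg squared."""
    return (i, arg * arg)


# execnet sets __name__ to '__channelexec__' and provides a global
# `channel` when this module is passed to gateway.remote_exec().
if __name__ == '__channelexec__':
    for (i, arg) in channel:  # noqa: F821 (channel is injected by execnet)
        channel.send(square(i, arg))
```

Run directly, the guard is skipped, so square() can be exercised with ordinary unit tests before the module is handed to execnet.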


To double every element in a list, save the module above as remote_double.py, import it alongside the plists module, and call plists.map() with remote_double and a list of integers.

Code : Using plists

import plists, remote_double
plists.map(remote_double, range(10))


Output :

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

The map() function is defined in plists.py. It takes a pure module, a list of arguments, and an optional list of 2-tuples of the form (spec, count). The default specs are [('popen', 2)], which opens two local gateways, each with one channel. Once the channels are open, they are placed into an itertools.cycle, an infinite iterator that returns to the first channel after reaching the last.
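
The round-robin behaviour of itertools.cycle can be seen without execnet. The sketch below uses plain string labels as stand-ins for channels (an assumption made only for illustration) and counts how many arguments each one would receive.

```python
import itertools
from collections import Counter

# Stand-ins for two opened channels (plain labels, not execnet objects).
channels = ['channel-0', 'channel-1']
cyc = itertools.cycle(channels)

# Pair each of five arguments with the next channel in the cycle.
assignments = [(next(cyc), arg) for arg in range(5)]
print(assignments)

# Count how many arguments each stand-in channel received.
load = Counter(name for name, _ in assignments)
print(load)
```

With five arguments and two channels, one channel receives three arguments and the other two, so the split is even up to a difference of one.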

Each argument in args is then sent to the next channel in the cycle, so every channel receives a nearly equal share of the arguments. This is where i comes in: the order in which results come back is unknown, so i, the index of each arg in the list, is sent to the channel and returned with the result, allowing the results to be placed in their original order. The results are collected from a MultiChannel receive queue and inserted into a prefilled list of the same length as args. Once all expected results have arrived, the gateways are exited and the results are returned, as shown in the code below.

Code :

import itertools, execnet

def map(mod, args, specs=[('popen', 2)]):
    gateways = []
    channels = []

    # open `count` gateways for each spec, with one channel per gateway
    for spec, count in specs:
        for _ in range(count):
            gw = execnet.makegateway(spec)
            gateways.append(gw)
            channels.append(gw.remote_exec(mod))

    cyc = itertools.cycle(channels)

    # send each argument, tagged with its index, to the next channel
    for i, arg in enumerate(args):
        channel = next(cyc)
        channel.send((i, arg))

    mch = execnet.MultiChannel(channels)
    queue = mch.make_receive_queue()
    n = len(args)
    # creates a list of length n,
    # where every element is None
    results = [None] * n

    # results arrive in any order; i gives the original position
    for _ in range(n):
        channel, (i, result) = queue.get()
        results[i] = result

    for gw in gateways:
        gw.exit()
    return results
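
The index-based reassembly can also be tried without execnet. The sketch below is a local simulation, not the execnet API: two threads stand in for remote channels, a standard-library queue.Queue stands in for the MultiChannel receive queue, and a random sleep stands in for uneven latency. All names in it are illustrative assumptions.

```python
import queue
import random
import threading
import time

def worker(inbox, outbox):
    """Stand-in for a remote module: double each (i, arg) it receives."""
    for i, arg in inbox:
        time.sleep(random.uniform(0, 0.01))  # simulate uneven latency
        outbox.put((i, arg * 2))

args = list(range(10))
outbox = queue.Queue()

# Split the arguments between two workers round-robin, keeping each index.
indexed = list(enumerate(args))
batches = [indexed[0::2], indexed[1::2]]
threads = [threading.Thread(target=worker, args=(b, outbox)) for b in batches]
for t in threads:
    t.start()

# Collect exactly len(args) results, placing each at its original index.
results = [None] * len(args)
for _ in range(len(args)):
    i, result = outbox.get()
    results[i] = result

for t in threads:
    t.join()

print(results)
```

Whatever order the workers finish in, each result lands in the slot named by its index, so the final list matches the order of args.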


Code : Increasing the parallelization by modifying the specs

plists.map(remote_double, range(10), [('popen', 4)])


Output :

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

However, more parallelization does not necessarily mean faster processing. It depends on the available resources, and every additional gateway and channel adds overhead. Ideally, there should be one gateway and channel per CPU core to get maximum resource utilization. plists.map() works with any pure module, as long as the module receives and sends back 2-tuples whose first element is i. This pattern is most useful when there is a large batch of numbers to crunch and the results are needed as quickly as possible.
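
One way to follow the one-gateway-per-core guideline is to derive the count from the machine itself. The helper below is a sketch; its name specs_for_local_cores and its default argument are assumptions introduced here, and the only library call it relies on is os.cpu_count() from the standard library.

```python
import os

def specs_for_local_cores(default=2):
    """Build a specs list with one popen gateway per CPU core.

    os.cpu_count() may return None, so fall back to `default`.
    """
    cores = os.cpu_count() or default
    return [('popen', cores)]

specs = specs_for_local_cores()
print(specs)
# Intended use, assuming plists and remote_double are importable:
# plists.map(remote_double, range(10), specs)
```

Whether this count is actually the fastest still depends on the workload, so it is worth timing a few values around it.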


