This article presents a pattern for processing a list in parallel with execnet. The pattern is a map function: each element in the list is mapped to a new value, and execnet does the mapping in parallel.
In this example the integers are simply doubled, but any pure computation could be performed instead. The remote_double module is the code that execnet executes. It receives a 2-tuple of (i, arg), assumes arg is a number, and sends back (i, arg*2).
To use this module to double every element in a list, import the plists module and call plists.map() with the remote_double module and a list of integers.
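The remote module described above could look like the following minimal sketch. The function name double_items is hypothetical and exists only to make the loop easy to test. The sketch relies on execnet running remotely executed source with `__name__` set to `'__channelexec__'` and a global named `channel` already defined.

```python
# remote_double.py: a sketch of the remote module, not the article's exact code.

def double_items(channel):
    """Receive (i, arg) tuples from the channel and send back (i, arg * 2)."""
    # Iterating over the channel yields each received item until it is closed.
    for i, arg in channel:
        channel.send((i, arg * 2))


# execnet supplies `channel` when it executes this source remotely;
# when the file is imported normally, this branch is skipped.
if __name__ == '__channelexec__':
    double_items(channel)  # noqa: F821
```

The index i is passed through untouched, so the caller can tell which input each result belongs to.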
Code : Using plists
Calling plists.map(remote_double, range(10)) returns:
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
The map() function is defined in plists.py. It takes a pure module, a list of arguments, and an optional list of 2-tuples consisting of (spec, count). The default specs are [('popen', 2)], which means two local gateways and channels are opened. Once the channels are open, they are put into an itertools cycle, which creates an infinite iterator that cycles back to the beginning once it hits the end.
Each argument in args can now be sent to a channel for processing, and since the channels are cycled, each channel gets an almost even share of the arguments. This is where i comes in: the order in which the results come back is unknown, so i, the index of each arg in the list, is passed to the channel and back so that the results can be combined in the original order. The function then waits for the results with a MultiChannel receive queue and inserts them into a prefilled list of the same length as the original args. Once all the expected results have arrived, it exits the gateways and returns the results.
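To make the specs and the cycle concrete, here is a small standard-library sketch. It does not open any gateways: the string labels are stand-ins for the channels, and the names gateway_specs and labels are illustrative assumptions.

```python
import itertools

specs = [("popen", 2)]

# Expand each (spec, count) pair: ("popen", 2) means two "popen" gateways.
gateway_specs = [spec for spec, count in specs for _ in range(count)]

# Label each would-be channel so the rotation is visible.
labels = [f"{spec}-{n}" for n, spec in enumerate(gateway_specs)]

# cycle() repeats the labels forever, so next() never runs out.
cyc = itertools.cycle(labels)
first_five = [next(cyc) for _ in range(5)]
print(first_five)
# ['popen-0', 'popen-1', 'popen-0', 'popen-1', 'popen-0']
```

With two channels, five items are split three and two, which is the "almost even" distribution a cycle gives.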
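The steps above can be sketched roughly as follows. This is a reconstruction under assumptions, not the article's original listing: the helpers scatter and gather are hypothetical names introduced to separate the two phases, and error handling for failing remote code is left out.

```python
# plists.py: a sketch of the map() function described above.
import itertools


def scatter(channels, args):
    """Send (i, arg) to the channels in round-robin order."""
    cyc = itertools.cycle(channels)
    for i, arg in enumerate(args):
        next(cyc).send((i, arg))


def gather(queue, count):
    """Read `count` (channel, (i, result)) items and restore the input order."""
    results = [None] * count  # prefilled list, same length as the arguments
    for _ in range(count):
        _channel, (i, result) = queue.get()
        results[i] = result
    return results


def map(mod, args, specs=(("popen", 2),)):
    """Apply the remote module `mod` to every element of `args` in parallel.

    Note: this deliberately shadows the builtin map() inside this module.
    """
    import execnet  # third-party; imported here so the helpers work without it

    args = list(args)
    gateways = [execnet.makegateway(spec)
                for spec, count in specs for _ in range(count)]
    channels = [gw.remote_exec(mod) for gw in gateways]
    try:
        scatter(channels, args)
        queue = execnet.MultiChannel(channels).make_receive_queue()
        return gather(queue, len(args))
    finally:
        # Shut the gateways down even if something above raised.
        for gw in gateways:
            gw.exit()
```

Because each result carries its index, gather() can write it straight into the right slot regardless of which channel answers first.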
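Code : Increasing the parallelization by modifying the specs
Passing a larger count in the specs argument, for example [('popen', 4)], opens more gateways and channels. The result is unchanged: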
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
However, more parallelization does not necessarily mean faster processing. It depends on the available resources, and the more gateways and channels are opened, the more overhead is required. Ideally, there should be one gateway and channel per CPU core to get maximum resource utilization. plists.map() can be used with any pure module as long as it receives and sends back 2-tuples where i is the first element. This pattern is most useful when there is a large batch of numbers to crunch as quickly as possible.
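One way to follow the one-gateway-per-core guideline is to build the specs from the core count reported by the operating system. The helper name specs_per_core is hypothetical, and this is only a starting point, since the best count still depends on the workload.

```python
import os


def specs_per_core(spec="popen"):
    """Return a specs list with one gateway per CPU core reported by the OS."""
    # os.cpu_count() can return None when the count cannot be determined.
    return [(spec, os.cpu_count() or 1)]


print(specs_per_core())
```

The returned list has the same (spec, count) shape as the default specs, so it could be passed as the third argument to plists.map().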