NLP | Distributed Tagging with Execnet – Part 2

The gateway’s remote_exec() method takes a single argument that can be one of the following three types:

  • A string of code to execute remotely
  • The name of a pure function that will be serialized and executed remotely
  • The name of a pure module whose source will be executed remotely
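
As a rough sketch of the first two options, assuming execnet is installed and that a pure function receives the channel as its first parameter (as the execnet documentation describes). The names double_items and run_demo are illustrative only and are not part of execnet:

```python
def double_items(channel):
    # Pure function: relies only on its `channel` argument and builtins.
    for item in channel:
        channel.send(item * 2)


def run_demo():
    # Imported lazily so the function above can be read and tested
    # without the third-party execnet package.
    import execnet

    gw = execnet.makegateway()  # local subprocess gateway

    # Option 1: a string of code; `channel` is available to the remote code.
    ch = gw.remote_exec("channel.send(channel.receive() + 1)")
    ch.send(41)
    print(ch.receive())

    # Option 2: a pure function. execnet sends the function's source,
    # so the function is assumed to live in a regular .py file.
    ch = gw.remote_exec(double_items)
    ch.send(21)
    print(ch.receive())

    gw.exit()
```

Calling run_demo() starts the gateway and runs both variants. The third option, passing a module, is what the remote_tag example in this article uses.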

Code : The remote_tag.py module, which is passed to remote_exec() as a pure module

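import pickle

# execnet sets __name__ to '__channelexec__' when it runs this module remotely
if __name__ == '__channelexec__':
    # The first message on the channel is the pickled tagger
    tagger = pickle.loads(channel.receive())
    # Every later message is a tokenized sentence; the loop ends when the channel closes
    for sentence in channel:
        channel.send(tagger.tag(sentence))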


What is a Pure Module?

  • A pure module is self-contained: it can access only the Python modules that are available where it executes, and it has no access to variables or state that exist where the gateway was created.
  • Similarly, a pure function is a self-contained function, with no external dependencies.
  • To detect that the module is being executed by execnet, check the __name__ variable. If it equals '__channelexec__', the module is being used to create a remote channel.
  • This is similar to checking if __name__ == '__main__' to see whether a module is being executed on the command line.
  • The first step is to call channel.receive() to get the serialized tagger, which is loaded using pickle.loads().
  • Note that channel is not imported anywhere; execnet places it in the global namespace of the module. Any module that execnet executes remotely has access to the channel variable in order to communicate with the channel creator.
  • Once the tagger is loaded, iterate over the channel and tag() each tokenized sentence that is received.
  • The sender can therefore send as many sentences as it likes, because iteration does not stop until the channel is closed.
  • The result is a compute node for part-of-speech tagging that dedicates all of its resources to tagging whatever sentences it receives. As long as the channel remains open, the node is available for processing.
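
The protocol described above can be tried locally without execnet. The following is only a simulation under stated assumptions: FakeChannel and run_worker are hypothetical stand-ins (not execnet classes), a plain dict replaces the NLTK tagger so that nothing beyond the standard library is needed, and exec() imitates the way execnet is understood to run module source with __name__ and channel preset.

```python
import pickle
from collections import deque

# Worker source modelled on remote_tag.py; a dict lookup stands in for tagger.tag().
WORKER_SOURCE = '''
import pickle

if __name__ == '__channelexec__':
    lookup = pickle.loads(channel.receive())
    for sentence in channel:
        channel.send([(word, lookup.get(word, 'NN')) for word in sentence])
'''


class FakeChannel:
    """Hypothetical single-process stand-in for an execnet channel."""

    def __init__(self, incoming):
        self.incoming = deque(incoming)
        self.outgoing = []

    def receive(self):
        return self.incoming.popleft()

    def send(self, item):
        self.outgoing.append(item)

    def __iter__(self):
        # Yield queued items until none remain, mimicking a closed channel.
        while self.incoming:
            yield self.incoming.popleft()


def run_worker(source, channel, name='__channelexec__'):
    # Run the source with `channel` and `__name__` already in its globals.
    exec(source, {'__name__': name, 'channel': channel})
    return channel.outgoing


lookup = {'the': 'DT', 'runs': 'VBZ'}
messages = [pickle.dumps(lookup), ['the', 'dog', 'runs'], ['the', 'cat']]

tagged = run_worker(WORKER_SOURCE, FakeChannel(messages))
skipped = run_worker(WORKER_SOURCE, FakeChannel(messages), name='__main__')

print(tagged)
print(skipped)
```

With any other value of __name__ the guarded block is skipped, so the second call sends nothing back.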

Execnet can do a lot more, such as opening multiple channels to increase parallel processing, as well as opening gateways to remote hosts over SSH to do distributed processing.

Creating multiple channels
To make the processing more parallel, multiple channels are created, one per gateway. Each gateway starts a new subprocess (or a remote interpreter if it is an SSH gateway), and one channel per gateway is used for communication. Once the two channels are created, they can be combined using the MultiChannel class, which allows the user to iterate over the channels and to make a receive queue that collects messages from every channel.
After each channel is created and sent the tagger, the channels are cycled through so that an equal number of sentences goes to each one for tagging. Then all the responses are collected from the queue. A call to queue.get() returns a 2-tuple of (channel, message), which is useful when it matters which channel a message came from. Once all the tagged sentences have been collected, the gateways can be shut down by calling exit().
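
The round-robin dispatch and the shared receive queue can be illustrated with the standard library alone. This is a sketch, not execnet itself: LocalChannel is a hypothetical thread-backed stand-in for a channel, and tagging is faked by labelling every word 'NN'.

```python
import itertools
import queue
import threading


class LocalChannel:
    """Hypothetical stand-in for an execnet channel, backed by a thread."""

    def __init__(self, name, results):
        self.name = name
        self.inbox = queue.Queue()
        self.results = results
        self.thread = threading.Thread(target=self._work, daemon=True)
        self.thread.start()

    def _work(self):
        while True:
            sentence = self.inbox.get()
            if sentence is None:  # sentinel: the channel was closed
                break
            # Stand-in for tagger.tag(): label every word 'NN'.
            self.results.put((self, [(word, 'NN') for word in sentence]))

    def send(self, sentence):
        self.inbox.put(sentence)

    def close(self):
        self.inbox.put(None)
        self.thread.join()


results = queue.Queue()  # plays the role of the MultiChannel receive queue
channels = [LocalChannel('ch1', results), LocalChannel('ch2', results)]
sentences = [['a'], ['b', 'c'], ['d'], ['e', 'f']]

# Round-robin: sentences alternate between the two channels.
for channel, sentence in zip(itertools.cycle(channels), sentences):
    channel.send(sentence)

# Collect one (channel, message) pair per sentence sent.
per_channel = {}
for _ in sentences:
    channel, tagged = results.get()
    per_channel.setdefault(channel.name, []).append(tagged)

for channel in channels:
    channel.close()

counts = {name: len(items) for name, items in per_channel.items()}
print(counts)
```

Responses from different channels may arrive in any order, which is why each queue entry carries the channel it came from.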

Code :

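import itertools

import execnet
import remote_tag  # the remote_tag.py module shown earlier
from nltk.corpus import treebank

# pickled_tagger is assumed to be a tagger already serialized with pickle.dumps()

# One gateway (subprocess) per channel
gw1 = execnet.makegateway()
gw2 = execnet.makegateway()

# Start the remote module on each gateway and send it the tagger
ch1 = gw1.remote_exec(remote_tag)
ch1.send(pickled_tagger)
ch2 = gw2.remote_exec(remote_tag)
ch2.send(pickled_tagger)

# Combine the channels and collect all replies in a single queue
mch = execnet.MultiChannel([ch1, ch2])
queue = mch.make_receive_queue()
channels = itertools.cycle(mch)

# Alternate between the channels so that each gets two sentences
for sentence in treebank.sents()[:4]:
    channel = next(channels)
    channel.send(sentence)

tagged_sentences = []

# Each queue entry is a (channel, message) pair
for i in range(4):
    channel, tagged_sentence = queue.get()
    tagged_sentences.append(tagged_sentence)

print("Length :", len(tagged_sentences))

gw1.exit()
gw2.exit()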


Output :

Length : 4

The example sends only four sentences, but a real workload may involve thousands or hundreds of thousands. A single computer can tag four sentences very quickly; at that larger scale, distributing the sentences across multiple computers can be much faster than waiting for a single computer to tag them all.


