NLP | Distributed chunking with Execnet

The article aims to perform chunking and tagging over an execnet gateway. Here two objects will be sent instead of one, and a Tree is received, which requires pickling and unpickling for serialization.

How it works ?

  • Use a pickled tagger.
  • First, pickle the default chunker used by nltk.chunk.ne_chunk(), though any chunker would do.
  • Next, make a gateway for the remote_chunk module, get a channel, and send the pickled tagger and chunker over.
  • Then, receive a pickled Tree, which can be unpickled and inspected to see the result. Finally, exit the gateway:

Code : Explaining the working

filter_none

edit
close

play_arrow

link
brightness_4
code

# importing libraries
import execnet, remote_chunk
import nltk.data, nltk.tag, nltk.chunk
import pickle
from nltk.corpus import treebank_chunk
  
tagger = pickle.dumps(nltk.data.load(nltk.tag._POS_TAGGER))
chunker = pickle.dumps(
        nltk.data.load(nltk.chunk._MULTICLASS_NE_CHUNKER))
gw = execnet.makegateway()
  
channel = gw.remote_exec(remote_chunk)
channel.send(tagger)
channel.send(chunker)
channel.send(treebank_chunk.sents()[0])
  
chunk_tree = pickle.loads(channel.receive())
  
print (chunk_tree)
gw.exit()

chevron_right


Output :

Tree('S', [Tree('PERSON', [('Pierre', 'NNP')]), Tree('ORGANIZATION',
[('Vinken', 'NNP')]), (', ', ', '), ('61', 'CD'), ('years', 'NNS'),
('old', 'JJ'), (', ', ', '), ('will', 'MD'), ('join', 'VB'), ('the',
'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive',
'JJ'), ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]) 

Communication this time is slightly different as in figure given below –

  • The remote_chunk.py module is just a little bit more complicated than the remote_tag.py module.
  • In addition to receiving a pickled tagger, it also expects to receive a pickled chunker that implements the ChunkerIinterface
  • Once it has both a tagger and a chunker, it expects to receive any number of tokenized sentences, which it tags and parses into a Tree. This Tree is then pickled and sent back over the channel:

Code : Explaining the above working

filter_none

edit
close

play_arrow

link
brightness_4
code

import pickle
  
if __name__ == '__channelexec__':
    tagger = pickle.loads(channel.receive())
    chunker = pickle.loads(channel.receive())
  
for sentence in channel:
    chunk_tree = chunker.parse(tagger.tag(sent))
    channel.send(pickle.dumps(chunk_tree))

chevron_right


remote_chunk module’s only external dependency is the pickle module, which is part of the Python standard library. It doesn’t need to import any NLTK modules in order to use the tagger or chunker, because all the necessary data is pickled and sent over the channel.



My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.