NLP | Chunk Tree to Text and Chaining Chunk Transformation

We can convert a tree or subtree back to a sentence or chunk string. To show how, the code below uses the first tree of the treebank_chunk corpus.

Code #1 : Joining the words in the tree with spaces.

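# Load the chunked Penn Treebank sample that ships with NLTK
from nltk.corpus import treebank_chunk

# First chunked sentence of the corpus, as a Tree
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

print("\nTree leaves : \n", tree.leaves())

# Join the words (ignoring the tags) with single spaces
print("\nSentence from tree : \n", ' '.join(
    [w for w, t in tree.leaves()]))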

Output :

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), 
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Sentence from tree : 
 Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .

In the output above, the punctuation is not right: the period and commas are treated as separate words, so they are surrounded by spaces like any other word. The code below fixes this with a regular expression substitution that removes the space before each punctuation mark.

Code #2 : chunk_tree_to_sent() function to improve Code 1

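import re

# Match a whitespace character followed by a punctuation mark (, . ; ?)
punct_re = re.compile(r'\s([,\.;\?])')

def chunk_tree_to_sent(tree, concat=' '):
    # Join the words of the leaves, then drop the space before punctuation
    s = concat.join([w for w, t in tree.leaves()])
    return punct_re.sub(r'\g<1>', s)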

Code #3 : Evaluating chunk_tree_to_sent()

# Load the corpus and the helper from Code #2
# (assumes Code #2 is saved in a local module named transforms.py)
from nltk.corpus import treebank_chunk
from transforms import chunk_tree_to_sent

# First chunked sentence of the corpus, as a Tree
tree = treebank_chunk.chunked_sents()[0]

print("Tree : \n", tree)

print("\nTree leaves : \n", tree.leaves())

print("Tree to sentence : ", chunk_tree_to_sent(tree))

Output :

Tree : 
 (S
  (NP Pierre/NNP Vinken/NNP)
  ,/,
  (NP 61/CD years/NNS)
  old/JJ
  ,/,
  will/MD
  join/VB
  (NP the/DT board/NN)
  as/IN
  (NP a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD)
  ./.)

Tree leaves : 
 [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'), 
 ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'), ('join', 'VB'),
 ('the', 'DT'), ('board', 'NN'), ('as', 'IN'), ('a', 'DT'), ('nonexecutive', 'JJ'),
 ('director', 'NN'), ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

Tree to sentence :  Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
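
The same join-and-substitute step can also be tried without downloading the corpus. The following is a minimal sketch that uses only the standard library. The hand-written `leaves` list (meant to mirror the leaves of the first treebank_chunk sentence) and the `leaves_to_sent` helper are illustrative assumptions, not part of NLTK or of the transforms module.

```python
import re

# Same idea as Code #2: whitespace followed by , . ; or ?
punct_re = re.compile(r'\s([,\.;\?])')

# Hand-written stand-in for tree.leaves() (assumption, for illustration only)
leaves = [('Pierre', 'NNP'), ('Vinken', 'NNP'), (',', ','), ('61', 'CD'),
          ('years', 'NNS'), ('old', 'JJ'), (',', ','), ('will', 'MD'),
          ('join', 'VB'), ('the', 'DT'), ('board', 'NN'), ('as', 'IN'),
          ('a', 'DT'), ('nonexecutive', 'JJ'), ('director', 'NN'),
          ('Nov.', 'NNP'), ('29', 'CD'), ('.', '.')]

def leaves_to_sent(leaves, concat=' '):
    # Hypothetical helper: takes a plain list of (word, tag) pairs, not a Tree
    s = concat.join(w for w, t in leaves)
    # Keep the punctuation mark (group 1) and drop the whitespace before it
    return punct_re.sub(r'\g<1>', s)

naive = ' '.join(w for w, t in leaves)   # plain join, spaces before punctuation
fixed = leaves_to_sent(leaves)           # join plus regex clean-up

print(naive)
print(fixed)
```

Only punctuation that directly follows whitespace is changed, so the period inside "Nov." is left alone.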

Chaining Chunk Transformation
Transformation functions can be chained together to normalize chunks. The resulting chunks are often shorter while still holding the same meaning.

In the code below, a single chunk and an optional list of transform functions are passed to transform_chunk(). The function calls each transform on the chunk in turn and returns the final chunk. If trace is set, it also prints the chunk after each step.

Code #4 :

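# The default transforms (filter_insignificant, swap_verb_phrase,
# swap_infinitive_phrase, singularize_plural_noun) are assumed to be
# defined in the same module (transforms.py) as this function.
def transform_chunk(
        chunk, chain=[filter_insignificant,
                      swap_verb_phrase, swap_infinitive_phrase,
                      singularize_plural_noun], trace=0):
    # Apply each transform in order, feeding its result to the next one
    for f in chain:
        chunk = f(chunk)

        if trace:
            print(f.__name__, ':', chunk)

    return chunk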

Code #5 : Evaluating transform_chunk

from transforms import transform_chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'),
         ('recipes', 'NNS'), ('is', 'VBZ'), ('delicious', 'JJ')]

print("Chunk : \n", chunk)

print("\nTransformed Chunk : \n", transform_chunk(chunk))

Output :

Chunk :  
[('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS'), 
('is', 'VBZ'), ('delicious', 'JJ')]

Transformed Chunk : 
[('delicious', 'JJ'), ('recipe', 'NN'), ('book', 'NN')]
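
The chaining pattern itself does not depend on the specific transforms. The sketch below shows the same loop with two toy transforms so that it runs on its own. `drop_tags`, `singularize_plurals` and `apply_chain` are hypothetical names invented for this illustration; they are much simpler than, and not equivalent to, the functions used by transform_chunk.

```python
def drop_tags(chunk, tags=('DT', 'CC')):
    # Toy stand-in (assumption): remove words whose tag is in `tags`
    return [(word, tag) for word, tag in chunk if tag not in tags]

def singularize_plurals(chunk):
    # Toy stand-in (assumption): naive singularization that only strips
    # a trailing 's' from words tagged NNS and retags them as NN
    return [(word[:-1], 'NN') if tag == 'NNS' and word.endswith('s')
            else (word, tag)
            for word, tag in chunk]

def apply_chain(chunk, chain=(drop_tags, singularize_plurals), trace=False):
    # Same loop as transform_chunk: each transform receives the
    # output of the previous one
    for f in chain:
        chunk = f(chunk)
        if trace:
            print(f.__name__, ':', chunk)
    return chunk

chunk = [('the', 'DT'), ('book', 'NN'), ('of', 'IN'), ('recipes', 'NNS')]
result = apply_chain(chunk, trace=True)
print(result)
```

Because every transform takes a chunk and returns a chunk, transforms can be reordered, removed or added by passing a different `chain`, and `trace` shows the intermediate result after each step.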

