NLP | Location Tags Extraction

Last Updated : 26 Feb, 2019

A ChunkParserI subclass can identify LOCATION chunks by looking up words in the gazetteers corpus. The gazetteers corpus is a WordListCorpusReader that contains the following location words:

Country names
U.S. states and abbreviations
Mexican states
Major U.S. cities
Canadian provinces

The LocationChunker class iterates over a tagged sentence and looks for words that are found in the gazetteers corpus. When it finds one or more location words, it creates a LOCATION chunk using IOB tags. The iob_locations() method produces the IOB LOCATION tags, and the parse() method converts those IOB tags to a Tree.
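To make the IOB scheme concrete, the following is a minimal pure-Python sketch of how (word, tag, IOB) triples map to chunks: a B- tag opens a chunk, I- tags extend it, and O closes it. The helper name group_chunks and the sample triples are illustrative assumptions, not part of NLTK; in the article's code, conlltags2tree does the equivalent job and returns a Tree instead of plain lists.

```python
def group_chunks(iob_triples, label="LOCATION"):
    """Collect (word, tag) pairs into chunks based on B-/I-/O tags.

    Hypothetical helper for illustration only; NLTK's conlltags2tree
    builds a Tree from the same kind of triples.
    """
    chunks, current = [], None
    for word, tag, iob in iob_triples:
        if iob == "B-" + label:
            # B- starts a new chunk
            current = [(word, tag)]
            chunks.append(current)
        elif iob == "I-" + label and current is not None:
            # I- continues the chunk that is currently open
            current.append((word, tag))
        else:
            # O (or any other tag) closes the open chunk
            current = None
    return chunks


# Hand-written sample triples, in the shape iob_locations() yields
triples = [
    ("San", "NNP", "B-LOCATION"),
    ("Francisco", "NNP", "I-LOCATION"),
    ("CA", "NNP", "I-LOCATION"),
    ("is", "BE", "O"),
    ("cold", "JJ", "O"),
]
print(group_chunks(triples))
# [[('San', 'NNP'), ('Francisco', 'NNP'), ('CA', 'NNP')]]
```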

Code #1 : LocationChunker class

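from nltk.chunk import ChunkParserI
from nltk.chunk.util import conlltags2tree
from nltk.corpus import gazetteers

class LocationChunker(ChunkParserI):
    def __init__(self):
        # All gazetteer entries, stored as a set for fast membership tests
        self.locations = set(gazetteers.words())
        # lookahead = number of spaces in the longest multi-word entry
        self.lookahead = 0
        for loc in self.locations:
            nwords = loc.count(' ')
            # The comparison must run for every entry, so it sits inside the loop
            if nwords > self.lookahead:
                self.lookahead = nwords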
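As a quick illustration of what the constructor computes, here is a small sketch that derives the lookahead value from a made-up set of locations. The name toy_locations and its entries are assumptions standing in for set(gazetteers.words()); the real corpus is much larger.

```python
# Hypothetical stand-in for set(gazetteers.words()); not the real corpus contents
toy_locations = {"CA", "San Francisco", "San Jose", "New York City"}

# The number of spaces in an entry is one less than its word count,
# so the maximum tells us how many extra words to look ahead
lookahead = max((loc.count(" ") for loc in toy_locations), default=0)

print(lookahead)  # 2, from "New York City"
```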

Code #2 : iob_locations() and parse() methods (both belong inside the LocationChunker class)

    def iob_locations(self, tagged_sent):
        i = 0
        l = len(tagged_sent)
        inside = False  # True while the previous word was part of a location

        while i < l:
            word, tag = tagged_sent[i]
            j = i + 1
            # + 1 so that the longest entries (lookahead + 1 words) are also tested
            k = j + self.lookahead + 1
            nextwords, nexttags = [], []
            loc = False

            # Grow the candidate phrase one word at a time
            while j < k:
                if ' '.join([word] + nextwords) in self.locations:
                    if inside:
                        yield word, tag, 'I-LOCATION'
                    else:
                        yield word, tag, 'B-LOCATION'
                    for nword, ntag in zip(nextwords, nexttags):
                        yield nword, ntag, 'I-LOCATION'
                    # Resume scanning right after the matched phrase
                    loc, inside = True, True
                    i = j
                    break

                if j < l:
                    nextword, nexttag = tagged_sent[j]
                    nextwords.append(nextword)
                    nexttags.append(nexttag)
                    j += 1
                else:
                    break

            # No location starts at this word
            if not loc:
                inside = False
                i += 1
                yield word, tag, 'O'

    def parse(self, tagged_sent):
        iobs = self.iob_locations(tagged_sent)
        return conlltags2tree(iobs)

Code #3 : use the LocationChunker class to parse the sentence

# chunkers is assumed to be a local module (chunkers.py) that holds
# the LocationChunker class above and a sub_leaves helper
from chunkers import sub_leaves
from chunkers import LocationChunker

# Create the chunker before calling parse()
loc = LocationChunker()

t = loc.parse([('San', 'NNP'), ('Francisco', 'NNP'),
               ('CA', 'NNP'), ('is', 'BE'), ('cold', 'JJ'),
               ('compared', 'VBD'), ('to', 'TO'), ('San', 'NNP'),
               ('Jose', 'NNP'), ('CA', 'NNP')])

print("Location : \n", sub_leaves(t, 'LOCATION'))

Output :

Location : 
[[('San', 'NNP'), ('Francisco', 'NNP'), ('CA', 'NNP')], 
[('San', 'NNP'), ('Jose', 'NNP'), ('CA', 'NNP')]]
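The sub_leaves helper imported in Code #3 is not defined in this article. A minimal sketch is given below, under the assumption that it returns the leaves of every subtree carrying the requested label; the actual helper in the chunkers module may differ. It relies only on the subtrees(), label() and leaves() methods that an NLTK Tree provides.

```python
def sub_leaves(tree, label):
    """Return the leaves of every subtree of `tree` labelled `label`.

    Assumed behaviour of the helper used in Code #3; `tree` is expected
    to be an nltk.tree.Tree (or any object with the same three methods).
    """
    return [sub.leaves() for sub in tree.subtrees(lambda s: s.label() == label)]

# Usage with the tree t built in Code #3:
#   sub_leaves(t, 'LOCATION')
```

Because subtrees() also visits the root, asking for the root's own label would return all leaves of the sentence as a single list.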
