A different kind of ChunkParserI subclass can be used to identify LOCATION chunks: one that uses the gazetteers corpus to identify location words. The gazetteers corpus is a
WordListCorpusReader instance that contains the following location words:
- Country names
- U.S. states and abbreviations
- Mexican states
- Major U.S. cities
- Canadian provinces
The LocationChunker class looks for words found in the gazetteers corpus by iterating over a tagged sentence. When it finds one or more location words, it creates a LOCATION chunk using IOB tags. The IOB LOCATION tags are produced by the
iob_locations() method, and the
parse() method converts those IOB tags to a Tree.
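The lookup described above can be sketched without NLTK. This is a minimal illustration, not the LocationChunker implementation itself: the LOCATIONS set is a small hypothetical stand-in for the words in the gazetteers corpus, and this iob_locations() is a standalone function rather than a method. Adjacent location words are merged into a single chunk. In NLTK, triples of this shape are what nltk.chunk.util.conlltags2tree accepts, which is presumably how parse() would build the Tree.

```python
# Hypothetical stand-in for the gazetteers word list; the real corpus is far larger.
LOCATIONS = {"San Francisco", "San Jose", "CA", "Canada"}


def iob_locations(tagged_sent, locations=LOCATIONS):
    """Return (word, tag, iob) triples, marking gazetteer matches as LOCATION."""
    # Longest location phrase, in words, so we know how far to look ahead.
    max_len = max((len(loc.split()) for loc in locations), default=0)
    triples = []
    i = 0
    inside = False  # True while the previous word belonged to a LOCATION chunk
    while i < len(tagged_sent):
        match_len = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tagged_sent) - i), 0, -1):
            phrase = " ".join(word for word, _ in tagged_sent[i:i + n])
            if phrase in locations:
                match_len = n
                break
        if match_len:
            for k, (word, tag) in enumerate(tagged_sent[i:i + match_len]):
                # Only the first word of a new chunk gets the B- prefix.
                begins = k == 0 and not inside
                triples.append((word, tag, "B-LOCATION" if begins else "I-LOCATION"))
            inside = True
            i += match_len
        else:
            word, tag = tagged_sent[i]
            triples.append((word, tag, "O"))
            inside = False
            i += 1
    return triples


sentence = [("San", "NNP"), ("Francisco", "NNP"), ("CA", "NNP"),
            ("is", "BE"), ("cold", "JJ")]
for triple in iob_locations(sentence):
    print(triple)
```

Trying the longest phrase first keeps multi-word names such as "San Francisco" together instead of failing on "San" alone.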
Code #1 : LocationChunker class
Code #2 : iob_locations() method
Code #3 : use the LocationChunker class to parse the sentence
Location : [[('San', 'NNP'), ('Francisco', 'NNP'), ('CA', 'NNP')], [('San', 'NNP'), ('Jose', 'NNP'), ('CA', 'NNP')]]
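A list of LOCATION chunks like the one above can be recovered from IOB-tagged triples by grouping each B-LOCATION word with the I-LOCATION words that follow it. The sketch below is a plain-Python illustration under stated assumptions: location_chunks() is a hypothetical helper, and the input triples are a hand-written example sentence chosen to be consistent with the output shown, not the original listing.

```python
def location_chunks(iob_triples):
    """Group (word, tag, iob) triples into lists of (word, tag) per LOCATION chunk."""
    chunks = []
    current = []
    for word, tag, iob in iob_triples:
        if iob == "B-LOCATION":
            # A new chunk starts; close any chunk that was still open.
            if current:
                chunks.append(current)
            current = [(word, tag)]
        elif iob == "I-LOCATION" and current:
            current.append((word, tag))
        else:
            # An "O" tag (or a stray I- tag with no open chunk) ends the chunk.
            if current:
                chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


# Assumed example input, hand-tagged for illustration.
triples = [
    ("San", "NNP", "B-LOCATION"), ("Francisco", "NNP", "I-LOCATION"),
    ("CA", "NNP", "I-LOCATION"), ("is", "BE", "O"), ("cold", "JJ", "O"),
    ("compared", "VBD", "O"), ("to", "TO", "O"),
    ("San", "NNP", "B-LOCATION"), ("Jose", "NNP", "I-LOCATION"),
    ("CA", "NNP", "I-LOCATION"),
]
print("Location :", location_chunks(triples))
```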