Extracting locations from text using Python
Last Updated: 21 Jun, 2022
In this article, we will see how to extract locations from text using Python.
When working with text, you may need to detect the cities, regions, states, and countries it mentions, along with the relationships between them. This is useful for geographical studies, among other things. In this article, we will use the locationtagger library for this task.
Text mining that relies on grammar-based rules and statistical modelling is usually carried out with NER (Named Entity Recognition) algorithms. An entity extracted by NER can be the name of a person, place, organization, or product. The locationtagger library goes a step further: it tags the extracted entities and keeps only the place names, filtering them from all the other entities present.
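The idea of extracting entities first and then filtering them down to places can be illustrated with a toy sketch. This is not how locationtagger is implemented; the tiny `GAZETTEER` dictionary, the `filter_places` function, and the hard-coded candidate list are all hypothetical stand-ins for a real NER step and a real place database.

```python
# Toy illustration only: a hand-written gazetteer stands in for a real
# place database, and the candidate list stands in for real NER output.
GAZETTEER = {
    "india": "country",
    "kerala": "region",
    "mumbai": "city",
}


def filter_places(entities):
    """Split candidate entities into recognized places and everything else."""
    places = {"country": [], "region": [], "city": []}
    other = []
    for name in entities:
        kind = GAZETTEER.get(name.lower())
        if kind is None:
            other.append(name)
        else:
            places[kind].append(name)
    return places, other


# Entities as an NER step might return them (people, places, organizations).
candidates = ["India", "Kerala", "Mumbai", "Reserve Bank", "Gandhi"]
places, other = filter_places(candidates)
print(places)
print(other)
```

A real library needs a far larger place database and has to handle ambiguous names, which is why the lookup here should be read only as a picture of the filtering step.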
Installation:
To install this module, type the command below in the terminal.
pip install locationtagger
After installation, a few NLTK resources need to be downloaded from Python code.
Python3
import nltk
import spacy  # raises ImportError early if spaCy is not installed

# Download the NLTK resources used for tokenizing, tagging, and chunking.
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('treebank')
nltk.download('maxent_treebank_pos_tagger')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
Also download the spaCy English model from the command line:
python -m spacy download en_core_web_sm
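Before running the examples, it can help to confirm that everything is in place. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical. It relies on the fact that a model fetched with `python -m spacy download` is installed as a regular Python package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(package_name) is not None


# Check the packages this article depends on, including the spaCy model.
for name in ("nltk", "spacy", "en_core_web_sm", "locationtagger"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

This only checks that the packages can be located; it does not verify that the NLTK resources have been downloaded.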
Example 1: Printing countries, cities and regions from text.
The entity returned by the library exposes attributes for the cities, countries, and regions found in the text.
Functions Used:
- locationtagger.find_locations(text) : Returns the entity holding the location information. The “text” parameter takes the input text.
- entity.countries : All the countries found in the text.
- entity.regions : All the regions (states) found in the text.
- entity.cities : All the cities found in the text.
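The three attributes above can be gathered into a single dictionary for later processing. The sketch below assumes each attribute is an iterable of strings; the helper names `summarize_places` and `summarize_text` are hypothetical, and `fake_entity` is a hand-written stand-in used so the helper can be tried without the library, not real library output.

```python
from types import SimpleNamespace


def summarize_places(entity):
    """Collect the three basic attributes into a dict of sorted, unique names."""
    return {
        "countries": sorted(set(entity.countries)),
        "regions": sorted(set(entity.regions)),
        "cities": sorted(set(entity.cities)),
    }


def summarize_text(text):
    """Run locationtagger on text; requires the library and its data."""
    import locationtagger  # imported lazily so summarize_places works alone

    return summarize_places(locationtagger.find_locations(text=text))


# Stand-in object with hand-written values, NOT real library output.
fake_entity = SimpleNamespace(
    countries=["India", "India"], regions=["Kerala"], cities=["Mumbai", "Delhi"]
)
summary = summarize_places(fake_entity)
print(summary)
```

Sorting and de-duplicating is a design choice made here for stable output; drop it if the original order of mentions matters.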
Code:
Python3
import locationtagger

# Adjacent string literals inside parentheses are joined into one string.
sample_text = (
    "India has very rich and vivid culture "
    "widely spread from Kerala to Nagaland to Haryana to Maharashtra. "
    "Delhi being capital with Mumbai financial capital. "
    "Can be said better than some western cities such as "
    "Munich, London etc. Pakistan and Bangladesh share its borders"
)

# Extract the place entities from the text.
place_entity = locationtagger.find_locations(text=sample_text)

print("The countries in text : ")
print(place_entity.countries)
print("The states in text : ")
print(place_entity.regions)
print("The cities in text : ")
print(place_entity.cities)
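Output: running the script prints the countries, states, and cities detected in the sample text, each under its label.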
Example 2: Extracting relations between locations
This example covers the attributes that relate the countries, regions, and cities in the text to one another.
Functions Used:
- entity.country_regions : The countries to which the regions found in the text belong.
- entity.country_cities : The countries to which the cities found in the text belong.
- entity.other_countries : All countries whose regions or cities are present in the text.
- entity.region_cities : The regions to which the cities found in the text belong.
- entity.other_regions : All regions whose cities are present in the text.
- entity.other : All entities that were not recognized as place names.
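When working with these relations, it is often handy to look them up in the opposite direction, for example from a city to its country. The sketch below assumes a relation is shaped as a dictionary mapping a parent place to a list of child places; that shape is an assumption about attributes such as `country_cities`, and both the helper `invert_relation` and the sample data are hypothetical and hand-written.

```python
def invert_relation(parent_to_children):
    """Turn {parent: [child, ...]} into {child: [parent, ...]}."""
    child_to_parents = {}
    for parent, children in parent_to_children.items():
        for child in children:
            # A name can appear under several parents, so keep a list.
            child_to_parents.setdefault(child, []).append(parent)
    return child_to_parents


# Hand-written sample shaped like the assumed mapping, NOT library output.
sample_country_cities = {
    "India": ["Mumbai", "Delhi"],
    "Pakistan": ["Lahore"],
}
city_to_countries = invert_relation(sample_country_cities)
print(city_to_countries["Lahore"])
```

Values are kept as lists because the same city name can exist in more than one country.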
Python3
import locationtagger

# Adjacent string literals inside parentheses are joined into one string.
sample_text = (
    "India has very rich and vivid culture widely "
    "spread from Kerala to Nagaland to Haryana to Maharashtra. "
    "Mumbai being financial capital can be said better "
    "than some western cities such as "
    "Lahore, Canberra etc. Pakistan and Nepal share its borders"
)

# Extract the place entities from the text.
place_entity = locationtagger.find_locations(text=sample_text)

print("The countries regions in text : ")
print(place_entity.country_regions)
print("The countries cities in text : ")
print(place_entity.country_cities)
print("All other countries in text : ")
print(place_entity.other_countries)
print("The region cities in text : ")
print(place_entity.region_cities)
print("All other regions in text : ")
print(place_entity.other_regions)
print("All other entities in text : ")
print(place_entity.other)
Output: running the script prints each of the six relations listed above under its label.