Natural Language Processing using Polyglot – Introduction


This article introduces Polyglot, a Python NLP package developed by Rami Al-Rfou. It supports multilingual applications and offers a wide range of analyses with broad language coverage. Its features include:

  1. Language detection (196 Languages)
  2. Tokenization (165 Languages)
  3. Named Entity Recognition (40 Languages)
  4. Part of Speech Tagging (16 Languages)
  5. Sentiment Analysis (136 Languages) and many more

First, install the required packages.
Google Colab is a convenient environment for a smooth installation.

pip install polyglot

# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
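Before moving on, it can help to confirm that the installed packages are importable. The following is a minimal sketch using only the standard library. The helper name `check_modules` is our own, and the module names assume the usual import names of these packages (pyicu is normally imported as `icu`, Morfessor as `morfessor`).

```python
from importlib.util import find_spec

# Import names assumed for the packages installed above
# (pyicu is imported as "icu", Morfessor as "morfessor").
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:<10} {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING should be reinstalled before downloading the models.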

Next, download the necessary models.
Google Colab also makes downloading the models easy.

polyglot download ner2.en    # downloading model ner
polyglot download pos2.en    # downloading model pos
polyglot download sentiment2.en  # downloading model sentiment

Code: Language Detection 


from polyglot.detect import Detector
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
detector = Detector(spanish_text)
# printing the detected language with its code and confidence
print(detector.language)


The detector identifies the given text as Spanish with a confidence of 98.
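In practice, the confidence value is useful for deciding whether to trust a detection. The sketch below shows one possible way to apply a cutoff. It does not call Polyglot: the `Language` tuple is a hypothetical stand-in that models only the name, code, and confidence of a detection, and the threshold of 90 is an arbitrary assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a detection result; only the three
# fields used here (name, code, confidence) are modelled.
Language = namedtuple("Language", ["name", "code", "confidence"])


def accept_language(language, min_confidence=90.0):
    """Return the language code if the detection is confident enough, else None."""
    if language.confidence >= min_confidence:
        return language.code
    return None


print(accept_language(Language("Spanish", "es", 98.0)))  # es
print(accept_language(Language("Spanish", "es", 40.0)))  # None
```

Short or mixed-language texts tend to produce lower confidence, so a fallback for the `None` case is worth planning for.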
Code: Tokenization 
Tokenization is the process of splitting sentences into words and paragraphs into sentences.


# importing Text from polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation?. GFG is a very good platform for placement"""
# passing sentences through imported Text
text = Text(sentences)
# dividing sentences into words
print(text.words)
# separating sentences
print(text.sentences)


Polyglot divides the text into words and also separates the two sentences.
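To make the idea of tokenization concrete, here is a deliberately naive sketch built on regular expressions. It is not how Polyglot tokenizes (Polyglot handles many languages and scripts that this would get wrong), and the function names `split_sentences` and `split_words` are our own.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (a rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Return runs of word characters and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "Polyglot supports many languages. It can also split text into words!"
print(split_sentences(sample))
print(split_words("It can also split text into words!"))
```

A heuristic like this breaks on abbreviations such as "Dr." and on languages without whitespace between words, which is why a dedicated tokenizer is preferable.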
Code: Named Entity Recognition: 
Polyglot recognizes three categories of entities: 

  • Location
  • Organization
  • Person



from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
text = Text(sentence, hint_language_code='en')
# printing the recognized entities with their tags
print(text.entities)


I-ORG refers to organisation 
I-LOC refers to location 
I-PER refers to person 
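Once entities are tagged, a common next step is to group them by category. The sketch below assumes the entities have already been converted to plain `(tag, tokens)` pairs. The sample data is hand-written for illustration and is not real model output, and `group_entities` is a hypothetical helper name.

```python
from collections import defaultdict

# Readable names for the three tags described above.
TAG_NAMES = {"I-ORG": "organization", "I-LOC": "location", "I-PER": "person"}


def group_entities(entities):
    """Group (tag, tokens) pairs into {category: [entity strings]}."""
    grouped = defaultdict(list)
    for tag, tokens in entities:
        grouped[TAG_NAMES.get(tag, tag)].append(" ".join(tokens))
    return dict(grouped)


# Hand-written sample in the assumed (tag, tokens) shape; not model output.
sample = [
    ("I-ORG", ["Google"]),
    ("I-PER", ["Sundar", "Pichai"]),
    ("I-ORG", ["Google"]),
]
print(group_entities(sample))
```

Multi-token entities such as a full name are joined with spaces so that each entity appears as a single string.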
Code: Part of Speech Tagging 


from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
# printing each word with its part-of-speech tag
print(text.pos_tags)


Here, ADP refers to adposition, ADJ to adjective, and DET to determiner.
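The tag abbreviations are easier to read with a legend. The sketch below maps tags to descriptions, assuming the tagger uses the 17 universal part-of-speech tags. The tagged words are a hand-written sample rather than model output, and `describe_tags` is a hypothetical helper name.

```python
# Legend for the universal part-of-speech tag set (assumed to be the one in use).
UNIVERSAL_POS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_tags(tagged_words):
    """Attach a readable description to each (word, tag) pair."""
    return [(word, tag, UNIVERSAL_POS.get(tag, "unknown")) for word, tag in tagged_words]


# Hand-written sample in (word, tag) form; not model output.
sample = [("the", "DET"), ("best", "ADJ"), ("place", "NOUN"), ("for", "ADP")]
for word, tag, meaning in describe_tags(sample):
    print(f"{word:<8}{tag:<6}{meaning}")
```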
Code: Sentiment Analysis 


from polyglot.text import Text
sentence1 = """ABC is one of the best university in the world."""
sentence2 = """ABC is one of the worst university in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
# printing the overall polarity of each sentence
print(text1.polarity)
print(text2.polarity)


A polarity of 1 means the sentence has a positive context. 
A polarity of -1 means the sentence has a negative context. 
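One simple way to go from word-level scores to a sentence-level score is to average the polarities of the words that carry sentiment. The sketch below illustrates that idea with a tiny made-up lexicon. Both `TOY_LEXICON` and `sentence_polarity` are hypothetical, and this is an illustration of the averaging approach rather than a description of Polyglot's internals.

```python
# Tiny made-up lexicon for illustration only: +1 positive, -1 negative.
TOY_LEXICON = {"best": 1, "good": 1, "worst": -1, "bad": -1}


def sentence_polarity(words, lexicon=TOY_LEXICON):
    """Average the non-zero word polarities; return 0.0 if no word has one."""
    scores = [lexicon.get(word.lower(), 0) for word in words]
    nonzero = [score for score in scores if score != 0]
    if not nonzero:
        return 0.0
    return sum(nonzero) / len(nonzero)


print(sentence_polarity("ABC is one of the best university in the world".split()))   # 1.0
print(sentence_polarity("ABC is one of the worst university in the world".split()))  # -1.0
```

A sentence containing both positive and negative words would land somewhere between -1 and 1 under this scheme.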

Last Updated : 28 Jun, 2021