Natural Language Processing using Polyglot – Introduction

This article introduces Polyglot, a Python NLP package developed by Rami Al-Rfou. It supports multilingual applications, offering a wide range of analyses and broad language coverage. Its features include:

  1. Language detection (196 Languages)
  2. Tokenization (165 Languages)
  3. Named Entity Recognition (40 Languages)
  4. Part of Speech Tagging (16 Languages)
  5. Sentiment Analysis (136 Languages) and many more

First, let’s install some required packages:

Use Google Colab for easy and smooth installation.

pip install polyglot
# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2

Next, download the necessary models:

Google Colab also makes the model downloads easy.



%%bash
polyglot download embeddings2.en   # word embeddings that the NER and POS models rely on
polyglot download ner2.en          # named entity recognition model
polyglot download pos2.en          # part-of-speech tagging model
polyglot download sentiment2.en    # sentiment analysis model

Code: Language Detection


from polyglot.detect import Detector
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
detector = Detector(spanish_text)
print(detector.language)



Output:

Polyglot identifies the given text as Spanish with a confidence of 98.
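The printed summary combines several fields that can also be read one at a time. The following is a minimal sketch, assuming the detected language object carries `name`, `code`, and `confidence` attributes; `describe_language` is a hypothetical helper, and a stand-in object replaces `Detector(spanish_text).language` so the snippet runs without polyglot installed.

```python
from types import SimpleNamespace


def describe_language(language):
    """Summarise a detected language in one line.

    `language` is expected to look like `Detector(text).language`,
    i.e. to expose `name`, `code` and `confidence` (assumption).
    """
    return f"{language.name} ({language.code}): {language.confidence:.0f}% confidence"


# Stand-in for `Detector(spanish_text).language`, not real detector output.
example = SimpleNamespace(name="Spanish", code="es", confidence=98.0)
print(describe_language(example))
```

With polyglot installed, passing `detector.language` from the code above in place of `example` should work the same way, provided the attribute names match.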

Code: Tokenization

Tokenization is the process of splitting paragraphs into sentences and sentences into words.


# importing Text from the polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation? GFG is a very good platform for placement
preparation."""
# passing the sentences through the imported Text class
text = Text(sentences)
# dividing the sentences into words
print(text.words)
print('\n')
# separating the sentences
print(text.sentences)



Output:

Polyglot has split the text into words and also separated the two sentences.
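To make the idea of tokenization concrete, here is a deliberately naive sketch built on regular expressions. It is not how polyglot tokenizes (polyglot applies language-aware rules rather than a single pattern); `naive_words` and `naive_sentences` are hypothetical helpers that only illustrate the two levels of splitting for simple English text.

```python
import re


def naive_words(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def naive_sentences(text):
    """Split text after '.', '?' or '!' when whitespace follows."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


sample = "Suggest a platform for placement preparation? GFG is a very good platform."
print(naive_words(sample))
print(naive_sentences(sample))
```

A pattern like this breaks down on abbreviations, decimal numbers, and languages written without spaces, which is where a multilingual tokenizer earns its keep.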

Code: Named Entity Recognition

Polyglot recognizes three categories of entities:

  • Location
  • Organization
  • Persons

from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
text = Text(sentence, hint_language_code='en')
print(text.entities)



Output:

I-ORG refers to an organization
I-LOC refers to a location
I-PER refers to a person
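Once entities are tagged, a common next step is to group them by category. The sketch below assumes each entity can be reduced to a `(tag, words)` pair, for example with `[(e.tag, list(e)) for e in text.entities]`; that chunk interface is an assumption, `group_entities` is a hypothetical helper, and the sample data is hand-written rather than real model output.

```python
from collections import defaultdict


def group_entities(tagged_chunks):
    """Group entity strings by their tag (I-ORG, I-LOC, I-PER).

    `tagged_chunks` is an iterable of (tag, words) pairs, where `words`
    is the list of tokens that make up one entity.
    """
    grouped = defaultdict(list)
    for tag, words in tagged_chunks:
        grouped[tag].append(" ".join(words))
    return dict(grouped)


# Hand-written sample in the assumed shape, not actual polyglot output.
sample = [
    ("I-ORG", ["Google"]),
    ("I-PER", ["Sundar", "Pichai"]),
    ("I-ORG", ["Google"]),
]
print(group_entities(sample))
```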

Code: Part of Speech Tagging


from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
print(text.pos_tags)



Output:

Here, ADP refers to an adposition, ADJ to an adjective, and DET to a determiner.
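Tagged output is usually filtered before further use. The sketch below assumes the tags arrive as a list of `(word, tag)` tuples, which is the shape `text.pos_tags` is expected to have; `words_with_tag` is a hypothetical helper, and the sample tags are hand-written for illustration rather than copied from a model run.

```python
def words_with_tag(pos_tags, wanted):
    """Return the words whose part-of-speech tag equals `wanted`."""
    return [word for word, tag in pos_tags if tag == wanted]


# Hand-written (word, tag) pairs; a real tagger may label some words differently.
sample = [
    ("the", "DET"),
    ("best", "ADJ"),
    ("place", "NOUN"),
    ("in", "ADP"),
    ("simple", "ADJ"),
    ("manner", "NOUN"),
]
print(words_with_tag(sample, "ADJ"))
print(words_with_tag(sample, "NOUN"))
```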

Code: Sentiment Analysis


from polyglot.text import Text
sentence1 = """ABC is one of the best universities in the world."""
sentence2 = """ABC is one of the worst universities in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)



Output:

A polarity of 1 means the sentence is positive.
A polarity of -1 means the sentence is negative.
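One way to see how a sentence can end up at exactly 1 or -1 is to treat the sentence score as the average of per-word scores, ignoring neutral words. The sketch below works under that assumption, which the article itself does not state; `average_polarity` is a hypothetical helper and the word scores are hand-written.

```python
def average_polarity(word_polarities):
    """Average the non-neutral word scores (each -1, 0 or +1).

    Returns 0.0 when every word is neutral or the list is empty.
    """
    scores = [p for p in word_polarities if p != 0]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Hand-written scores: only "best" (+1) or "worst" (-1) is non-neutral.
print(average_polarity([0, 0, 0, 0, 0, 1, 0, 0, 0, 0]))
print(average_polarity([0, 0, 0, 0, 0, -1, 0, 0, 0, 0]))
```

Under this assumption, a sentence containing one positive and one negative word would score 0, so values between -1 and 1 are possible for mixed text.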







Improved By : nidhi_biet
