This article introduces Polyglot, a Python NLP package that supports multilingual applications and offers a wide range of analyses with broad language coverage. It was developed by Rami Al-Rfou. Its features include:
- Language detection (196 Languages)
- Tokenization (165 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages) and many more
First, let’s install some required packages:
Use Google Colab for easy and smooth installation.
pip install polyglot
# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
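After installing, it can help to confirm that each package is actually importable before moving on. The following is a minimal sketch using only the standard library; the import names (`icu` for PyICU, `morfessor` for Morfessor) are assumptions based on how these packages are usually imported, and `check_modules` is a hypothetical helper, not part of Polyglot.

```python
from importlib.util import find_spec

# Import names are assumptions: PyICU is usually imported as "icu",
# Morfessor as "morfessor".
REQUIRED_MODULES = ["polyglot", "icu", "morfessor", "pycld2"]


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: find_spec(name) is not None for name in names}


# Report the status of every required module without importing it.
for name, found in check_modules().items():
    print(f"{name}: {'ok' if found else 'missing'}")
```

Any module reported as missing points to the `pip install` step that needs to be repeated.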
Download the necessary models:
Use Google Colab for easy installation of the models.
%%bash
# word embeddings, which the NER and POS models depend on
polyglot download embeddings2.en
# downloading model ner
polyglot download ner2.en
# downloading model pos
polyglot download pos2.en
# downloading model sentiment
polyglot download sentiment2.en
Code: Language Detection
from polyglot.detect import Detector
spanish_text = u"""¡Hola ! Mi nombre es Ana. Tengo veinticinco años. Vivo en Miami, Florida"""
detector = Detector(spanish_text)
print(detector.language)
Output:
Polyglot detects the given text as Spanish with a confidence of 98.
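The detected language can also be read field by field instead of printed as a whole. The sketch below assumes that the object returned by `detector.language` exposes `name`, `code` and `confidence` attributes; `summarize` and `detect_language` are hypothetical helper names introduced here for illustration. Polyglot is imported lazily so the formatting helper can be tried on its own.

```python
def summarize(name, code, confidence):
    """Build a one-line, human-readable summary of a detection result."""
    return f"{name} ({code}), confidence {confidence:.0f}"


def detect_language(text):
    """Detect the dominant language of `text` with Polyglot and summarize it."""
    # Imported lazily so summarize() works even without Polyglot installed.
    from polyglot.detect import Detector

    # Assumption: the result exposes .name, .code and .confidence.
    language = Detector(text).language
    return summarize(language.name, language.code, language.confidence)


# Formatting a result by hand, using the values reported in the example above.
print(summarize("Spanish", "es", 98.0))
```

Calling `detect_language(spanish_text)` would produce the same kind of summary directly from the text, provided Polyglot and pycld2 are installed.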
Code: Tokenization
Tokenization is the process of splitting sentences into words, and paragraphs into sentences.
# importing Text from polyglot library
from polyglot.text import Text
sentences = u"""Suggest a platform for placement preparation? GFG is a very good platform for placement preparation."""
# passing sentences through imported Text
text = Text(sentences)
# dividing sentences into words
print(text.words)
print('\n')
# separating sentences
print(text.sentences)
Output:
The text is split into individual words and also separated into its two sentences.
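To make the idea of tokenization concrete, here is a deliberately naive, regex-based sketch of word and sentence splitting. It is not Polyglot's method: Polyglot relies on ICU's language-aware boundary rules, whereas these hypothetical helpers (`split_sentences`, `split_words`) only handle simple punctuation-delimited text.

```python
import re


def split_sentences(text):
    """Split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def split_words(sentence):
    """Return runs of word characters; each punctuation mark is its own token."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sample = "Suggest a platform for placement preparation? GFG is a very good platform."
for sentence in split_sentences(sample):
    print(split_words(sentence))
```

A rule this simple breaks on abbreviations such as "e.g." and on languages that do not separate words with spaces, which is where a multilingual tokenizer earns its keep.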
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:
- Location
- Organization
- Person
from polyglot.text import Text
sentence = """Google is an American multinational technology company and Sundar Pichai is the CEO of Google"""
text = Text(sentence, hint_language_code='en')
print(text.entities)
Output:
I-ORG refers to organization
I-LOC refers to location
I-PER refers to person
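Once entities are tagged, it is often convenient to group them by category. The sketch below works on plain `(tag, words)` pairs so it runs without Polyglot; it assumes each chunk in `text.entities` carries a `tag` attribute and behaves like a list of words, so the pairs could be built with `[(e.tag, list(e)) for e in text.entities]`. The sample input is hand-written for illustration and is not captured Polyglot output; `group_entities` is a hypothetical helper.

```python
from collections import defaultdict

# Readable names for the three tags described above.
TAG_NAMES = {"I-ORG": "organization", "I-LOC": "location", "I-PER": "person"}


def group_entities(tagged_chunks):
    """Group (tag, words) pairs into {category: [entity text, ...]}."""
    grouped = defaultdict(list)
    for tag, words in tagged_chunks:
        # Unknown tags are kept under their raw tag name.
        grouped[TAG_NAMES.get(tag, tag)].append(" ".join(words))
    return dict(grouped)


# Hand-written sample pairs (illustrative only).
sample_chunks = [("I-ORG", ["Google"]), ("I-PER", ["Sundar", "Pichai"])]
print(group_entities(sample_chunks))
```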
Code: Part of Speech Tagging
from polyglot.text import Text
sentence = """GeeksforGeeks is the best place for learning things in simple manner."""
text = Text(sentence)
print(text.pos_tags)
Output:
Here, ADP refers to adposition, ADJ to adjective, and DET to determiner.
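The remaining tags follow the same universal part-of-speech scheme. As a small sketch, the glossary below maps each tag to a readable name and applies it to `(word, tag)` pairs, the shape in which `text.pos_tags` is assumed to return its results. The sample pairs are hand-written for illustration, and `explain_tags` is a hypothetical helper.

```python
# Glossary of universal part-of-speech tags.
POS_GLOSSARY = {
    "ADJ": "adjective", "ADP": "adposition", "ADV": "adverb",
    "AUX": "auxiliary verb", "CONJ": "coordinating conjunction",
    "DET": "determiner", "INTJ": "interjection", "NOUN": "noun",
    "NUM": "numeral", "PART": "particle", "PRON": "pronoun",
    "PROPN": "proper noun", "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction", "SYM": "symbol",
    "VERB": "verb", "X": "other",
}


def explain_tags(pos_tags):
    """Replace each tag in (word, tag) pairs with its readable name."""
    # Tags missing from the glossary are passed through unchanged.
    return [(word, POS_GLOSSARY.get(tag, tag)) for word, tag in pos_tags]


# Hand-written sample pairs (illustrative only).
print(explain_tags([("the", "DET"), ("best", "ADJ"), ("for", "ADP")]))
```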
Code: Sentiment Analysis
from polyglot.text import Text
sentence1 = """ABC is one of the best university in the world."""
sentence2 = """ABC is one of the worst university in the world."""
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)
Output:
A polarity of 1 means the sentence has a positive context.
A polarity of -1 means the sentence has a negative context.
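A lexicon-based polarity score of this kind can be sketched in a few lines. The version below assumes that each word carries a polarity of +1, -1 or 0 and that the sentence score is the average over the non-neutral words; that aggregation rule is an assumption made for illustration, and the tiny `LEXICON` is hypothetical (Polyglot ships its own per-language lexicons).

```python
# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"best": 1, "good": 1, "worst": -1, "bad": -1}


def sentence_polarity(words, lexicon=LEXICON):
    """Average the polarity of the non-neutral words; 0.0 if there are none."""
    scores = [lexicon.get(word.lower(), 0) for word in words]
    scores = [score for score in scores if score != 0]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


print(sentence_polarity("ABC is one of the best university in the world".split()))
print(sentence_polarity("ABC is one of the worst university in the world".split()))
```

Because neutral words are ignored, a single strongly positive or negative word is enough to push a short sentence to 1 or -1, which matches the behaviour seen in the two examples.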