Natural Language Processing using Polyglot – Introduction
This article introduces Polyglot, a Python NLP package developed by Rami Al-Rfou. It supports multilingual applications and offers a wide range of analyses with broad language coverage. Its features include:
- Language detection (196 Languages)
- Tokenization (165 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages) and many more
First, let’s install some required packages:
Use Google Colab for easy and smooth installation.
# install Polyglot itself
pip install polyglot
# install the dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
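Next, download the models used in this article. Google Colab makes this step easy as well. The named entity recognition and part of speech models also rely on the English word embeddings (embeddings2.en), so download those too if Polyglot reports them missing.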
%%bash
polyglot download ner2.en # English named entity recognition model
%%bash
polyglot download pos2.en # English part of speech tagging model
%%bash
polyglot download sentiment2.en # English sentiment analysis model
Code: Language Detection
from polyglot.detect import Detector
spanish_text = u
detector = Detector(spanish_text)
print(detector.language)
Output:
Polyglot detected the given text as Spanish with a confidence of 98.
Code: Tokenization
Tokenization is the process of splitting text into smaller units: paragraphs into sentences, and sentences into words.
from polyglot.text import Text
sentences = u
text = Text(sentences)
print(text.words)
print('\n')
print(text.sentences)
Output:
Polyglot split the text into individual words and also separated it into its two sentences.
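To make the idea of word and sentence splitting concrete, here is a minimal standard-library sketch. It is not how Polyglot tokenizes text; the regular expressions, the function names `split_words` and `split_sentences`, and the sample string are all assumptions made for illustration, and the naive rules will mishandle abbreviations and languages without whitespace.

```python
import re


def split_sentences(text):
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_words(text):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical sample text, not the one used in the article.
sample = "Polyglot supports many languages. It is easy to use!"
print(split_words(sample))
print(split_sentences(sample))
```

A dedicated tokenizer such as Polyglot's is preferable in practice, since simple rules like these break down quickly on real multilingual text.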
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:
- Location
- Organization
- Person
from polyglot.text import Text
sentence =
text = Text(sentence, hint_language_code='en')
print(text.entities)
Output:
I-ORG refers to organisation
I-LOC refers to location
I-PER refers to person
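The tag meanings above can be turned into a small lookup that groups entities by category. This is a hedged sketch using plain `(tag, words)` pairs as stand-in data; the helper `group_entities`, the `TAG_NAMES` table, and the sample entities are hypothetical. With Polyglot, similar pairs could presumably be built from `text.entities`, assuming each entity exposes its tag and its words.

```python
from collections import defaultdict

# Readable names for the three entity tags described above.
TAG_NAMES = {"I-ORG": "organisation", "I-LOC": "location", "I-PER": "person"}


def group_entities(entities):
    """Group (tag, words) pairs by readable category name."""
    grouped = defaultdict(list)
    for tag, words in entities:
        # Unknown tags are kept under their raw tag string.
        grouped[TAG_NAMES.get(tag, tag)].append(" ".join(words))
    return dict(grouped)


# Hypothetical stand-in data, not real Polyglot output.
sample_entities = [
    ("I-PER", ["Ada", "Lovelace"]),
    ("I-LOC", ["London"]),
    ("I-ORG", ["Royal", "Society"]),
]
print(group_entities(sample_entities))
```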
Code: Part of Speech Tagging
from polyglot.text import Text
sentence =
text = Text(sentence)
print(text.pos_tags)
Output:
Here, ADP refers to adposition, ADJ to adjective, and DET to determiner.
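Since the tag abbreviations can be hard to read, a small lookup table may help when inspecting results. The sketch below assumes the tags arrive as `(word, tag)` pairs; the `POS_NAMES` table covers only a subset of the universal part of speech tags, and `describe_tags` and the sample pairs are hypothetical.

```python
# Readable names for a subset of the universal part of speech tags.
POS_NAMES = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "DET": "determiner",
    "NOUN": "noun",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "VERB": "verb",
}


def describe_tags(pos_tags):
    """Replace each tag with its readable name; unknown tags pass through."""
    return [(word, POS_NAMES.get(tag, tag)) for word, tag in pos_tags]


# Hypothetical stand-in pairs, not real Polyglot output.
sample_tags = [("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
               ("jumps", "VERB"), ("over", "ADP")]
for word, name in describe_tags(sample_tags):
    print(f"{word}: {name}")
```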
Code: Sentiment Analysis
from polyglot.text import Text
sentence1 =
sentence2 =
text1 = Text(sentence1)
text2 = Text(sentence2)
print(text1.polarity)
print(text2.polarity)
Output:
A polarity of 1 means the sentence is in a positive context.
A polarity of -1 means the sentence is in a negative context.
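To turn a polarity score into a readable label, a small helper like the following could be used. It is a sketch that assumes scores fall between -1 and 1; the function `polarity_label` and its `threshold` parameter are hypothetical and not part of Polyglot.

```python
def polarity_label(score, threshold=0.0):
    """Map a polarity score to 'positive', 'negative' or 'neutral'.

    Scores within +/- threshold of zero are treated as neutral.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# The two scores discussed above, plus a neutral case.
for score in (1.0, -1.0, 0.0):
    print(score, polarity_label(score))
```

Raising `threshold` (for example to 0.2) treats weakly polarized text as neutral, which can be useful when scores are averaged over many words.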
Last Updated: 28 Jun, 2021