Natural Language Processing using Polyglot – Introduction
This article introduces Polyglot, a Python NLP package developed by Rami Al-Rfou that supports multilingual applications and offers a wide range of analyses with broad language coverage. Its features include:
- Language detection (196 Languages)
- Tokenization (165 Languages)
- Named Entity Recognition (40 Languages)
- Part of Speech Tagging (16 Languages)
- Sentiment Analysis (136 Languages) and many more
First, let’s install some required packages:
Use Google Colab for easy and smooth installation.
pip install polyglot

# installing dependency packages
pip install pyicu
pip install Morfessor
pip install pycld2
Next, download the necessary models. Use Google Colab for easy installation of the models.
%%bash
polyglot download ner2.en        # downloading the NER model
polyglot download pos2.en        # downloading the POS model
polyglot download sentiment2.en  # downloading the sentiment model
Code: Language Detection
Polyglot detects the given text as Spanish with a confidence of about 98%.
Code: Tokenization
Tokenization is the process of splitting text into words, and paragraphs into sentences.
Polyglot splits the text into words and also separates the two sentences.
Code: Named Entity Recognition
Polyglot recognizes three categories of entities:
- I-ORG refers to an organisation
- I-LOC refers to a location
- I-PER refers to a person
Code: Part of Speech Tagging
Here, ADP refers to adposition, ADJ to adjective, and DET to determiner.
Code: Sentiment Analysis
- 1 indicates that the sentence is in a positive context
- -1 indicates that the sentence is in a negative context