Detect an Unknown Language using Python
Last Updated: 22 Mar, 2023
Language detection works by examining the characters and words that appear in a text. The main principle is to spot words that are very common in one language, such as "to" and "of" in English. Python provides several modules for language detection. This article covers the following modules:
- langdetect
- textblob
- langid
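The common-word principle described above can be illustrated with a short, dependency-free sketch. This is not how any of the libraries in this article are implemented; the `STOPWORDS` table and the `guess_language` function are hypothetical names, and the word lists are tiny hand-picked samples chosen only for illustration.

```python
import re

# Hand-picked samples of very common words per language (illustrative only).
STOPWORDS = {
    "en": {"the", "of", "to", "and", "is", "in", "for", "a"},
    "es": {"el", "la", "de", "que", "y", "en", "un", "es", "para"},
    "fr": {"le", "la", "de", "et", "les", "des", "est", "un", "pour"},
}


def guess_language(text, default="unknown"):
    """Return the language whose common words occur most often in text."""
    # Lowercase the text and split it into word tokens.
    words = re.findall(r"\w+", text.lower())
    # Count how many tokens belong to each language's common-word set.
    scores = {
        lang: sum(word in common for word in words)
        for lang, common in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no common word matched at all, fall back to the default.
    return best if scores[best] > 0 else default


print(guess_language("Geeksforgeeks is a computer science portal for geeks"))
print(guess_language("Geeksforgeeks es un portal informático para geeks"))
```

A word-counting approach like this breaks down on very short texts and on languages written without spaces, which is one reason to prefer the trained models in the modules below.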
Method 1: Using langdetect library
This module is a port of Google's language-detection library and supports 55 languages. It is not part of Python's standard library, so it must be installed separately. To install it, run the following command in the terminal:
pip install langdetect
from langdetect import detect

# detect() returns the code of the most likely language for each text
print(detect("Geeksforgeeks is a computer science portal for geeks"))
print(detect("Geeksforgeeks - это компьютерный портал для гиков"))
print(detect("Geeksforgeeks es un portal informático para geeks"))
print(detect("Geeksforgeeks是面向极客的计算机科学门户"))
print(detect("Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"))
Output:
en
ru
es
no
hi
ja
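In the output above, the Chinese sentence is reported as no (Norwegian), which shows that short or mixed-script input can be misclassified. The langdetect README also notes that the algorithm is non-deterministic unless a seed is set. The sketch below is one possible way to make results repeatable and to reject low-confidence guesses. `detect_candidates` and `pick_language` are hypothetical helper names, and the 0.9 threshold is an arbitrary assumption, not a value recommended by the library.

```python
def detect_candidates(text, seed=0):
    """Return a list of (language, probability) pairs, or [] if undetectable."""
    # Imported here so that pick_language() below works without langdetect.
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    # A fixed seed makes repeated runs return the same result.
    DetectorFactory.seed = seed
    try:
        return [(cand.lang, cand.prob) for cand in detect_langs(text)]
    except LangDetectException:
        # Raised when the text has no usable features (e.g. an empty string).
        return []


def pick_language(candidates, threshold=0.9, default="unknown"):
    """Accept the most probable candidate only if it reaches the threshold."""
    if not candidates:
        return default
    lang, prob = max(candidates, key=lambda pair: pair[1])
    return lang if prob >= threshold else default
```

With langdetect installed, a call such as `pick_language(detect_candidates("Geeksforgeeks是面向极客的计算机科学门户"))` returns either a language code or "unknown", depending on how confident the detector is.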
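Method 2: Using textblob library
This module is used for natural language processing (NLP) tasks such as noun phrase extraction, sentiment analysis, classification, translation, and more. Note that detect_language() relied on the Google Translate web API; it was deprecated in TextBlob 0.16.0 and has been removed from later releases, so the example in this section only works with an older version of the library and an internet connection. To install this module, run the following command in the terminal: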
pip install textblob
Example:
from textblob import TextBlob

# Sample sentences in six different languages
texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。",
]

# Wrap each sentence in a TextBlob and print its detected language code
for text in texts:
    blob = TextBlob(text)
    print(blob.detect_language())
Output:
en
ru
es
zh-CN
hi
ja
Method 3: Using langid library
This module is a standalone language identification tool. It comes pre-trained on a large number of languages (currently 97) and consists of a single .py file with minimal dependencies. To install it, run the following command in the terminal:
pip install langid
[src: https://github.com/saffsd/langid.py]
Example:
import langid

# Sample sentences in six different languages
texts = [
    "Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。",
]

# classify() returns a (language code, score) tuple for each sentence
for text in texts:
    print(langid.classify(text))
Output:
('en', -119.93012762069702)
('ru', -641.3409600257874)
('es', -191.01083326339722)
('zh', -199.18277835845947)
('hi', -286.99300467967987)
('ja', -875.6610476970673)
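The second element of each tuple above is an unnormalized log-probability score, so it is hard to read as a confidence value. The sketch below shows one possible way to turn such scores into probabilities with a softmax over all candidate languages. `normalize_scores` and `classify_with_probability` are hypothetical helper names; the sketch assumes that `langid.rank()` returns a (language, score) pair for every supported language. According to the langid README, the library can also normalize scores itself through `LanguageIdentifier.from_modelstring(model, norm_probs=True)`.

```python
import math


def normalize_scores(ranked):
    """Convert (language, log_score) pairs into (language, probability) pairs."""
    if not ranked:
        return []
    # Subtract the best score before exponentiating to avoid underflow.
    top = max(score for _, score in ranked)
    weights = [(lang, math.exp(score - top)) for lang, score in ranked]
    total = sum(weight for _, weight in weights)
    return [(lang, weight / total) for lang, weight in weights]


def classify_with_probability(text):
    """Return the most likely (language, probability) pair for text."""
    # Imported here so that normalize_scores() works without langid installed.
    import langid

    ranked = langid.rank(text)  # scores for all supported languages
    return max(normalize_scores(ranked), key=lambda pair: pair[1])
```

With langid installed, `classify_with_probability("Geeksforgeeks is a computer science portal for geeks")` returns a language code together with a value between 0 and 1.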