Detect an Unknown Language using Python

The idea behind language detection is based on the detection of the character among the expression and words in the text. The main principle is to detect commonly used words like to, of in English. Python provides various modules for language detection. In this article, the modules covered are:

  • langdetect
  • textblob
  • langrid

Method 1: Using langdetect library

This module is a port of Google’s language-detection library that supports 55 languages. This module don’t come with Python’s standard utility modules. So, it is needed to be installed externally. To install this type the below command in the terminal.

pip install langdetect
filter_none

edit
close

play_arrow

link
brightness_4
code

# Python program to demonstrate
# langdetect
  
  
from langdetect import detect
  
  
# Specifying the language for
# detection
print(detect("Geeksforgeeks is a computer science portal for geeks"))
print(detect("Geeksforgeeks - это компьютерный портал для гиков"))
print(detect("Geeksforgeeks es un portal informático para geeks"))
print(detect("Geeksforgeeks是面向极客的计算机科学门户"))
print(detect("Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है"))
print(detect("Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。"))

chevron_right


Output:

en
ru
es
no
hi
ja

Method 2: Using textblob library



This module is used for natural language processing(NLP) tasks such as noun phrase extraction, sentiment analysis, classification, translation, and more. To install this module type the below command in the terminal.
(‘ru’, -641.3409600257874)

pip install textblob

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# Python program to demonstrate
# textblob
   
  
from textblob import TextBlob
   
  
L = ["Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。",
    ]
  
for i in L:
      
    # Language Detection
    lang = TextBlob(i) 
    print(lang.detect_language())

chevron_right


Output:

en
ru
es
zh-CN
hi
ja

Method 3: Using langrid library

This module is a standalone Language Identification tool. It is pre-trained over a large number of languages (currently 97). It is a single.py file with minimal dependencies. To install this type the below command in the terminal.

pip install langrid

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# Python program to demonstrate
# langid
  
  
import langid
  
  
L = ["Geeksforgeeks is a computer science portal for geeks",
    "Geeksforgeeks - это компьютерный портал для гиков",
    "Geeksforgeeks es un portal informático para geeks",
    "Geeksforgeeks是面向极客的计算机科学门户",
    "Geeksforgeeks geeks के लिए एक कंप्यूटर विज्ञान पोर्टल है",
    "Geeksforgeeksは、ギーク向けのコンピューターサイエンスポータルです。",
    ]
  
for i in L:
      
    # Language detection
    print(langid.classify(i))

chevron_right


Output:

('en', -119.93012762069702)
('ru', -641.3409600257874)
('es', -191.01083326339722)
('zh', -199.18277835845947)
('hi', -286.99300467967987)
('ja', -875.6610476970673)

Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course.




My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

2


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.