Text Localization, Detection and Recognition using Pytesseract
Pytesseract, or Python-tesseract, is an Optical Character Recognition (OCR) tool for Python. It reads and recognizes the text embedded in images, such as photos of license plates. Python-tesseract is a wrapper package for Google’s Tesseract-OCR Engine. It is also useful as a stand-alone invocation script for tesseract, because it can read all image types supported by the Pillow and Leptonica imaging libraries, including JPEG, PNG, GIF, BMP, TIFF, and others.
Additionally, when used as a script, Python-tesseract prints the recognized text instead of writing it to a file. Python-tesseract can be installed using pip as shown below:
pip install pytesseract
If you are using Anaconda, Python-tesseract can be installed from the conda-forge channel with either of the commands below:
conda install -c conda-forge/label/cf202003 pytesseract
conda install -c conda-forge pytesseract
Note: The Tesseract engine itself must be installed on the system before running the script below.
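As a quick way to confirm that prerequisite, the following minimal sketch looks for the Tesseract executable on the system PATH using only the standard library. The helper name `find_tesseract` is hypothetical and used here purely for illustration; it is not part of pytesseract.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    return shutil.which(binary_name)


path = find_tesseract()
if path is None:
    print("Tesseract was not found on PATH; install it before using pytesseract.")
else:
    print(f"Tesseract found at: {path}")
```

If Tesseract is installed in a location that is not on PATH, pytesseract can be pointed at it by assigning the full path of the executable to `pytesseract.pytesseract.tesseract_cmd`.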
Below is the implementation.
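The listing is not reproduced here, so what follows is a minimal sketch of what an `ocr.py` along these lines might look like, not the original script. It assumes `pytesseract.image_to_data` with `Output.DICT` for word-level boxes and confidences, and uses Pillow to draw and display the boxes. The `--min-conf` option and the helper names `confident_words` and `run_ocr` are hypothetical additions for illustration.

```python
import argparse


def confident_words(data, min_conf=0.0):
    """Keep non-empty words whose confidence exceeds min_conf.

    `data` is a dict shaped like the result of
    pytesseract.image_to_data(..., output_type=Output.DICT).
    Returns a list of (confidence, text, (left, top, width, height)).
    """
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # may arrive as int, float, or str
        if conf > min_conf and text.strip():
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            words.append((conf, text, box))
    return words


def run_ocr(image_path, min_conf=0.0):
    """Localize and recognize words, print them, and draw their boxes."""
    # Imported here so confident_words() stays usable without the OCR stack.
    import pytesseract
    from PIL import Image, ImageDraw

    image = Image.open(image_path).convert("RGB")
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    words = confident_words(data, min_conf)

    draw = ImageDraw.Draw(image)
    for conf, text, (x, y, w, h) in words:
        print(f"Confidence: {conf:.0f} Text: {text}")
        draw.rectangle([x, y, x + w, y + h], outline="green", width=2)
    return image, words


def main():
    parser = argparse.ArgumentParser(description="OCR an image with Tesseract")
    parser.add_argument("--image", help="path to the input image")
    # Hypothetical option: minimum confidence needed to keep a word.
    parser.add_argument("--min-conf", type=float, default=0.0,
                        help="discard words at or below this confidence")
    args = parser.parse_args()
    if args.image is None:
        parser.print_usage()
        return
    image, _ = run_ocr(args.image, args.min_conf)
    image.show()  # open the annotated image in the default viewer


if __name__ == "__main__":
    main()
```

Rows that Tesseract reports for pages, blocks, and lines carry a confidence of -1 and empty text, so the filter in `confident_words` drops them and keeps only recognized words.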
Execute the command below to view the output:
python ocr.py --image ocr.png
In addition to the output, the confidence level and the recognized text are printed in the command prompt, as shown below:
Confidence: 93 Text: I
Confidence: 93 Text: LOVE
Confidence: 91 Text: TESSERACT