Python | OCR on All the Images present in a Folder Simultaneously
Do you have a folder full of images that contain text you want to extract? This code saves the text either as one file per image, named after that image, or as a single combined file.
This article covers the basics of OCR (Optical Character Recognition). It also shows how to create a .txt output file for every image in the main folder and save it in a predetermined directory.
Libraries Needed:
pip3 install pillow
pip3 install pytesseract
The os module is part of Python's standard library, so it does not need to be installed. You will also need the Tesseract OCR engine (tesseract-ocr), which pytesseract calls to do the actual recognition. It is not a pip package and must be downloaded and installed separately.
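pytesseract is only a wrapper: each OCR call runs the tesseract executable as a separate process, so it fails if the engine is not installed. A quick pre-flight check can confirm that the executable is on your PATH before you process a whole folder. The sketch below uses only the standard library. The function names `find_tesseract` and `tesseract_status` are hypothetical.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_status():
    """Return a short human-readable message about the Tesseract installation."""
    location = find_tesseract()
    if location is None:
        return "tesseract not found on PATH; install tesseract-ocr first"
    return "Using tesseract at " + location


print(tesseract_status())
```

Tesseract may be installed somewhere that is not on PATH, which is common on Windows. In that case, pytesseract lets you point at it by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the executable.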
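Below is the Python implementation:
```python
import os

from PIL import Image
import pytesseract as pt


def main():
    # Folder that contains the input images
    path = "E:\\GeeksforGeeks\\images"
    # Folder where one .txt file per image is saved (created if missing)
    tempPath = "E:\\GeeksforGeeks\\textFiles"
    os.makedirs(tempPath, exist_ok=True)

    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # Extract the text from the image with the English language model
        text = pt.image_to_string(img, lang="eng")

        # Drop the extension (".jpg", ".jpeg", ...) and name the .txt after the image
        baseName = os.path.splitext(imageName)[0]
        fullTempPath = os.path.join(tempPath, "time_" + baseName + ".txt")

        print(text)

        # Write the extracted text to its own file
        with open(fullTempPath, "w") as file1:
            file1.write(text)


if __name__ == "__main__":
    main()
```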
Input Image:
image_sample1
Output:
geeksforgeeks
geeksforgeeks
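`os.listdir()` returns every entry in the folder, including subfolders and non-image files, and `Image.open()` raises an error on anything it cannot read as an image. Stripping a fixed number of characters (for example `name[:-4]`) is also fragile, because it assumes a three-letter extension and breaks on names such as `photo.jpeg`. The sketch below handles both cases. The helper names and the `IMAGE_EXTENSIONS` set are assumptions for illustration, not part of the original code.

```python
import os

# Assumed set of extensions to treat as images; adjust it for your folder.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif"}


def list_images(folder):
    """Return the sorted names of regular files in `folder` that have an image extension."""
    names = []
    for name in os.listdir(folder):
        full = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()  # case-insensitive: ".PNG" counts too
        if os.path.isfile(full) and ext in IMAGE_EXTENSIONS:
            names.append(name)
    return sorted(names)


def text_path_for(image_name, out_folder):
    """Map 'scan.jpeg' to '<out_folder>/scan.txt', whatever the extension length."""
    stem = os.path.splitext(image_name)[0]
    return os.path.join(out_folder, stem + ".txt")
```

In either program, `for imageName in list_images(path):` could replace `for imageName in os.listdir(path):` so that only image files are passed to OCR.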
If you want to store the text from all the images in a single output file, the code changes slightly. The main difference is the mode in which the output file is opened. It changes to "a+", which appends the text and creates the output file if it does not already exist.
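You can check the effect of the file mode with the standard library alone. "w" truncates the file every time it is opened, while "a+" creates the file if it is missing and adds each write to the end. The sketch below works in a temporary directory. The function name `write_twice` is hypothetical.

```python
import os
import tempfile


def write_twice(mode):
    """Open a new file twice with `mode`, writing one line each time, and return its contents."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "output.txt")  # does not exist yet
        for line in ("first image\n", "second image\n"):
            with open(path, mode) as f:
                f.write(line)
        with open(path) as f:
            return f.read()


print(repr(write_twice("w")))   # 'second image\n' : each open truncated the file
print(repr(write_twice("a+")))  # 'first image\nsecond image\n' : both writes kept
```

Because "a+" never truncates, running the program a second time adds another copy of every entry to the same output file. Delete the file between runs if you want a fresh result.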
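```python
import os

from PIL import Image
import pytesseract as pt


def main():
    # Folder that contains the input images
    path = "E:\\GeeksforGeeks\\images"
    # Single file that collects the text of every image
    fullTempPath = "E:\\GeeksforGeeks\\output\\outputFile.txt"
    os.makedirs(os.path.dirname(fullTempPath), exist_ok=True)

    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # Extract the text from the image with the English language model
        text = pt.image_to_string(img, lang="eng")

        # "a+" appends to the file and creates it if it does not exist yet
        with open(fullTempPath, "a+") as file1:
            file1.write(imageName + "\n")
            file1.write(text + "\n")

    # Print the combined file once every image has been processed
    with open(fullTempPath, "r") as file2:
        print(file2.read())


if __name__ == "__main__":
    main()
```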
Input Image:
image_sample1
image_sample2
Output:
The program produces a single file containing the text extracted from every image in the folder. The file has the following format:
Name of the image
Content of the image
Name of the next image, and so on...
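Both programs process the images one at a time. pytesseract runs the tesseract executable as a separate process for each call, so a thread pool should be able to keep several OCR jobs running at once. The sketch below is one possible approach, not the article's method. `ocr_folder_to_file`, `tesseract_ocr`, and their parameters are hypothetical names. The OCR step is passed in as a function, so the file-writing logic can be tried without Tesseract installed. `executor.map` returns results in input order, so the combined file keeps the "name, then content" layout.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def ocr_folder_to_file(image_paths, out_path, ocr, max_workers=4):
    """Run `ocr(path) -> str` on every image concurrently and write one combined file.

    Each entry is the image's file name on one line followed by its text,
    in the same order as `image_paths`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if jobs finish out of order
        texts = list(pool.map(ocr, image_paths))

    # Written in one pass, so "w" is used: rerunning replaces the file
    with open(out_path, "w") as out:
        for path, text in zip(image_paths, texts):
            out.write(os.path.basename(path) + "\n")
            out.write(text + "\n")
    return out_path


def tesseract_ocr(path):
    """Real OCR step; imports are deferred so the rest of this sketch runs without them."""
    from PIL import Image
    import pytesseract as pt

    with Image.open(path) as img:
        return pt.image_to_string(img, lang="eng")
```

For example, `ocr_folder_to_file([os.path.join(path, n) for n in os.listdir(path)], fullTempPath, tesseract_ocr)` would OCR the images in `path` concurrently and write the same format as the second program. Actual speed-ups depend on the number of CPU cores and on Tesseract's own internal threading.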
Last Updated: 11 Nov, 2019