Python | OCR on All the Images present in a Folder Simultaneously

If you have a folder full of images containing text, and you want that text extracted either into a separate folder (one text file per image, named after the image) or into a single file, this article shows how to do it.

This article introduces the basics of OCR (Optical Character Recognition) and shows how to create a .txt output file for every image inside the main folder and save it in a predetermined directory.

Libraries Needed –

pip3 install pillow

The os module ships with the Python standard library, so it does not need to be installed with pip.

You will also need the Tesseract OCR engine (tesseract-ocr) and the pytesseract library. tesseract-ocr is a separate program that must be downloaded and installed on its own, and pytesseract can be installed using pip3 install pytesseract
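
Since pytesseract only wraps the tesseract-ocr program, it can help to confirm that the executable is reachable before running any OCR. The following is a minimal sketch using only the standard library; the helper name find_tesseract is hypothetical and the executable name "tesseract" is an assumption that matches a typical installation.

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the Tesseract executable, or None if it is not on PATH.

    find_tesseract is a hypothetical helper; "tesseract" is the assumed
    name of the executable installed by tesseract-ocr.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    location = find_tesseract()
    if location is None:
        print("tesseract was not found on PATH; install tesseract-ocr first")
    else:
        print("tesseract found at", location)
```

If the executable is installed but not on PATH, pytesseract can be pointed at it by assigning the full path to pytesseract.pytesseract.tesseract_cmd.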

Below is the Python implementation –


# Python program to extract text from all the images in a folder
# storing the text in corresponding files in a different folder
from PIL import Image
import pytesseract as pt
import os
      
def main():
    # path for the folder for getting the raw images
    path ="E:\\GeeksforGeeks\\images"
  
    # path for the folder for getting the output
    tempPath ="E:\\GeeksforGeeks\\textFiles"
  
    # iterating the images inside the folder
    for imageName in os.listdir(path):
              
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)
  
        # applying ocr using pytesseract for python
        text = pt.image_to_string(img, lang ="eng")
  
        # removing the extension (for example .jpg) from the image file name
        baseName = os.path.splitext(imageName)[0]
  
        fullTempPath = os.path.join(tempPath, 'time_' + baseName + ".txt")
        print(text)
  
        # saving the text for every image in a separate .txt file
        with open(fullTempPath, "w") as file1:
            file1.write(text)
  
if __name__ == '__main__':
    main()



Input Image :

image_sample1

Output :

geeksforgeeks
geeksforgeeks
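
The program above passes every entry returned by os.listdir to Image.open, so a stray non-image file or a subfolder would stop the run with an error. One possible way to guard against that is to filter the folder listing first. This is only a sketch: the helper names list_image_files and output_name and the list of extensions are assumptions, not part of the original program.

```python
import os

# Assumed set of extensions; adjust it to the formats present in your folder.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff")


def list_image_files(folder, extensions=IMAGE_EXTENSIONS):
    """Return the sorted names of regular files in folder that look like images."""
    names = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # skip subfolders and files whose extension is not in the list
        if os.path.isfile(full_path) and extension in extensions:
            names.append(name)
    return names


def output_name(image_name, prefix="time_"):
    """Build the .txt file name for an image, whatever the extension length."""
    return prefix + os.path.splitext(image_name)[0] + ".txt"
```

In the loop, os.listdir(path) could then be replaced by list_image_files(path), and output_name(imageName) gives the same 'time_' prefixed name for extensions of any length, such as .jpeg or .tiff.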

If you want to store all the text from the images in a single output file, the code is slightly different. The main difference is that the file is opened in "a+" mode, which appends the text and creates the output file if it does not already exist.
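
The behavior of "a+" mode can be checked in isolation, without any OCR involved. The sketch below writes to a temporary folder; the helper names append_line and demo are hypothetical and used only for illustration.

```python
import os
import tempfile


def append_line(file_path, line):
    """Append one line to file_path, creating the file if it does not exist."""
    with open(file_path, "a+") as handle:
        handle.write(line + "\n")


def demo():
    """Append twice to a file that does not exist yet and return its content."""
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "outputFile.txt")
        append_line(target, "first")   # the file is created here
        append_line(target, "second")  # the file already exists, so this appends
        with open(target, "r") as handle:
            return handle.read()


if __name__ == "__main__":
    print(demo())
```

Note that "a+" creates a missing file but not a missing folder, so the output directory has to exist before the program runs. Because the mode appends, running the program twice also adds the text to the file a second time.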


# extract text from all the images in a folder
# storing the text in a single file
from PIL import Image
import pytesseract as pt
import os
      
def main():
    # path for the folder for getting the raw images
    path ="E:\\GeeksforGeeks\\images"
  
    # link to the file in which output needs to be kept
    fullTempPath ="E:\\GeeksforGeeks\\output\\outputFile.txt"
  
    # iterating the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)
  
        # applying ocr using pytesseract for python
        text = pt.image_to_string(img, lang ="eng")
  
        # opening the output file in "a+" mode, which creates the file
        # if it is not present and appends to it otherwise
        with open(fullTempPath, "a+") as file1:
            # providing the name of the image
            file1.write(imageName + "\n")
  
            # providing the content in the image
            file1.write(text + "\n")
  
    # for printing the output file
    with open(fullTempPath, 'r') as file2:
        print(file2.read())
  
  
if __name__ == '__main__':
    main()



Input Image :

image_sample1


image_sample2

Output:

The output is the single file created after extracting the text from all the images inside the folder. The format of the file is:

Name of the image
Content of the image
Name of the next image and so on .....
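
The layout described above can be reproduced with a small helper that opens the output file once and writes one name and content pair per image. This is a sketch under assumptions: format_entry and write_combined are hypothetical names, and the OCR result is replaced by plain strings so the example runs without Tesseract.

```python
import os
import tempfile


def format_entry(image_name, text):
    """Return one block of the output file: the image name, then its text."""
    return image_name + "\n" + text + "\n"


def write_combined(entries, output_path):
    """Append every (image_name, text) pair to output_path in the layout above."""
    with open(output_path, "a+") as handle:
        for image_name, text in entries:
            handle.write(format_entry(image_name, text))


if __name__ == "__main__":
    # placeholder strings stand in for the text returned by the OCR step
    sample = [("image_sample1.jpg", "geeksforgeeks"),
              ("image_sample2.jpg", "geeksforgeeks")]
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "outputFile.txt")
        write_combined(sample, target)
        with open(target, "r") as handle:
            print(handle.read())
```

Opening the file once for the whole batch, rather than once per image, gives the same file content with fewer open and close operations.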

