
Optical Character Recognition using TensorFlow

Optical Character Recognition (OCR) stands as a transformative force, bridging the gap between the physical and digital worlds. OCR enables machines to interpret and convert printed or handwritten text into machine-readable data, revolutionizing how we interact with information. This article explores the powerful capabilities of OCR and presents a TensorFlow-based model, a testament to the evolution of deep learning in unlocking the potential of visual data.

What is Optical Character Recognition (OCR)?

Optical Character Recognition (OCR) is a technology that empowers computers to recognize and interpret text from images, whether scanned documents, photos, or handwritten notes. It has emerged as a vital component in various fields, from document digitization to aiding visually impaired individuals. The primary goal of OCR is to convert visual representations of characters into editable and searchable data, making it an invaluable tool in our increasingly digitized society. As we delve into the capabilities of OCR, we’ll showcase a practical implementation using TensorFlow, a popular open-source machine learning framework.



Optical Character Recognition (OCR) Implementation using TensorFlow

Before starting the implementation, let’s install TensorFlow using the following command:

!pip install tensorflow

TensorFlow is an open-source machine learning library developed by the Google Brain team. We will use it to implement OCR because it provides a flexible, extensible framework for building customized OCR models.
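
To confirm the installation, you can print the version; this walkthrough assumes a TensorFlow 2.x release, where Keras ships with TensorFlow:

import tensorflow as tf

# Any recent 2.x release should work for the code below
print(tf.__version__)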



Importing Libraries

For the implementation, we will use OpenCV, NumPy, Matplotlib, scikit-learn, TensorFlow, and Keras.




import os

import cv2
import numpy as np
import matplotlib.pyplot as plt

import tensorflow as tf
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, MaxPooling2D, Dense

Loading the Dataset and Preprocessing

You can download the dataset from here and unzip the folder using the following command:

!unzip /content/OCR_dataset.zip
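
The loading code below assumes the archive extracts to /content/data, with training_data and testing_data subfolders that each contain one folder per character class (the folder name is the label). A quick sanity check of that layout, under the same path assumption:

import os

print(os.listdir('/content/data'))                             # expect: ['training_data', 'testing_data']
print(sorted(os.listdir('/content/data/training_data'))[:5])   # first few class folders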

Reading and preprocessing images: we walk each class folder, read every image with OpenCV, resize it to 64x64, scale pixel values to [0, 1], and build parallel lists of images and labels.




images = []
labels = []
 
path = '/content/data/training_data'
 
for class_name in os.listdir(path):
  class_dir = os.path.join(path, class_name)   # one folder per character class
  for file_name in os.listdir(class_dir):
    file_path = os.path.join(class_dir, file_name)
    img = cv2.imread(file_path)                # read as a BGR array of shape (H, W, 3)
    img = cv2.resize(img, (64,64))             # uniform input size for the CNN
    img = np.array(img, dtype=np.float32)
    img = img/255                              # scale pixel values to [0, 1]
    images.append(img)
    labels.append(class_name)                  # the folder name doubles as the label

Converting the lists to NumPy arrays:




X = np.array(images)
y = np.array(labels)

Label Encoding:

The fit_transform method of the LabelEncoder class is used to fit the encoder to the unique labels in y and transform them into numerical values.




from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
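
To inspect the mapping, le.classes_ holds the sorted unique labels; the encoded value of a label is simply its index in that array:

# Each label's encoded value is its position in le.classes_
print(le.classes_)
print(dict(zip(le.classes_, le.transform(le.classes_))))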

Shuffling the data: the images were read class by class, so without shuffling, the validation_split used later would hold out entire classes.




from sklearn.utils import shuffle
X_sh, y_sh = shuffle(X, y, random_state=42)

Building the Model

The model is implemented with the Keras Sequential API on a TensorFlow backend. It stacks four convolutional layers (the first three each followed by max pooling), flattens the resulting feature maps, and ends with two ReLU Dense layers and a 36-unit softmax output, one unit per character class:




model = Sequential()
# Feature extractor: stacked 3x3 convolutions with max-pooling downsampling
model.add(Conv2D(filters=16, kernel_size=(3,3), activation='relu', input_shape=(64,64,3)))
model.add(MaxPooling2D())
model.add(Conv2D(filters=32, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D())
model.add(Conv2D(filters=64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D())
model.add(Conv2D(filters=128, kernel_size=(3,3), activation='relu'))
# Classifier head: flatten the feature maps and map to the 36 character classes
model.add(Flatten())
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=36, activation='softmax'))
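
A quick way to verify the architecture is model.summary(), which lists each layer's output shape and parameter count:

# Prints each layer's output shape and trainable parameter count
model.summary()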

Model compiling and training

Compiling and training the model: we use the Adam optimizer with sparse categorical cross-entropy (the labels are integer-encoded), hold out 20% of the shuffled data for validation, and train for 10 epochs with a batch size of 16.




model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_sh, y_sh, validation_split=0.2, batch_size=16, epochs=10)

Output:

Epoch 1/10
1032/1032 [==============================] - 87s 79ms/step - loss: 0.7324 - accuracy: 0.7997 - val_loss: 0.2487 - val_accuracy: 0.9261
Epoch 2/10
1032/1032 [==============================] - 77s 74ms/step - loss: 0.1897 - accuracy: 0.9410 - val_loss: 0.1610 - val_accuracy: 0.9544
Epoch 3/10
1032/1032 [==============================] - 77s 75ms/step - loss: 0.1284 - accuracy: 0.9553 - val_loss: 0.1720 - val_accuracy: 0.9501
Epoch 4/10
1032/1032 [==============================] - 71s 69ms/step - loss: 0.1075 - accuracy: 0.9615 - val_loss: 0.1377 - val_accuracy: 0.9617
Epoch 5/10
1032/1032 [==============================] - 71s 69ms/step - loss: 0.0870 - accuracy: 0.9699 - val_loss: 0.1620 - val_accuracy: 0.9542
Epoch 6/10
1032/1032 [==============================] - 70s 68ms/step - loss: 0.0777 - accuracy: 0.9714 - val_loss: 0.1518 - val_accuracy: 0.9615
Epoch 7/10
1032/1032 [==============================] - 70s 68ms/step - loss: 0.0656 - accuracy: 0.9752 - val_loss: 0.1838 - val_accuracy: 0.9627
Epoch 8/10
1032/1032 [==============================] - 69s 67ms/step - loss: 0.0595 - accuracy: 0.9771 - val_loss: 0.1525 - val_accuracy: 0.9603
Epoch 9/10
1032/1032 [==============================] - 69s 67ms/step - loss: 0.0580 - accuracy: 0.9778 - val_loss: 0.1965 - val_accuracy: 0.9595
Epoch 10/10
1032/1032 [==============================] - 70s 68ms/step - loss: 0.0524 - accuracy: 0.9793 - val_loss: 0.1378 - val_accuracy: 0.9629

The training history, including loss and accuracy metrics on both training and validation sets, is stored in the history variable for later analysis.




plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.legend(['loss', 'val_loss'])
plt.show()
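
The history object also records the accuracy metric; a parallel plot (a small addition beyond the original walkthrough) shows training and validation accuracy per epoch:

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.legend(['accuracy', 'val_accuracy'])
plt.show()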

Output:

(Plot of training vs. validation loss per epoch: training loss falls steadily while validation loss levels off around 0.14-0.20.)

Model Testing

We apply the same preprocessing steps used for the training dataset.




test_images = []
test_labels = []
 
path = '/content/data/testing_data'
 
for class_name in os.listdir(path):
  class_dir = os.path.join(path, class_name)
  for file_name in os.listdir(class_dir):
    file_path = os.path.join(class_dir, file_name)
    img = cv2.imread(file_path)
    img = cv2.resize(img, (64,64))
    img = np.array(img, dtype=np.float32)
    img = img/255                              # same preprocessing as training
    test_images.append(img)
    test_labels.append(class_name)




Converting the lists to NumPy arrays:

X_test = np.array(test_images)
y_test = np.array(test_labels)

Making predictions




preds = model.predict(X_test)
predicted_labels = le.inverse_transform(np.argmax(preds, axis=1))
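
For a per-class breakdown beyond overall accuracy, scikit-learn's classification_report can compare the decoded predictions against the original string labels (a supplementary check, not part of the original walkthrough):

from sklearn.metrics import classification_report

# test_labels still holds the raw string labels at this point
print(classification_report(test_labels, predicted_labels))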

Visualizing the predictions




plt.imshow(X_test[197])
plt.title(f"Label: {predicted_labels[197]}")
plt.show()

Output:

(The test image at index 197 is displayed with its predicted character as the plot title.)

Model Evaluation

We encode the test labels with the encoder already fitted on the training labels, evaluate the model on the test set, and print the accuracy:




y_test = le.transform(y_test)   # reuse the encoder fitted on the training labels; do not refit on test data
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy}")

Output:

Test Accuracy: 0.9811508059501648

We can conclude that the model achieves a very high test accuracy of about 98.12% and generalizes well to unseen data.

Applications of Optical Character Recognition

OCR is used for document digitization and searchable archives, automated data entry from forms and invoices, license plate recognition, and assistive reading tools for visually impaired users.

Conclusion

This article explored OCR and showcased a practical implementation using TensorFlow. The TensorFlow-based OCR model demonstrated the key steps in implementing OCR: dataset loading, image preprocessing, model building, training, and evaluation. The model achieved an accuracy of about 98.12% on the test dataset, showing its effectiveness in recognizing characters from diverse images.

