
Object Detection using TensorFlow

Identifying and detecting objects within images or videos is a key task in computer vision. It is critical in a variety of applications, ranging from autonomous vehicles and surveillance systems to augmented reality and medical imaging. TensorFlow, Google’s open-source machine learning framework, provides a robust collection of tools for developing and deploying object detection models.

In this article, we will go over the fundamentals of using TensorFlow for object detection. TensorFlow provides a flexible and efficient framework to match your needs, whether you’re working on a computer vision research project or building applications that require real-time object detection. Let’s get into the specifics of using TensorFlow to build an object detection pipeline and realize the full potential of this technology.



What is Object detection?

Object detection is a computer vision task that involves identifying and locating multiple objects within an image or video. The goal is not just to classify what is in the image but also to precisely outline and pinpoint where each object is located.
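
For intuition, a detector’s output for a single image can be thought of as a list of (box, label, score) triples. The structure below is purely illustrative and not the exact format returned by any particular library:

# Illustrative only: one way to represent the detections for a single image.
# Each entry pairs a bounding box with a class label and a confidence score.
detections_example = [
    {"box": [0.12, 0.08, 0.55, 0.43], "label": "dog", "score": 0.91},     # box = [ymin, xmin, ymax, xmax], normalized
    {"box": [0.30, 0.50, 0.95, 0.98], "label": "person", "score": 0.87},
]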

Key Concepts in Object Detection:

- Bounding boxes: rectangular regions that localize each detected object within the image.
- Class labels: the category (such as person, car, or dog) assigned to each detected object.
- Confidence scores: values between 0 and 1 indicating how certain the model is about each detection.
- Normalized coordinates: box positions expressed as fractions of the image width and height, so they are independent of resolution.

Object Detection using TensorFlow

Setting Up TensorFlow

Begin by installing TensorFlow using pip:



!pip install tensorflow

Ensure that you have the necessary dependencies, and if you have a compatible GPU, consider installing TensorFlow with GPU support for faster training.
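
As a quick sanity check, you can ask TensorFlow whether it can see a GPU (this assumes TensorFlow is already installed; an empty list simply means it will run on the CPU):

import tensorflow as tf

# List the physical GPUs visible to TensorFlow; an empty list means CPU-only execution.
print("TensorFlow version:", tf.__version__)
print("GPUs available:", tf.config.list_physical_devices('GPU'))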

Choosing a Pre-trained Model

TensorFlow provides pre-trained models on large datasets like COCO (Common Objects in Context). These models serve as a starting point for transfer learning. Common models include Faster R-CNN, SSD (Single Shot Multibox Detector), and YOLO (You Only Look Once). For this tutorial we will be using the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 model.

Understanding the ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8 Model

The model’s name describes its design and training setup: it is an SSD (Single Shot Multibox Detector) with a MobileNetV2 backbone and an FPN-Lite feature pyramid for detecting objects at multiple scales, it expects 640x640 input images, and it was trained on the COCO 2017 dataset (on TPUs, hence the tpu-8 suffix). This combination offers a good balance between inference speed and accuracy, which makes it well suited for real-time applications.

Now that we have everything needed, let’s begin with the code:

Step 1: Import Libraries

First, let’s import the necessary libraries: TensorFlow, NumPy, OpenCV, Pillow, Matplotlib, and Python’s random module (for generating random box colors).




import tensorflow as tf
import numpy as np
import cv2
from PIL import Image
from matplotlib import pyplot as plt
from random import randint

Step 2: Download, Extract and Load the Pre-trained Model

Download the model archive from the TensorFlow 2 Detection Model Zoo, extract it, and load the pre-trained model using TensorFlow’s SavedModel format.




# Download the model archive from the TF2 Detection Model Zoo and extract it
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz
!tar -xzvf ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8.tar.gz

# Load the extracted SavedModel
model = tf.saved_model.load("ssd_mobilenet_v2_fpnlite_640x640_coco17_tpu-8/saved_model")
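
Optionally, you can inspect the loaded SavedModel’s default serving signature to confirm the inputs and outputs it exposes. A small sketch (the exact output keys can vary slightly between exports):

# Inspect the default serving signature of the loaded SavedModel.
detect_fn = model.signatures['serving_default']
print(detect_fn.structured_input_signature)        # expected input spec (a batched uint8 image tensor)
print(list(detect_fn.structured_outputs.keys()))   # outputs such as detection_boxes, detection_classes, detection_scores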

Step 3: Load and Preprocess Image

In this step, load an image with Pillow, convert it to a NumPy array, and preprocess it for input to the model. The model cannot work on an image object directly, so we convert it into a tensor and add a batch dimension.




image = Image.open("detect.jpg")
image_np = np.array(image)  # convert the PIL image to a NumPy array of shape (H, W, 3)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)  # add a batch dimension
image  # display the original image in the notebook

Output:
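
If you prefer to read images with OpenCV instead of Pillow (for example, when processing video frames), remember that OpenCV returns BGR arrays, so convert them to RGB before building the tensor. A minimal sketch, assuming the same local file detect.jpg:

# Alternative: read the image with OpenCV and convert BGR -> RGB before detection.
bgr = cv2.imread("detect.jpg")
image_np = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.uint8)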

Step 4: Perform Object Detection

Here we use the loaded model to perform object detection on the input image and extract bounding box coordinates, class IDs, and scores.




detection = model(input_tensor)
 
# Parse the detection results
boxes = detection['detection_boxes'].numpy()
classes = detection['detection_classes'].numpy().astype(int)
scores = detection['detection_scores'].numpy()
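
The output dictionary also includes a num_detections entry, and detections are sorted by score, so you can quickly print the strongest few results. A small sketch, assuming the standard output keys of TensorFlow Object Detection API exports:

# Print the top-5 detections; results are already sorted by confidence score.
num_detections = int(detection['num_detections'][0])
for i in range(min(5, num_detections)):
    print(f"class_id={classes[0, i]}, score={scores[0, i]:.2f}, box={boxes[0, i]}")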

Step 5: Add the COCO Labels

These are the labels for the COCO dataset, which map class IDs to the corresponding class names.

The model only gives us the integer class IDs it was trained on, i.e. the IDs of the COCO dataset. To translate those integers into meaningful class names, we need this list of labels. A few COCO IDs are unused by the detection label map, so 'N/A' placeholders are included to keep each list index aligned with its class ID.




labels = ['__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
          'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
          'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
          'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', 'handbag',
          'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
          'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A',
          'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
          'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
          'potted plant', 'bed', 'N/A', 'dining table', 'N/A', 'N/A', 'toilet', 'N/A', 'tv',
          'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
          'sink', 'refrigerator', 'N/A', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
          'hair drier', 'toothbrush']
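
For example, you can translate the class ID of the highest-scoring detection into a readable name (assuming at least one object was detected in the image):

# Map the top detection's class ID to its human-readable COCO name.
top_class_id = classes[0, 0]
print("Top detection:", labels[top_class_id], "with score", round(float(scores[0, 0]), 2))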

Before going further, let’s learn about two concepts used in the next step:

- Normalized coordinates: the model returns each bounding box as [ymin, xmin, ymax, xmax] with values between 0 and 1, expressed as fractions of the image height and width rather than as pixels. Multiplying by the image dimensions converts them back to pixel coordinates, as shown in the sketch below.
- Confidence score: a value between 0 and 1 indicating how sure the model is about a detection; we keep only detections above a chosen threshold.

Let’s understand with an Analogy

Think of a treasure map. Instead of saying “walk 50 steps north,” which depends on the map’s size, you say “walk halfway up the map.” Normalized coordinates provide a universal language for pinpointing locations, independent of the image’s resolution.
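
Concretely, a normalized box [ymin, xmin, ymax, xmax] is converted to pixel coordinates by scaling by the image height and width. A tiny worked example with made-up numbers:

# Made-up example: convert a normalized box to pixel coordinates for a 640x480 image.
h, w = 480, 640                                    # image height and width in pixels
ymin, xmin, ymax, xmax = 0.25, 0.10, 0.75, 0.60    # normalized box
print((int(xmin * w), int(ymin * h)), (int(xmax * w), int(ymax * h)))
# -> (64, 120) and (384, 360): the top-left and bottom-right corners in pixels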

Step 6: Visualize the Detected Objects

Now let’s look at the code. We iterate through the detected objects, filter out low-confidence detections, convert the normalized coordinates to pixel coordinates, look up the class names, and draw randomly colored boxes and labels on the image. Adjust the confidence threshold (0.5 in this case) and other parameters as needed.




for i in range(classes.shape[1]):
    class_id = int(classes[0, i])
    score = float(scores[0, i])

    if score > 0.5:  # Filter out low-confidence detections
        h, w, _ = image_np.shape
        ymin, xmin, ymax, xmax = boxes[0, i]

        # Convert normalized coordinates to pixel coordinates
        xmin = int(xmin * w)
        xmax = int(xmax * w)
        ymin = int(ymin * h)
        ymax = int(ymax * h)

        # Get the class name from the labels list
        class_name = labels[class_id]

        # Pick a random color for this box (RGB values in 0-255)
        random_color = (randint(0, 255), randint(0, 255), randint(0, 255))

        # Draw bounding box and label on the image
        cv2.rectangle(image_np, (xmin, ymin), (xmax, ymax), random_color, 2)
        label = f"Class: {class_name}, Score: {score:.2f}"
        cv2.putText(image_np, label, (xmin, ymin - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, random_color, 2)

# Display the result
plt.imshow(image_np)
plt.axis('off')
plt.show()

Output:
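
If you want to keep the annotated image rather than just display it, you can write the array back to disk with Pillow, which is already imported (the output filename here is just an example):

# Save the annotated image; image_np is RGB, so Pillow writes it correctly as-is.
Image.fromarray(image_np).save("detect_annotated.png")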

Applications of Object Detection

Object detection finds applications in diverse fields, including:

- Autonomous vehicles: detecting pedestrians, vehicles, and traffic signs for safe navigation.
- Surveillance and security: identifying people and objects of interest in video feeds.
- Medical imaging: localizing abnormalities such as tumors in scans.
- Retail analytics: tracking products and shopper activity in stores.
- Augmented reality: anchoring virtual content to real-world objects.
Conclusion

Object detection with models like these opens doors to a myriad of applications. From autonomous vehicles and surveillance systems to retail analytics and augmented reality, the impact is profound. As technology advances, we can anticipate further developments in model architectures, dataset diversity, and real-time deployment, ushering in a new era of intelligent visual perception.

