
YOLOv5: Object Tracker in Videos

Last Updated : 15 Jan, 2024

Introduction 

Let’s first understand object detection and object tracking.

Object detection: It is the process of locating and recognizing objects in an image or a single video frame. The presence of multiple objects is determined in the input data, and bounding boxes are drawn around them to represent their locations.

Along with the bounding box coordinates that give an object’s position, a class label such as “cat”, “crosswalk”, or “bird” is predicted, and a confidence score expresses the algorithm’s level of assurance about each detection.
It is vital to keep in mind that object detection is usually performed on individual frames or images and does not consider the movement or trajectory of objects across successive frames. A minimal sketch of what a single detection looks like is given below.
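
To make this concrete, here is a purely illustrative sketch (the numbers and labels are made up) of how a single detection is commonly represented: a bounding box, a class label, and a confidence score.

Python3

# Illustrative only: one detection with a bounding box,
# a predicted class and a confidence score (values are made up).
detection = {
    "bbox": [120, 45, 310, 280],   # [x_min, y_min, x_max, y_max] in pixels
    "class": "cat",                # predicted category
    "confidence": 0.91,            # detector's certainty, between 0 and 1
}

print(f"Found a {detection['class']} at {detection['bbox']} "
      f"with confidence {detection['confidence']:.2f}")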

Object tracking: It is the task of following a specific object’s movement across a number of video frames. The goal is to keep the tracked object’s identity constant even as it moves through the video, as sketched below. This technique is particularly useful in scenarios such as surveillance, self-driving cars, and video analysis.
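
As a rough illustration (the IDs and boxes below are made up), the key difference from plain detection is that each object keeps the same ID from frame to frame:

Python3

# Illustrative only: a tracker assigns persistent IDs across frames,
# so ID 1 refers to the same physical object in every frame.
tracked_frames = [
    {"frame": 1, "tracks": [{"id": 1, "bbox": [100, 50, 180, 200]}]},
    {"frame": 2, "tracks": [{"id": 1, "bbox": [108, 52, 188, 202]}]},
    {"frame": 3, "tracks": [{"id": 1, "bbox": [115, 55, 195, 205]},
                            {"id": 2, "bbox": [300, 40, 360, 190]}]},  # new object, new ID
]

for frame in tracked_frames:
    ids = [t["id"] for t in frame["tracks"]]
    print(f"Frame {frame['frame']}: track IDs {ids}")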

What is YOLO?

YOLO simply stands for ‘You Only Look Once’. It is an object detection technique that locates numerous objects within an image or a video in a single pass. In contrast to typical detection algorithms that scan an image repeatedly, YOLO divides the image into a grid and simultaneously predicts bounding boxes, class probabilities, and confidence scores for each grid cell. A few example use cases of YOLOv5 are face mask detection, object recognition, speed calculation, vehicle tracking, and so on.
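
Before moving on to the tracking pipeline, here is a minimal, self-contained sketch of plain YOLOv5 detection on a single image using torch.hub (the ‘ultralytics/yolov5’ hub entry, the ‘yolov5s’ weights and the sample image URL are assumptions, separate from the tracking repository used below):

Python3

import torch

# A minimal sketch: load a small pre-trained YOLOv5 model from torch.hub
# (downloads the weights on first run).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Run a single forward pass on an image path or URL.
img = 'https://ultralytics.com/images/zidane.jpg'
results = model(img)

results.print()                   # summary of detected classes and counts
print(results.pandas().xyxy[0])   # boxes, confidences and class names as a DataFrame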

To know more about the YOLO models, refer to this link.

In this article, we will study how to use YOLOV5 for object tracking in videos.

The steps to be followed are:

Importing necessary libraries 

Python3




import torch
from IPython.display import Image, clear_output
from IPython.display import HTML
from base64 import b64encode


Let’s now begin by cloning the required repository for this project.

Cloning Repository  

First, we need to clone the GitHub repository of YOLOv5 with the DeepSort tracker using the command below.

Python3




!git clone --recurse-submodules https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch.git


After this step, we need to change the directory according to the cloned repository. 

Python3




%cd Yolov5_DeepSort_Pytorch


Now, we will install the dependencies. 

Python3




%pip install -qr requirements.txt


Now, we will get some system information to run this model efficiently.

Python3




# clear the output
clear_output()
# system information
print(f"Setup complete. Using torch {torch.__version__} "
      f"({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")


Now, we will use a pre-trained YOLOv5 model that has been trained on the CrowdHuman dataset.

Python3




# download the pre-trained model 
!wget -nc https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch/releases/download/v.2.0/crowdhuman_yolov5m.pt -O /content/Yolov5_DeepSort_Pytorch/yolov5/weights/crowdhuman_yolov5m.pt


Now, let’s get a video and test it.

Python3




# getting test video
!wget -nc https://github.com/mikel-brostrom/Yolov5_DeepSort_Pytorch/releases/download/v.2.0/test.avi


After this step, we will extract just a few seconds of the starting portion of the video. 

Python3




# extracting 2 seconds of video
!ffmpeg -y -ss 00:00:00 -i test.avi -t 00:00:02 -c copy out.avi


Now, let’s run object tracking on the extracted clip using the track.py script.

Python3




!python track.py --yolo_model /content/Yolov5_DeepSort_Pytorch/yolov5/weights/yolov5n.pt --source out.avi --save-vid


In order to display the tracked video, we first need to convert it to the MP4 format. We are using ‘ffmpeg’ for this task.

Python3




!ffmpeg -i /content/Yolov5_DeepSort_Pytorch/runs/track/exp3/out.avi output.mp4


Now, to display the video, we use an HTML player. We first read the binary content of the MP4 video file, encode it with base64, and build a data URL that is passed to an HTML video element.

Python3




mp4 = open('output.mp4','rb').read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
# display with HTML
HTML("""
<video controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)


Output: 

Conclusion

YOLOv5 is one of the most efficient models for object detection and tracking and plays a significant role in real-world applications such as surveillance and security, autonomous vehicles, and sports analytics. For further enhancement, the model can also be trained on a custom dataset.


