Ever wondered how we are able to understand the things we see? When we see someone walking, our brain draws on prior knowledge, whether we realize it or not, to understand what is happening and store it as information. Imagine looking at something and going completely blank. Into oblivion. Scary, right? Well, the secret behind how our brain interprets the images we see has always intrigued me.
The idea of imparting human intelligence and instincts to a computer seems rather effortless, conceivably because even very young children solve the problem of vision. But we often forget the limitations of computers compared to our biological capabilities. Visual perception is enormously complex and ever-changing even for human beings, let alone for computer intelligence.
Our brain can identify an object, process the data and decide what to do, completing a complex task in a split second. The aim is to enable computers to do the same. Hence, Computer Vision is a field that can be described as an amalgamation of Artificial Intelligence and Machine Learning, involving learning algorithms and specialized methods to interpret what the computer sees.
The puzzle that tech giants still brainstorm about was initially thought to be simple enough for an undergraduate summer project by the very people who pioneered Artificial Intelligence. In 1966, Seymour Papert and Marvin Minsky of the MIT Artificial Intelligence group started a project whose goal was to build a system that could analyze a scene and identify the objects in it.
The science behind Computer Vision revolves around artificial neural networks. In simple words? Algorithms inspired by the human brain that learn from large amounts of data so as to imitate human instincts as closely as possible. These algorithms can reach high accuracy, even surpassing human-level performance on some tasks. Deep Vision, a subset of Deep Learning, is what drives Computer Vision.
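To make "learning from data" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) written in plain Python. Everything in it is an assumption made for illustration: the four-pixel "images", the bright/dark labels, and the names `predict` and `train` are hypothetical, and real vision networks have many layers and far more data.

```python
# Toy data (made up for illustration): each "image" is 4 pixel
# intensities in [0, 1]; label 1 means bright, label 0 means dark.
samples = [
    ([0.9, 0.8, 1.0, 0.7], 1),
    ([0.8, 0.9, 0.9, 1.0], 1),
    ([0.1, 0.0, 0.2, 0.1], 0),
    ([0.2, 0.1, 0.0, 0.3], 0),
]


def predict(weights, bias, pixels):
    """Fire (return 1) if the weighted sum of the pixels is positive."""
    total = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1 if total > 0 else 0


def train(samples, epochs=20, lr=0.1):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in samples:
            error = label - predict(weights, bias, pixels)
            weights = [w + lr * error * p for w, p in zip(weights, pixels)]
            bias += lr * error
    return weights, bias


weights, bias = train(samples)
results = [predict(weights, bias, pixels) for pixels, _ in samples]
print(results)
```

The neuron is never told what "bright" means; it only adjusts its weights when it makes a mistake on a labelled example, which is the basic idea that deep networks scale up.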
OpenCV (Open Source Computer Vision Library) is a free, cross-platform library of functions aimed at real-time Computer Vision. It supports Deep Learning frameworks and aids in image and video processing. In Computer Vision, the principal step is to read the pixels of an image so as to study the objects and understand what the image contains. Below are a few key aspects that Computer Vision seeks to recognize in photographs:
- Object Detection: The location of the object.
- Object Recognition: The objects in the image, and their positions.
- Object Classification: The broad category that the object lies in.
- Object Segmentation: The pixels belonging to that object.
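As a rough illustration of what "the pixels belonging to an object" and "the location of the object" mean, the sketch below treats a tiny grayscale image as a grid of numbers in plain Python. The image values, the brightness threshold, and the function names `segment` and `bounding_box` are all assumptions made for this example. Simple thresholding stands in for the learned models a real system would use, and a library such as OpenCV would work on image arrays rather than nested lists.

```python
# A made-up 4x6 grayscale image: 0 is black background,
# values near 255 belong to one bright object.
image = [
    [0, 0,   0,   0, 0, 0],
    [0, 0, 200, 220, 0, 0],
    [0, 0, 210, 230, 0, 0],
    [0, 0,   0,   0, 0, 0],
]


def segment(image, threshold=128):
    """Segmentation: list the (row, col) of every pixel brighter than threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, value in enumerate(row)
        if value > threshold
    ]


def bounding_box(pixels):
    """Detection: smallest box (top, left, bottom, right) around the pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


object_pixels = segment(image)
box = bounding_box(object_pixels)
print(object_pixels)  # [(1, 2), (1, 3), (2, 2), (2, 3)]
print(box)            # (1, 2, 2, 3)
```

Segmentation answers "which pixels", while detection summarizes those pixels as a location. Classification and recognition would then need a model that assigns a label to the region.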
Applications and Future
Computer Vision covers a lot of ground, and its applications seem to know no bounds. We often fail to notice the role it plays in the gadgets we use day in and day out.
- Smartphones and Web: Google Lens, QR Codes, Snapchat filters (face tracking), Night Sight, Face and Expression Detection, Lens Blur, Portrait mode, Google Photos (Face, Object and scene recognition), Google Maps (Image Stitching).
- Medical Imaging: CAT/MRI
- Insurance: Property Inspection and Damage analysis
- Optical Character Recognition (OCR)
- 3D Model Building (Photogrammetry)
- Merging CGI with live actors in movies
Computer Vision is an ever-evolving area of study, with specialized tasks and techniques tailored to each application domain. I expect its market value to grow as fast as its capabilities. With our intelligence and interest, we will soon be able to blend our abilities with Computer Vision and reach new heights.