Eye blink detection with OpenCV, Python, and dlib

Last Updated: 08 Mar, 2022

In this article, we are going to see how to detect eye blinks using OpenCV, Python, and dlib. The task is fairly simple, but it assumes a basic understanding of OpenCV and of how to implement face landmark detection with OpenCV and dlib, since we’ll be using that as the base for today’s project.

Stepwise Implementation

Step 1:  Installing all required packages 

In this step we’ll install all our dependencies. We’re going to use OpenCV for computer vision, the dlib library for face and facial landmark detection, SciPy for calculating the distances between landmarks, and the imutils package for helper functions that convert the landmarks to a NumPy array and make them easy to use. Let’s install these first.

pip install opencv-python numpy scipy dlib imutils

Step 2: Initialize and read from the webcam

Python3




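import cv2
 
cam = cv2.VideoCapture(0)  # 0 selects the default webcam
while True:
    ret, frame = cam.read()
    # stop if the camera did not return a frame
    if not ret:
        break
    cv2.imshow('Camera Feed', frame)
    # press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cam.release()
cv2.destroyAllWindows()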

Step 3: Facial Landmark Detection using dlib

Note: The facial landmark detector included in the dlib library is an implementation of the One Millisecond Face Alignment with an Ensemble of Regression Trees paper by Kazemi and Sullivan (2014).

Facial landmarks are the key features of a face in an image, such as the eyes, eyebrows, nose, mouth, and jaw. Since Steps 1 – 3 are not the primary focus of this article, we won’t go in depth; instead, I’ll write comments in the code for easy understanding.

Here is the basic code that loads the face detector and the landmark predictor and draws a box around each detected face. We’ll be using it later for eye blink detection.

Python3




# Importing the required dependencies
import cv2  # for video rendering
import dlib  # for face and landmark detection
import imutils
 
# for calculating dist b/w the eye landmarks
from scipy.spatial import distance as dist
 
# to get the landmark ids of the left
# and right eyes ----you can do this
# manually too
from imutils import face_utils
 
cam = cv2.VideoCapture('assets/Video.mp4')
 
 
# Initializing the Models for Landmark and
# face Detection
detector = dlib.get_frontal_face_detector()
landmark_predict = dlib.shape_predictor(
    'Model/shape_predictor_68_face_landmarks.dat')
 
while 1:
 
    # If the video is finished then reset it
    # to the start
    if cam.get(cv2.CAP_PROP_POS_FRAMES) == cam.get(
      cv2.CAP_PROP_FRAME_COUNT):
        cam.set(cv2.CAP_PROP_POS_FRAMES, 0)
 
    else:
        _, frame = cam.read()
        frame = imutils.resize(frame, width=640)
 
        # converting frame to gray scale to pass
        # to detector
        img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
         
        # detecting the faces---#
        faces = detector(img_gray)
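        for face in faces:
            # a dlib rectangle exposes its corners through
            # left(), top(), right() and bottom()
            cv2.rectangle(frame, (face.left(), face.top()),
                          (face.right(), face.bottom()),
                          (200, 0, 0), 1)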
 
        cv2.imshow("Video", frame)
        if cv2.waitKey(5) & 0xFF == ord('q'):
            break
 
cam.release()
cv2.destroyAllWindows()
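
The code above loads the landmark predictor but only draws the face boxes. As a minimal sketch of the next step, here is one way to turn the predictor's result into plain (x, y) tuples by hand, which is roughly what imutils' shape_to_np does for us later. The helper name shape_to_points is made up for this illustration; it only assumes that the result has num_parts and part(i) with x and y, as dlib's landmark result does.

```python
def shape_to_points(shape):
    """Convert a dlib landmark result into a list of (x, y) tuples.

    `shape` is assumed to behave like the object returned by a
    dlib shape predictor: it has `num_parts` and `part(i)`, and
    each part has integer `x` and `y` attributes.
    """
    return [(shape.part(i).x, shape.part(i).y)
            for i in range(shape.num_parts)]

# Inside the face loop it could be used like this (not run here,
# because it needs a frame and the model file):
#     shape = landmark_predict(img_gray, face)
#     points = shape_to_points(shape)
```

With the 68-point model this would give a list of 68 tuples, one per landmark.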

 
 

Now the question arises: how are we going to use these landmarks for blink detection?

 

Eye Landmarks 

 

Any facial structure can be extracted from the 68 facial landmarks that the predictor returns. So, for any given face in an image, we’ll extract the landmarks of the eyes, i.e. 6 (x, y) coordinates for each eye, and then calculate the EAR from these landmarks.
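
To make the extraction concrete, here is a small sketch that slices the two eyes out of a list of 68 points. The index ranges are the ones imutils provides in face_utils.FACIAL_LANDMARKS_IDXS for the 68-point layout; the helper name extract_eyes and the dummy points are only for illustration.

```python
# Index ranges of the eyes in the 68-point layout
# (the end index is exclusive, as in Python slicing).
RIGHT_EYE = (36, 42)
LEFT_EYE = (42, 48)

def extract_eyes(points):
    """Split 68 landmarks into (left_eye, right_eye), six points each."""
    if len(points) != 68:
        raise ValueError("expected 68 landmarks")
    left = points[LEFT_EYE[0]:LEFT_EYE[1]]
    right = points[RIGHT_EYE[0]:RIGHT_EYE[1]]
    return left, right

# Dummy landmarks: point i sits at (i, i), just to show the slicing.
dummy = [(i, i) for i in range(68)]
left_eye, right_eye = extract_eyes(dummy)
print(len(left_eye), len(right_eye))  # 6 6
print(left_eye[0], right_eye[0])      # (42, 42) (36, 36)
```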

 

Eye Aspect Ratio (EAR) 

 

This method is simple and efficient: it needs only the landmark coordinates and no further image processing. The ratio relates the vertical measurements of the eye to its horizontal measurement. This is the equation to calculate the EAR using the six landmarks of the eye:

 

[Image: the EAR equation, created by the author using canvas]
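
Since the equation image is not reproduced here, this is the ratio written out so that it matches the calculate_EAR function used in this article. The symbols p1 to p6 are my labels for the six eye landmarks in order, i.e. eye[0] to eye[5] in the code, with p1 and p4 as the two eye corners.

```latex
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{\lVert p_1 - p_4 \rVert}
```

Note that the EAR definition by Soukupová and Čech, which is the one most often cited, divides by twice the horizontal distance, so it yields values half as large as the function in this article.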

 

We can use the following function to calculate the EAR:

 

Python3




# for calculating dist b/w the eye landmarks
from scipy.spatial import distance as dist
 
def calculate_EAR(eye):
     
    # calculate the vertical distances
    # (the euclidean distance is the
    # straight-line distance between
    # two points)
    y1 = dist.euclidean(eye[1], eye[5])
    y2 = dist.euclidean(eye[2], eye[4])
 
    # calculate the horizontal distance
    x1 = dist.euclidean(eye[0], eye[3])
 
    # calculate the EAR
    EAR = (y1+y2) / x1
 
    return EAR
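
To get a feel for the numbers, here is a small worked example on two made-up eyes, one open and one almost closed. It is only a sketch: the coordinates are invented, and math.dist from the standard library stands in for scipy's dist.euclidean so the snippet runs without extra packages.

```python
from math import dist  # standard library, Python 3.8+

def calculate_EAR(eye):
    # same ratio as above: two vertical distances
    # over one horizontal distance
    y1 = dist(eye[1], eye[5])
    y2 = dist(eye[2], eye[4])
    x1 = dist(eye[0], eye[3])
    return (y1 + y2) / x1

# Six points per eye: corner, two upper-lid points,
# other corner, two lower-lid points (invented values).
open_eye = [(0, 0), (2, -1), (4, -1), (6, 0), (4, 1), (2, 1)]
closed_eye = [(0, 0), (2, -0.2), (4, -0.2), (6, 0), (4, 0.2), (2, 0.2)]

print(round(calculate_EAR(open_eye), 3))    # 0.667
print(round(calculate_EAR(closed_eye), 3))  # 0.133
```

The horizontal distance stays at 6 in both cases, while the vertical distances shrink from 2 to 0.4, so the ratio drops sharply when the eye closes.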

 
 

What’s so magical about this EAR?

 

This is the most important part: the EAR of an eye stays roughly constant while the eye is open, but it drops suddenly when the eye blinks. Below, I have shown a graph to show its working:

 

[Image: graph of the EAR dropping during a blink, by the author using canvas]

 

As you can see in the image, the overall value of the EAR was nearly constant throughout except at one point, i.e. when the eye blinked. This makes it one of the simplest and most efficient ways of detecting an eye blink.

 

Since we get one EAR for each eye, we’ll take the average of the EAR for the right eye and the EAR for the left eye and then check if it is lower than a certain threshold (we’ll create a variable to set its value). This threshold might vary a bit: for me it worked with 0.4 or 0.5, but in some cases it works with 0.25 or 0.3 as well. The right value depends on the person and on your video or webcam, so expect to tune it.

 

Next, we’ll keep a count of the consecutive frames in which the EAR is lower than the threshold. If the count reaches a set number of frames (for example 3, or 5 depending on the FPS; the code below uses 2), we’ll consider a blink detected.
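
The counting rule can be tried out on a plain list of numbers before touching any video. This is a minimal sketch: the function name count_blinks and the EAR values are made up, while the default threshold and frame count are the ones used in this article.

```python
def count_blinks(ear_values, blink_thresh=0.45, succ_frame=2):
    """Count blinks in a sequence of per-frame average EAR values."""
    blinks = 0
    count_frame = 0  # consecutive frames below the threshold
    for ear in ear_values:
        if ear < blink_thresh:
            count_frame += 1
        else:
            # the eyes are open again: it was a blink only if
            # they stayed closed for enough frames
            if count_frame >= succ_frame:
                blinks += 1
            count_frame = 0
    return blinks

# Invented values: a 3-frame blink, a 1-frame dip
# (too short to count), and a 2-frame blink.
ears = [0.62, 0.60, 0.30, 0.28, 0.25, 0.61,
        0.63, 0.40, 0.62, 0.20, 0.22, 0.60]
print(count_blinks(ears))  # 2
```

A blink is counted only once the EAR rises above the threshold again, so a blink still in progress at the end of the sequence is not counted.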

 

Below is the full implementation 

 

Python3




# Importing the required dependencies
import cv2  # for video rendering
import dlib  # for face and landmark detection
import imutils
# for calculating dist b/w the eye landmarks
from scipy.spatial import distance as dist
# to get the landmark ids of the left and right eyes
# you can do this manually too
from imutils import face_utils
 
cam = cv2.VideoCapture('assets/my_blink.mp4')
 
# defining a function to calculate the EAR
def calculate_EAR(eye):
 
    # calculate the vertical distances
    y1 = dist.euclidean(eye[1], eye[5])
    y2 = dist.euclidean(eye[2], eye[4])
 
    # calculate the horizontal distance
    x1 = dist.euclidean(eye[0], eye[3])
 
    # calculate the EAR
    EAR = (y1+y2) / x1
    return EAR
 
# Variables
blink_thresh = 0.45
succ_frame = 2
count_frame = 0
 
# Eye landmarks
(L_start, L_end) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(R_start, R_end) = face_utils.FACIAL_LANDMARKS_IDXS['right_eye']
 
# Initializing the Models for Landmark and
# face Detection
detector = dlib.get_frontal_face_detector()
landmark_predict = dlib.shape_predictor(
    'Model/shape_predictor_68_face_landmarks.dat')
while 1:
 
    # If the video is finished then reset it
    # to the start
    if cam.get(cv2.CAP_PROP_POS_FRAMES) == cam.get(
            cv2.CAP_PROP_FRAME_COUNT):
        cam.set(cv2.CAP_PROP_POS_FRAMES, 0)
 
    else:
        _, frame = cam.read()
        frame = imutils.resize(frame, width=640)
 
        # converting frame to gray scale to
        # pass to detector
        img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
 
        # detecting the faces
        faces = detector(img_gray)
        for face in faces:
 
            # landmark detection
            shape = landmark_predict(img_gray, face)
 
            # converting the shape class directly
            # to a list of (x,y) coordinates
            shape = face_utils.shape_to_np(shape)
 
            # parsing the landmarks list to extract
            # lefteye and righteye landmarks--#
            lefteye = shape[L_start: L_end]
            righteye = shape[R_start:R_end]
 
            # Calculate the EAR
            left_EAR = calculate_EAR(lefteye)
            right_EAR = calculate_EAR(righteye)
 
            # Avg of left and right eye EAR
            avg = (left_EAR+right_EAR)/2
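            if avg < blink_thresh:
                count_frame += 1  # incrementing the frame count
            else:
                # eyes are open again: it was a blink if they
                # stayed closed for at least succ_frame frames
                if count_frame >= succ_frame:
                    cv2.putText(frame, 'Blink Detected', (30, 30),
                                cv2.FONT_HERSHEY_DUPLEX, 1, (0, 200, 0), 1)
                # reset so the next blink is counted from zero
                count_frame = 0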
 
        cv2.imshow("Video", frame)
        if cv2.waitKey(5) & 0xFF == ord('q'):
            break
 
cam.release()
cv2.destroyAllWindows()

Output:

If you’re using a different video or a webcam, your FPS is going to be different, so you might want to try changing the values of the variables we defined, although they work fine in most cases.
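
One way to adapt the frame count to a different FPS is to start from a duration instead of a fixed number. This is only a sketch under assumptions: the helper frames_for_blink is made up, and the 100 ms minimum closed time is my own starting guess, not a value from this article.

```python
def frames_for_blink(fps, min_closed_ms=100, fallback=2):
    """Suggest how many consecutive low-EAR frames make a blink.

    fps:           frames per second of the video or webcam
    min_closed_ms: assumed minimum time the eyes stay closed
    fallback:      value to use when the FPS is not reported
    """
    if fps <= 0:
        return fallback
    return max(1, round(fps * min_closed_ms / 1000))

print(frames_for_blink(30))  # 3
print(frames_for_blink(60))  # 6

# With OpenCV the FPS can usually be read from the capture
# (not run here; some webcams report 0):
#     succ_frame = frames_for_blink(cam.get(cv2.CAP_PROP_FPS))
```

Treat the result as a starting point and adjust it together with the threshold while watching your own video.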

