Python – Facial and hand recognition using MediaPipe Holistic

Last Updated : 04 Jan, 2023

What is MediaPipe:

Object detection is one of the most popular use cases in computer vision. Many object detection models are in use worldwide, and most are deployed as independent solutions to a single computer vision task with a fixed application. Combining several of these tasks into a single end-to-end, real-time solution is exactly what MediaPipe does.

MediaPipe is an open-source, cross-platform machine learning framework for building complex, multimodal applied machine learning pipelines. It can be used to build solutions such as face detection, multi-hand tracking, and object detection and tracking. MediaPipe essentially acts as a mediator that handles running models on any supported platform, which lets developers focus on experimenting with models rather than on the system.

Possibilities with MediaPipe:

Human Pose Detection and Tracking: High-fidelity human body pose tracking, inferring a minimum of 25 2D upper-body landmarks from RGB video frames
Face Mesh: 468 face landmarks in 3D with multi-face support
Hand Tracking: 21 landmarks in 3D with multi-hand support, based on high-performance palm detection and hand landmark model
Holistic Tracking: Simultaneous and semantically consistent tracking of 33 pose, 21 per-hand, and 468 facial landmarks
Hair Segmentation: Super realistic real-time hair recoloring
Object Detection and Tracking: Detection and tracking of objects in the video in a single pipeline
Face Detection: Ultra-lightweight face detector with 6 landmarks and multi-face support
Iris Tracking and Depth Estimation: Accurate human iris tracking and metric depth estimation without specialized hardware. Tracks iris, pupil, and eye contour landmarks.
3D Object Detection: Detection and 3D pose estimation of everyday objects like shoes and chairs

MediaPipe Holistic:

MediaPipe Holistic is a pipeline that combines optimized face, hand, and pose components, so a single model can detect hand and body poses together with face landmarks. One of its main uses is to detect the face and hands and extract key points that are then passed on to a computer vision model.
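
To make "extracting key points" concrete, here is a minimal sketch of one way to flatten a Holistic result into a fixed-length list that a downstream model could consume. The function names, the zero-padding for undetected parts, and the SimpleNamespace stand-in for a real results object are illustrative assumptions, not part of MediaPipe. It also assumes the default landmark counts listed above (33 pose, 468 face, 21 per hand).

```python
from types import SimpleNamespace

# Landmark counts taken from the Holistic description above.
POSE_COUNT, FACE_COUNT, HAND_COUNT = 33, 468, 21


def flatten_landmarks(landmark_list, count, with_visibility=False):
    """Return a flat list of coordinates, or zeros when nothing was detected."""
    width = 4 if with_visibility else 3
    if landmark_list is None:
        # Holistic reports a missing face, pose, or hand as None.
        return [0.0] * (count * width)
    values = []
    for lm in landmark_list.landmark:
        values.extend([lm.x, lm.y, lm.z])
        if with_visibility:
            values.append(lm.visibility)
    return values


def extract_keypoints(results):
    """Concatenate pose, face, and both hands into one fixed-length list."""
    return (
        flatten_landmarks(results.pose_landmarks, POSE_COUNT, with_visibility=True)
        + flatten_landmarks(results.face_landmarks, FACE_COUNT)
        + flatten_landmarks(results.left_hand_landmarks, HAND_COUNT)
        + flatten_landmarks(results.right_hand_landmarks, HAND_COUNT)
    )


# Hypothetical stand-in for a results object in which only the left hand was found.
point = SimpleNamespace(x=0.1, y=0.2, z=0.3)
demo = SimpleNamespace(
    pose_landmarks=None,
    face_landmarks=None,
    left_hand_landmarks=SimpleNamespace(landmark=[point] * HAND_COUNT),
    right_hand_landmarks=None,
)
keypoints = extract_keypoints(demo)
print(len(keypoints))  # 33*4 + 468*3 + 21*3 + 21*3 = 1662
```

Padding with zeros keeps the list length constant from frame to frame, which most classifiers expect. The list can be wrapped in a NumPy array if the downstream model needs one.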

Detect face and hands using Holistic and extract key points

The following code reads frames from the system webcam with OpenCV, detects face, pose, and hand landmarks with MediaPipe Holistic, and draws them on each frame. The detected key points are available on the results object returned for every frame.

Python3

''' 
Install dependencies  
pip install opencv-python  
pip install mediapipe 
'''
# Import packages 
import cv2 
import mediapipe as mp 
  
#Build Keypoints using MP Holistic 
mp_holistic = mp.solutions.holistic # Holistic model 
mp_drawing = mp.solutions.drawing_utils # Drawing utilities 
  
def mediapipe_detection(image, model): 
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # COLOR CONVERSION BGR 2 RGB 
    image.flags.writeable = False                  # Image is no longer writeable (NumPy spells the flag "writeable") 
    results = model.process(image)                 # Make prediction 
    image.flags.writeable = True                   # Image is now writeable  
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR) # COLOR CONVERSION RGB 2 BGR 
    return image, results 
    
def draw_landmarks(image, results): 
    mp_drawing.draw_landmarks( 
      image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS) # Draw face connections (older MediaPipe releases named this FACE_CONNECTIONS) 
    mp_drawing.draw_landmarks( 
      image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS) # Draw pose connections 
    mp_drawing.draw_landmarks( 
      image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS) # Draw left hand connections 
    mp_drawing.draw_landmarks( 
      image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS) # Draw right hand connections 
      
def draw_styled_landmarks(image, results): 
    # Draw face connections 
    mp_drawing.draw_landmarks( 
      image, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS, # Older MediaPipe releases named this FACE_CONNECTIONS 
      mp_drawing.DrawingSpec(color=(80,110,10), thickness=1, circle_radius=1),  
      mp_drawing.DrawingSpec(color=(80,255,121), thickness=1, circle_radius=1)) # Channel values must stay within 0-255 
    # Draw pose connections 
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS, 
                             mp_drawing.DrawingSpec(color=(80,22,10), thickness=2, circle_radius=4),  
                             mp_drawing.DrawingSpec(color=(80,44,121), thickness=2, circle_radius=2) 
                             )  
    # Draw left hand connections 
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS,  
                             mp_drawing.DrawingSpec(color=(121,22,76), thickness=2, circle_radius=4),  
                             mp_drawing.DrawingSpec(color=(121,44,250), thickness=2, circle_radius=2) 
                             )  
    # Draw right hand connections   
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS,  
                             mp_drawing.DrawingSpec(color=(245,117,66), thickness=2, circle_radius=4),  
                             mp_drawing.DrawingSpec(color=(245,66,230), thickness=2, circle_radius=2) 
                             )  
# Main loop: capture frames from the default webcam 
cap = cv2.VideoCapture(0) 
# Set mediapipe model  
with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic: 
    while cap.isOpened(): 
  
        # Read feed 
        ret, frame = cap.read() 
        if not ret:  # Stop if no frame could be read from the camera 
            break
  
        # Make detections 
        image, results = mediapipe_detection(frame, holistic) 
        print(results) 
          
        # Draw landmarks 
        draw_styled_landmarks(image, results) 
  
        # Show to screen 
        cv2.imshow('OpenCV Feed', image) 
  
        # Break gracefully 
        if cv2.waitKey(10) & 0xFF == ord('q'): 
            break
    cap.release() 
    cv2.destroyAllWindows()
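
The landmarks in results store x and y as values normalized by the frame width and height, so they must be scaled before they can be used as pixel positions. The sketch below shows one way to do that. The helper name to_pixel and the choice to clamp points to the frame are assumptions made for illustration, and a SimpleNamespace stands in for a real landmark.

```python
from types import SimpleNamespace


def to_pixel(landmark, image_width, image_height):
    """Scale a normalized landmark to integer pixel coordinates, clamped to the frame."""
    x = min(max(int(landmark.x * image_width), 0), image_width - 1)
    y = min(max(int(landmark.y * image_height), 0), image_height - 1)
    return x, y


# Hypothetical landmark standing in for e.g. results.pose_landmarks.landmark[0] (the nose).
nose = SimpleNamespace(x=0.5, y=0.25, z=0.0)
print(to_pixel(nose, 640, 480))  # (320, 120)
```

Inside the capture loop, the frame size can be read with `height, width = image.shape[:2]`, after first checking that the relevant attribute of results is not None. Clamping is optional: a landmark that is partly outside the frame can have coordinates slightly below 0 or above 1.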