
Real-Time AI Virtual Mouse System Using Computer Vision

Last Updated : 30 Nov, 2023

An AI Virtual Mouse is software that lets users give mouse input to a system without a physical mouse; since it relies on an ordinary camera, it could loosely be considered a hardware-assisted system as well. A virtual mouse can usually be operated alongside multiple input devices, including an actual mouse or a computer keyboard. The virtual mouse uses a web camera together with various image-processing techniques: finger-detection methods with instant camera access and a user-friendly interface make it easy to use. The system implements a motion-tracking mouse that replaces the physical mouse, saving time and reducing effort. The hand movements of a user are mapped to mouse inputs, with the web camera capturing images continuously. Most laptops today are equipped with webcams, which have recently been used in security applications utilizing face recognition. Harnessing the webcam for vision-based computer control effectively eliminates the need for a computer mouse or mouse pad, and its usefulness extends to other HCI applications such as sign-language recognition or motion controllers.

Software Specification:

  • Python Libraries: Various Python libraries such as OpenCV, NumPy, PyAutoGUI, and TensorFlow can be used for building the AI virtual mouse system.
  • OpenCV: This library is used for image and video processing, which here covers hand detection and tracking (see the minimal sketch after this list).
  • NumPy: NumPy is used for numerical computations and to process the captured images.
  • PyAutoGUI: PyAutoGUI is used to control mouse movements and clicks.
  • Mediapipe: A cross-platform framework for building multi-modal applied machine learning pipelines.
  • Comtypes: A Python module that provides access to Windows COM and .NET components.
  • Screen-Brightness-Control: A Python module for controlling screen brightness on Windows, Linux, and macOS.
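
To see how these libraries fit together, here is a minimal standalone smoke test (an illustrative sketch, not part of the project code that follows) that captures webcam frames with OpenCV and overlays MediaPipe hand landmarks:

Python3

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            continue
        # MediaPipe expects RGB input; OpenCV captures frames in BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
        cv2.imshow('Hand detection test', frame)
        if cv2.waitKey(5) & 0xFF == 27:  # press Esc to quit
            break
cap.release()
cv2.destroyAllWindows()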

Detecting Which Finger Is Up and Performing the Corresponding Mouse Function

Here we detect which finger is up using the tip ID of the respective finger found with MediaPipe, together with the coordinates of the raised fingertips; based on which fingers are up, the corresponding mouse function is performed.

Mouse functions are chosen based on hand gestures and hand-tip detection, and computer vision moves the mouse cursor around the system window. If only the index finger is up (tip ID 1), or both the index finger (tip ID 1) and the middle finger (tip ID 2) are up, the mouse cursor is moved around the computer window using the PyAutoGUI package of Python, as sketched below.
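
For illustration, a common, simple finger-up test (a hypothetical helper for demonstration; the project code below uses a ratio-based test instead) compares each fingertip landmark with the joint below it. MediaPipe's fingertip landmarks are 4, 8, 12, 16, and 20, so "tip ID 1" refers to the index fingertip (landmark 8) in that list:

Python3

# Hypothetical sketch, not the project's exact method.
# MediaPipe fingertip indices: thumb=4, index=8, middle=12, ring=16, pinky=20.
tip_ids = [4, 8, 12, 16, 20]

def fingers_up(landmarks):
    """Return 0/1 flags for [thumb, index, middle, ring, pinky]."""
    fingers = []
    # Thumb: compare the tip's x with the joint below it (mirrored right hand).
    fingers.append(1 if landmarks[tip_ids[0]].x < landmarks[tip_ids[0] - 1].x else 0)
    # Other fingers: a tip above its PIP joint (smaller y) counts as "up".
    for tid in tip_ids[1:]:
        fingers.append(1 if landmarks[tid].y < landmarks[tid - 2].y else 0)
    return fingers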

[Figure: Coordinates/landmarks of the hand detected by MediaPipe]

[Figure: Flow chart of the AI Virtual Mouse]

How to set up and run AI Virtual Mouse

Prerequisites: Python (3.6 or 3.8.5) must be installed on your system, along with the Anaconda distribution.

Set up the project using PyCharm and follow the steps below:

Step 1: Add a requirements file at folder_name/requirements.txt with the following contents:

pyautogui==0.9.53
opencv-python==4.5.3.56
mediapipe==0.8.6.2
comtypes==1.1.10
pycaw==20181226
screen-brightness-control==0.9.0

Step 2: Create a conda environment with Python 3.8.5:

conda create --name gest python=3.8.5

Step 3: Activate the environment:

conda activate gest

Step 4: Install the dependencies:

pip install -r requirements.txt

Step 5: Change into the source directory and run the saved file:

cd src
python Virtual_Mouse.py

Error correction: if you hit errors similar to the ones below, the following commands may help.

If the OpenCV installation fails, upgrade pip and reinstall the OpenCV wheel:

python.exe -m pip install --upgrade pip
pip install opencv-python==4.5.5.64

If you see a protobuf error such as:

TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.

pin a compatible protobuf version:

pip install protobuf==3.20.0

If comtypes raises errors, upgrade it:

pip install --upgrade comtypes
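
To confirm the environment after applying these fixes, a quick throwaway check of the installed versions (not part of the project code) can be run:

Python3

import cv2
import mediapipe
import google.protobuf

print("opencv:", cv2.__version__)
print("mediapipe:", mediapipe.__version__)
print("protobuf:", google.protobuf.__version__)  # should report 3.20.0 after the fix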

Importing Required Modules

First, we set up the necessary imports for a Python program involving computer vision, audio control, and screen-brightness control.

Python3




import cv2
import mediapipe as mp
import pyautogui
import math 
from enum import IntEnum
from ctypes import cast, POINTER
from comtypes import CLSCTX_ALL
from pycaw.pycaw import AudioUtilities, IAudioEndpointVolume
from google.protobuf.json_format import MessageToDict
import screen_brightness_control as sbcontrol



The imported libraries and modules serve various purposes in the program. The next snippet defines two custom classes, Gest and HLabel, which are enumerations for gesture encodings and multi-handedness labels, respectively. These enumerations assign integer values to the different gestures and hand labels for easier identification and processing in the program.

Python3




# To create GUI
import tkinter as tk
from PIL import ImageTk, Image
pyautogui.FAILSAFE = False
mp_drawing = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
# Gesture Encodings 
class Gest(IntEnum):
    # Binary Encoded
    FIST = 0
    PINKY = 1
    RING = 2
    MID = 4
    LAST3 = 7
    INDEX = 8
    FIRST2 = 12
    LAST4 = 15
    THUMB = 16    
    PALM = 31  
    # Extra Mappings
    V_GEST = 33
    TWO_FINGER_CLOSED = 34
    PINCH_MAJOR = 35
    PINCH_MINOR = 36
  
# Multi-handedness Labels
class HLabel(IntEnum):
    MINOR = 0
    MAJOR = 1
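
The gesture values are binary-encoded: each finger contributes one bit (pinky = 1, ring = 2, middle = 4, index = 8, thumb = 16), so combined gestures are simple bitwise ORs of single-finger values. A quick illustrative check (assuming the Gest class defined above):

Python3

# Each assertion holds because the encoding assigns one bit per finger.
assert Gest.FIRST2 == Gest.INDEX | Gest.MID    # 8 | 4 == 12 (index + middle)
assert Gest.LAST4 == Gest.PINKY | Gest.RING | Gest.MID | Gest.INDEX  # 15
assert Gest.PALM == Gest.LAST4 | Gest.THUMB    # 15 | 16 == 31 (all five open)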


Convert Mediapipe Landmarks to recognizable Gestures

The code below defines a class called "HandRecog", which performs gesture recognition from hand-tracking results. It initializes variables such as the finger state, gesture type, frame count, and hand result. The class has methods to update the hand result and to calculate the signed distance, distance, and change in z-coordinate between hand landmarks. It also has a method to set the finger state based on the hand result, and logic to handle fluctuations due to noise.

Python3




class HandRecog:  
    def __init__(self, hand_label):
        self.finger = 0
        self.ori_gesture = Gest.PALM
        self.prev_gesture = Gest.PALM
        self.frame_count = 0
        self.hand_result = None
        self.hand_label = hand_label
    def update_hand_result(self, hand_result):
        self.hand_result = hand_result
    def get_signed_dist(self, point):
        sign = -1
        if self.hand_result.landmark[point[0]].y < self.hand_result.landmark[point[1]].y:
            sign = 1
        dist = (self.hand_result.landmark[point[0]].x - self.hand_result.landmark[point[1]].x)**2
        dist += (self.hand_result.landmark[point[0]].y - self.hand_result.landmark[point[1]].y)**2
        dist = math.sqrt(dist)
        return dist*sign 
    def get_dist(self, point):
        dist = (self.hand_result.landmark[point[0]].x - self.hand_result.landmark[point[1]].x)**2
        dist += (self.hand_result.landmark[point[0]].y - self.hand_result.landmark[point[1]].y)**2
        dist = math.sqrt(dist)
        return dist  
    def get_dz(self,point):
        return abs(self.hand_result.landmark[point[0]].z - self.hand_result.landmark[point[1]].z)   
    # Function to find Gesture Encoding using current finger_state.
    # Finger_state: 1 if finger is open, else 0
    def set_finger_state(self):
        if self.hand_result is None:
            return
        points = [[8,5,0],[12,9,0],[16,13,0],[20,17,0]]
        self.finger = 0
        self.finger = self.finger | 0 #thumb
        for idx,point in enumerate(points):            
            dist = self.get_signed_dist(point[:2])
            dist2 = self.get_signed_dist(point[1:])          
            try:
                ratio = round(dist/dist2, 1)
            except ZeroDivisionError:
                ratio = round(dist/0.01, 1)
            self.finger = self.finger << 1
            if ratio > 0.5 :
                self.finger = self.finger | 1
    # Handling fluctuations due to noise
    def get_gesture(self):
        if self.hand_result is None:
            return Gest.PALM
        current_gesture = Gest.PALM
        if self.finger in [Gest.LAST3,Gest.LAST4] and self.get_dist([8,4]) < 0.05:
            if self.hand_label == HLabel.MINOR :
                current_gesture = Gest.PINCH_MINOR
            else:
                current_gesture = Gest.PINCH_MAJOR
        elif Gest.FIRST2 == self.finger :
            point = [[8,12],[5,9]]
            dist1 = self.get_dist(point[0])
            dist2 = self.get_dist(point[1])
            ratio = dist1/dist2
            if ratio > 1.7:
                current_gesture = Gest.V_GEST
            else:
                if self.get_dz([8,12]) < 0.1:
                    current_gesture =  Gest.TWO_FINGER_CLOSED
                else:
                    current_gesture =  Gest.MID          
        else:
            current_gesture =  self.finger
          
        if current_gesture == self.prev_gesture:
            self.frame_count += 1
        else:
            self.frame_count = 0
        self.prev_gesture = current_gesture
        if self.frame_count > 4 :
            self.ori_gesture = current_gesture
        return self.ori_gesture
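
To make the debouncing in get_gesture concrete, here is a small standalone simulation (an illustrative sketch reusing the same counting rule): a new gesture is only committed once it has been seen on more than four consecutive frames.

Python3

from enum import IntEnum

# Minimal stand-ins so the sketch runs on its own.
class Gest(IntEnum):
    PALM = 31
    V_GEST = 33

prev_gesture, ori_gesture, frame_count = Gest.PALM, Gest.PALM, 0
for current in [Gest.V_GEST] * 6:      # six stable frames of a V gesture
    if current == prev_gesture:
        frame_count += 1
    else:
        frame_count = 0
    prev_gesture = current
    if frame_count > 4:
        ori_gesture = current          # committed on the sixth frame
print(ori_gesture)                     # Gest.V_GEST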


Executes commands according to detected gestures

Next, we define a Python class called "Controller" that provides methods for gesture recognition and for controlling system functions such as screen brightness, system volume, and scrolling. The class keeps track of hand positions, gesture flags, and pinch thresholds. Its methods include "getpinchylv", "getpinchxlv", "changesystembrightness", "changesystemvolume", "scrollVertical", "scrollHorizontal", "get_position", "pinch_control_init", "pinch_control", and "handle_controls". These methods calculate distances and changes in hand landmarks, set finger states, determine the current gesture, and control system functions based on the recognized gestures.

Python3




class Controller:
    tx_old = 0
    ty_old = 0
    trial = True
    flag = False
    grabflag = False
    pinchmajorflag = False
    pinchminorflag = False
    pinchstartxcoord = None
    pinchstartycoord = None
    pinchdirectionflag = None
    prevpinchlv = 0
    pinchlv = 0
    framecount = 0
    prev_hand = None
    pinch_threshold = 0.3
    def getpinchylv(hand_result):
        dist = round((Controller.pinchstartycoord - hand_result.landmark[8].y)*10,1)
        return dist
    def getpinchxlv(hand_result):
        dist = round((hand_result.landmark[8].x - Controller.pinchstartxcoord)*10,1)
        return dist   
    def changesystembrightness():
        currentBrightnessLv = sbcontrol.get_brightness()/100.0
        currentBrightnessLv += Controller.pinchlv/50.0
        if currentBrightnessLv > 1.0:
            currentBrightnessLv = 1.0
        elif currentBrightnessLv < 0.0:
            currentBrightnessLv = 0.0       
        sbcontrol.fade_brightness(int(100*currentBrightnessLv) , start = sbcontrol.get_brightness())   
    def changesystemvolume():
        devices = AudioUtilities.GetSpeakers()
        interface = devices.Activate(IAudioEndpointVolume._iid_, CLSCTX_ALL, None)
        volume = cast(interface, POINTER(IAudioEndpointVolume))
        currentVolumeLv = volume.GetMasterVolumeLevelScalar()
        currentVolumeLv += Controller.pinchlv/50.0
        if currentVolumeLv > 1.0:
            currentVolumeLv = 1.0
        elif currentVolumeLv < 0.0:
            currentVolumeLv = 0.0
        volume.SetMasterVolumeLevelScalar(currentVolumeLv, None)
    def scrollVertical():
        pyautogui.scroll(120 if Controller.pinchlv>0.0 else -120)     
    def scrollHorizontal():
        pyautogui.keyDown('shift')
        pyautogui.keyDown('ctrl')
        pyautogui.scroll(-120 if Controller.pinchlv>0.0 else 120)
        pyautogui.keyUp('ctrl')
        pyautogui.keyUp('shift')
    # Locate Hand to get Cursor Position
    # Stabilize cursor by Dampening
    def get_position(hand_result):
        point = 9
        position = [hand_result.landmark[point].x ,hand_result.landmark[point].y]
        sx,sy = pyautogui.size()
        x_old,y_old = pyautogui.position()
        x = int(position[0]*sx)
        y = int(position[1]*sy)
        if Controller.prev_hand is None:
            Controller.prev_hand = x,y
        delta_x = x - Controller.prev_hand[0]
        delta_y = y - Controller.prev_hand[1]
        distsq = delta_x**2 + delta_y**2
        ratio = 1
        Controller.prev_hand = [x,y]
        if distsq <= 25:
            ratio = 0
        elif distsq <= 900:
            ratio = 0.07 * (distsq ** (1/2))
        else:
            ratio = 2.1
        x , y = x_old + delta_x*ratio , y_old + delta_y*ratio
        return (x,y)
    def pinch_control_init(hand_result):
        Controller.pinchstartxcoord = hand_result.landmark[8].x
        Controller.pinchstartycoord = hand_result.landmark[8].y
        Controller.pinchlv = 0
        Controller.prevpinchlv = 0
        Controller.framecount = 0
    # Hold final position for 5 frames to change status
    def pinch_control(hand_result, controlHorizontal, controlVertical):
        if Controller.framecount == 5:
            Controller.framecount = 0
            Controller.pinchlv = Controller.prevpinchlv
  
            if Controller.pinchdirectionflag == True:
                controlHorizontal() #x
  
            elif Controller.pinchdirectionflag == False:
                controlVertical() #y
        lvx =  Controller.getpinchxlv(hand_result)
        lvy =  Controller.getpinchylv(hand_result)           
        if abs(lvy) > abs(lvx) and abs(lvy) > Controller.pinch_threshold:
            Controller.pinchdirectionflag = False
            if abs(Controller.prevpinchlv - lvy) < Controller.pinch_threshold:
                Controller.framecount += 1
            else:
                Controller.prevpinchlv = lvy
                Controller.framecount = 0
  
        elif abs(lvx) > Controller.pinch_threshold:
            Controller.pinchdirectionflag = True
            if abs(Controller.prevpinchlv - lvx) < Controller.pinch_threshold:
                Controller.framecount += 1
            else:
                Controller.prevpinchlv = lvx
                Controller.framecount = 0
    def handle_controls(gesture, hand_result):        
        x,y = None,None
        if gesture != Gest.PALM :
            x,y = Controller.get_position(hand_result)     
        # flag reset
        if gesture != Gest.FIST and Controller.grabflag:
            Controller.grabflag = False
            pyautogui.mouseUp(button = "left")
        if gesture != Gest.PINCH_MAJOR and Controller.pinchmajorflag:
            Controller.pinchmajorflag = False
        if gesture != Gest.PINCH_MINOR and Controller.pinchminorflag:
            Controller.pinchminorflag = False
        # implementation
        if gesture == Gest.V_GEST:
            Controller.flag = True
            pyautogui.moveTo(x, y, duration = 0.1)
        elif gesture == Gest.FIST:
            if not Controller.grabflag : 
                Controller.grabflag = True
                pyautogui.mouseDown(button = "left")
            pyautogui.moveTo(x, y, duration = 0.1)
        elif gesture == Gest.MID and Controller.flag:
            pyautogui.click()
            Controller.flag = False
        elif gesture == Gest.INDEX and Controller.flag:
            pyautogui.click(button='right')
            Controller.flag = False
        elif gesture == Gest.TWO_FINGER_CLOSED and Controller.flag:
            pyautogui.doubleClick()
            Controller.flag = False
        elif gesture == Gest.PINCH_MINOR:
            if Controller.pinchminorflag == False:
                Controller.pinch_control_init(hand_result)
                Controller.pinchminorflag = True
            Controller.pinch_control(hand_result,Controller.scrollHorizontal, Controller.scrollVertical)      
        elif gesture == Gest.PINCH_MAJOR:
            if Controller.pinchmajorflag == False:
                Controller.pinch_control_init(hand_result)
                Controller.pinchmajorflag = True
            Controller.pinch_control(hand_result,Controller.changesystembrightness, Controller.changesystemvolume)
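
As a worked example of the cursor dampening in get_position (the numbers below are chosen for illustration): per-frame movements with a squared distance of 25 or less are ignored, mid-range movements are scaled by 0.07 times the distance, and large jumps are capped at a factor of 2.1.

Python3

# Illustrative numbers, not taken from a real trace.
delta_x, delta_y = 20, 15
distsq = delta_x**2 + delta_y**2         # 625, in the mid range (26..900)
ratio = 0.07 * (distsq ** 0.5)           # 0.07 * 25 = 1.75
print(delta_x * ratio, delta_y * ratio)  # 35.0 26.25 -- the applied cursor offset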


Entry point of Gesture Controller

Python3




class GestureController:
    gc_mode = 0
    cap = None
    CAM_HEIGHT = None
    CAM_WIDTH = None
    hr_major = None # Right Hand by default
    hr_minor = None # Left hand by default
    dom_hand = True
  
    def __init__(self):
        GestureController.gc_mode = 1
        GestureController.cap = cv2.VideoCapture(0)
        GestureController.CAM_HEIGHT = GestureController.cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
        GestureController.CAM_WIDTH = GestureController.cap.get(cv2.CAP_PROP_FRAME_WIDTH)
      
    def classify_hands(results):
        left , right = None,None
        try:
            handedness_dict = MessageToDict(results.multi_handedness[0])
            if handedness_dict['classification'][0]['label'] == 'Right':
                right = results.multi_hand_landmarks[0]
            else :
                left = results.multi_hand_landmarks[0]
        except Exception:
            pass
        try:
            handedness_dict = MessageToDict(results.multi_handedness[1])
            if handedness_dict['classification'][0]['label'] == 'Right':
                right = results.multi_hand_landmarks[1]
            else :
                left = results.multi_hand_landmarks[1]
        except Exception:
            pass
        if GestureController.dom_hand:
            GestureController.hr_major = right
            GestureController.hr_minor = left
        else :
            GestureController.hr_major = left
            GestureController.hr_minor = right
  
    def start(self):     
        handmajor = HandRecog(HLabel.MAJOR)
        handminor = HandRecog(HLabel.MINOR)
  
        with mp_hands.Hands(max_num_hands = 2,min_detection_confidence=0.5, min_tracking_confidence=0.5) as hands:
            while GestureController.cap.isOpened() and GestureController.gc_mode:
                success, image = GestureController.cap.read()
  
                if not success:
                    print("Ignoring empty camera frame.")
                    continue              
                image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
                image.flags.writeable = False
                results = hands.process(image)          
                image.flags.writeable = True
                image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
                if results.multi_hand_landmarks:                   
                    GestureController.classify_hands(results)
                    handmajor.update_hand_result(GestureController.hr_major)
                    handminor.update_hand_result(GestureController.hr_minor)
                    handmajor.set_finger_state()
                    handminor.set_finger_state()
                    gest_name = handminor.get_gesture()
                    if gest_name == Gest.PINCH_MINOR:
                        Controller.handle_controls(gest_name, handminor.hand_result)
                    else:
                        gest_name = handmajor.get_gesture()
                        Controller.handle_controls(gest_name, handmajor.hand_result)
                      
                    for hand_landmarks in results.multi_hand_landmarks:
                        mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
                else:
                    Controller.prev_hand = None
                cv2.imshow('Virtual Mouse Gesture Controller', image)
                if cv2.waitKey(5) & 0xFF == 13:
                    break
        GestureController.cap.release()
        cv2.destroyAllWindows()


GUI Code

Here, a Python script creates a graphical user interface (GUI) using the Tkinter library. The GUI displays a label, an image, and a button. When the button is clicked, it calls the runvirtualmouse() function, which creates an instance of the GestureController class defined above and starts it.

Python3




def runvirtualmouse():
    gc1 = GestureController()
    gc1.start()
root = tk.Tk()
root.geometry("300x300")
label = tk.Label(root, text="Welcome to Virtual Mouse", fg="brown",font='TkDefaultFont 16 bold')
label.grid(row=0, columnspan=5, pady=10, padx=10)
image = ImageTk.PhotoImage(Image.open("tap.png"))
img_label = tk.Label(image=image , width=100, height=100, borderwidth=3, relief="solid")
img_label.grid(row=1, columnspan=5, pady=10, padx=10)
start_button = tk.Button(root,text=" Track Mouse",fg="white", bg='green', font='Helvetica 12 bold italic ',command= runvirtualmouse , height="4", width="16",activebackground='lightblue')
start_button.grid(row=3,column=2, pady=10, padx=20)
root.mainloop()


Output:

1. When you run the .py file, the GUI will open.

[Figure: GUI interface]

2. Click on the Track Mouse button: the webcam will open to capture hand gestures.

3. No action performed: when all five fingers are up (open palm), the cursor stops moving.

[Figure: Reading points of the right hand]

4. Cursor moving: raise both the index and middle fingers (a "V" gesture).

[Figure: Cursor moving]

5. Left button click: lower the index finger and keep the middle finger raised.

[Figure: Left click]

6. Right button click: lower the middle finger and keep the index finger raised.

[Figure: Right click]

7. Brightness control: pinch the index finger and thumb together, raise the rest of the fingers, and move the hand horizontally.

[Figure: Moving horizontally for brightness control]

8. Volume control: pinch the index finger and thumb together, raise the rest of the fingers, and move the hand vertically.

[Figure: Volume control]

9. Scrolling vertically: with the left hand, pinch the index finger and thumb together, raise the rest of the fingers, and move the hand vertically.

10. Drag & drop: lower all the fingers (a fist) after selecting an element, then drag the element and drop it wherever you want.

[Figure: Using drag and drop]

11. Double click: bring the raised index and middle fingers close together; a double-click is performed.

[Figure: Double click]

Conclusion

The main objective of the proposed AI virtual mouse is to provide an alternative to the conventional physical mouse: a computer-vision-enabled computer with a web camera recognizes the hand and fingers and uses a machine-learning pipeline to execute the defined mouse functions.


