ML | Common Loss Functions


A loss function estimates how well a particular algorithm models the provided data. Loss functions fall into two classes based on the type of learning task:

  • Regression Models: predict continuous values.
  • Classification Models: predict the output from a set of finite categorical values.

REGRESSION LOSSES 

Mean Squared Error (MSE) / Quadratic Loss / L2 Loss

  • It is the mean of the squared residuals over all the datapoints in the dataset. A residual is the difference between the actual value and the value predicted by the model.
  • The residuals are squared to convert negative values to positive values. Raw errors can be both negative and positive, so summing them can give a total close to 0. This would tell the model that the net error is 0 and that it is performing well even when it is actually performing badly. Squaring keeps every term positive, so the loss reflects the true magnitude of the errors.
  • Squaring also gives more weight to larger errors. When the cost function is far from its minimum, squaring penalizes the model more heavily and thus helps it reach the minimum faster.
  • The mean of the squared residuals is taken, instead of just the sum, to make the loss function independent of the number of datapoints in the training set.
  • MSE is sensitive to outliers.

(1)   \begin{equation*} M S E=\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}}{n} \end{equation*}

Python3

import numpy as np

# Mean Squared Error
def mse(y, y_pred):
    # mean of squared residuals
    return np.sum((y - y_pred) ** 2) / np.size(y)

where,
i        - index of a training sample
n        - number of training samples
y(i)     - actual output of the ith training sample
y-hat(i) - predicted value for the ith training sample
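
A quick sanity check of the function above on a small hand-made dataset (the values below are illustrative only):

Python3

# Illustrative values only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

print(mse(y_true, y_hat))  # 0.375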

Mean Absolute Error (MAE) / L1 Loss

  • It is the mean of the absolute residuals over all the datapoints in the dataset. A residual is the difference between the actual value and the value predicted by the model.
  • The absolute value of each residual is taken to convert negative values to positive values.
  • The mean is taken to make the loss function independent of the number of datapoints in the training set.
  • One advantage of MAE is that it is robust to outliers (see the comparison after the code below).
  • MAE is generally less preferred than MSE because its derivative is harder to work with: the absolute function is not differentiable at zero.


(2)   \begin{equation*} M A E=\frac{\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|}{n} \end{equation*}

Python3

# Mean Absolute Error
def mae(y, y_pred):
    # mean of absolute residuals
    return np.sum(np.abs(y - y_pred)) / np.size(y)
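
To see the robustness to outliers mentioned above, MAE and MSE can be compared on a small made-up dataset in which one prediction is far off:

Python3

# Illustrative data with one badly predicted point
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 2.1, 2.9, 14.0])  # the last prediction is an outlier

print(mae(y_true, y_hat))  # 2.575   -> the outlier contributes linearly
print(mse(y_true, y_hat))  # 25.0075 -> the outlier contributes quadratically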

                    

Mean Bias Error: It has the same form as MSE but without squaring the residuals, so positive and negative errors can cancel out. It is therefore less accurate as a loss function, but its sign can indicate whether the model has a positive or negative bias.

(3)   \begin{equation*} M B E=\frac{\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)}{n} \end{equation*}

Python3

# Mean Bias Error
def mbe(y, y_pred):
    # mean of raw (signed) residuals; the sign reveals the direction of bias
    return np.sum(y - y_pred) / np.size(y)
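
With the y - y_pred convention used above, a negative MBE means the model over-predicts on average and a positive MBE means it under-predicts. A small illustration with made-up values:

Python3

# Illustrative values: the model consistently over-predicts
y_true = np.array([2.0, 4.0, 6.0])
y_hat = np.array([2.5, 4.5, 6.5])

print(mbe(y_true, y_hat))  # -0.5 -> negative bias (over-prediction)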

                    


Huber Loss / Smooth Mean Absolute Error

  • It is a combination of MSE and MAE. It takes the good properties of both loss functions by being less sensitive to outliers and differentiable at the minimum.
  • When the error is small, the MSE part of the Huber loss is used, and when the error is large, the MAE part is used.
  • A new hyper-parameter 'δ' is introduced which tells the loss function where to switch from MSE to MAE (an example of varying δ follows the code below).
  • Additional 'δ' terms are introduced in the loss function to smoothen the transition from MSE to MAE.


(4)   \begin{equation*} \text { HuberLoss }= \begin{cases}\frac{1}{2}\left(y_{i}-\hat{y}_{i}\right)^{2} & \text { if }\left|y_{i}-\hat{y}_{i}\right| \leq \delta \\ \delta\left|y_{i}-\hat{y}_{i}\right|-\frac{1}{2} \delta^{2} & \text { otherwise }\end{cases} \end{equation*}

Python3

# Huber Loss
def Huber(y, y_pred, delta):

    # quadratic (MSE-like) region: absolute error within delta
    condition = np.abs(y - y_pred) <= delta

    # quadratic branch for small errors, linear (MAE-like) branch for large ones
    l = np.where(condition, 0.5 * (y - y_pred) ** 2,
                 delta * (np.abs(y - y_pred) - 0.5 * delta))

    return np.sum(l) / np.size(y)
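
The hyper-parameter delta controls where the loss switches between the two behaviours: a small delta treats the outlier linearly like MAE, while a large delta keeps every error in the quadratic branch. A small sketch on illustrative data:

Python3

# Illustrative data with one outlier
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 2.1, 2.9, 14.0])

print(Huber(y_true, y_hat, delta=1.0))    # ~2.38  (outlier penalized linearly)
print(Huber(y_true, y_hat, delta=100.0))  # ~12.50 (all errors quadratic, i.e. half the MSE)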

                    


CLASSIFICATION LOSSES

Cross-Entropy Loss: Also known as Negative Log Likelihood. It is the most commonly used loss function for classification. Cross-entropy loss increases as the predicted probability diverges from the actual label.

Python3

# Binary Cross-Entropy Loss
def cross_entropy(y, y_pred):
    # mean negative log likelihood; y_pred must lie strictly in (0, 1)
    return -np.sum(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred)) / np.size(y)

                    


(5)   \begin{equation*} \text { CrossEntropyLoss }=-\left(y_{i} \log \left(\hat{y}_{i}\right)+\left(1-y_{i}\right) \log \left(1-\hat{y}_{i}\right)\right) \end{equation*}
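
In practice the predicted probabilities are usually clipped away from exactly 0 and 1 before taking the log, since np.log(0) returns -inf. A minimal usage sketch with made-up labels and probabilities:

Python3

# Illustrative labels and predicted probabilities
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.1, 0.8, 0.6])

# clip to avoid log(0) on extreme predictions (standard practice, not from the article)
y_prob = np.clip(y_prob, 1e-12, 1 - 1e-12)

print(cross_entropy(y_true, y_prob))  # ~0.24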

Hinge Loss: Also known as Multi-class SVM Loss. Hinge loss is applied for maximum-margin classification, most prominently for support vector machines. It is a convex function, which makes it suitable for convex optimizers. The code below implements the binary form of hinge loss, with labels encoded as -1 and +1.

(6)   \begin{equation*} \text { SVMLoss }=\sum_{j \neq y_{i}} \max \left(0, s_{j}-s_{y_{i}}+1\right) \end{equation*}

Python3

# Hinge Loss (binary form, labels y in {-1, +1})
def hinge(y, y_pred):

    l = 0
    size = np.size(y)

    for i in range(size):
        # a sample classified correctly with margin >= 1 contributes 0
        l = l + max(0, 1 - y[i] * y_pred[i])

    return l / size
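
A minimal usage sketch, assuming labels encoded as -1/+1 and raw (unbounded) model scores; the values are illustrative:

Python3

# Illustrative labels in {-1, +1} and raw model scores
y_true = np.array([1, -1, 1, -1])
scores = np.array([2.0, -0.5, 0.3, 1.0])

print(hinge(y_true, scores))  # 0.8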

                    


