
How to Implement Softmax and Cross-Entropy in Python and PyTorch

Multiclass classification is an application of deep learning/machine learning where the model is given an input and assigns it to one of several categorical labels. For example, given a set of images of animals, the model classifies each image as a cat, dog, horse, etc.

For this purpose, where the model produces one output score per class, a simple logistic (sigmoid) function cannot be used. Instead, another activation function, called the Softmax function, is used along with the cross-entropy loss.



Softmax Function:

The softmax formula is represented as:

\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}

where the values of z_i are the elements of the input vector and can take any real value. The denominator is the normalization term, which guarantees that all the output values of the function sum to 1, making them a valid probability distribution.

The softmax function and the sigmoid function are similar to each other. Softmax operates on vectors while the sigmoid takes scalar values. Thus, the sigmoid function is a special case of the softmax function for a classifier with only two classes. In machine learning, the term "sigmoid function" most often refers to the logistic sigmoid function. Mathematically, it is defined by:

\sigma(z) = \frac{1}{1 + e^{-z}}

The above function is used for classification between 2 classes, i.e., 1 and 0. In the case of multiclass classification, the softmax function is used. The softmax converts the output for each class to a probability value (between 0 and 1), exponentially normalized across the classes.
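To see this relationship concretely, here is a minimal NumPy sketch (the helper functions are illustrative, not part of the article's code) showing that softmax over the two logits [z, 0] reproduces sigmoid(z):

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(values):
    # Softmax as defined above
    exp_values = np.exp(values)
    return exp_values / np.sum(exp_values)

z = 1.5
# The first entry of softmax([z, 0]) equals sigmoid(z),
# since e^z / (e^z + e^0) = 1 / (1 + e^-z)
print(softmax(np.array([z, 0.0]))[0])  # ~0.8176
print(sigmoid(z))                      # ~0.8176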

Example:

The below code implements the softmax function using Python and NumPy.




# The below code implements the softmax function
# using python and numpy. It takes:
# Input: An array/list of values
# Output: An array/list of softmax values
 
 
# Importing the required libraries
import numpy as np
 
# Defining the softmax function
def softmax(values):
 
    # Computing element wise exponential value
    exp_values = np.exp(values)
 
    # Computing sum of these values
    exp_values_sum = np.sum(exp_values)
 
    # Returning the softmax output.
    return exp_values/exp_values_sum
 
 
if __name__ == '__main__':
 
    # Input to be fed
    values = [2, 4, 5, 3]
 
    # Output achieved
    output = softmax(values)
    print("Softmax Output: ", output)
    print("Sum of Softmax Values: ", np.sum(output))

Output:

 
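One practical caveat: for large input values, np.exp can overflow. A common numerically stable variant (shown here as a minimal sketch, not part of the original implementation) subtracts the maximum value before exponentiating, which does not change the result:

import numpy as np

def stable_softmax(values):
    # Shifting by the maximum avoids overflow in np.exp;
    # softmax is invariant to adding a constant to all inputs
    values = np.asarray(values, dtype=float)
    shifted = values - np.max(values)
    exp_values = np.exp(shifted)
    return exp_values / np.sum(exp_values)

# Works even where np.exp(1000) alone would overflow
print(stable_softmax([1000, 1001, 1002]))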

Implementing Softmax using Python and PyTorch:

Below, we see how to implement the softmax function using Python and PyTorch. For this purpose, we use the torch.nn.functional module provided by PyTorch.

Syntax: torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)

Parameters

  • input: The input tensor on which softmax is to be applied.
  • dim: Integer value. Indicates the dimension along which the softmax is applied.
  • dtype (optional) – The desired data type of the returned tensor. If set, the input is cast to dtype before the operation is applied. Default: None

Example:

The below code implements the softmax function using PyTorch.




# The below code implements the softmax function
# using the function softmax provided by
# torch.nn.functional in PyTorch.
# Input: A tensor of input values
# Output: A tensor of computed softmax values
 
 
# Importing the required Libraries
import torch.nn.functional as F
import torch
 
# The input tensor to be passed
input_ = torch.tensor([1, 2, 3])
 
# Computing the softmax values
softmax = F.softmax(input_.float(), dim=0)
print("Softmax values are: ", softmax)
 
# Sum of all the softmax values
print("Sum of the softmax values: ", torch.sum(softmax))

Output:

 
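The dim argument matters once the input has more than one dimension: it selects the axis along which the values are normalized. A minimal sketch with a small batch of illustrative values:

import torch
import torch.nn.functional as F

# A batch of 2 samples with 3 class scores each
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])

# dim=1 normalizes across the classes of each sample,
# so every row sums to 1
probs = F.softmax(logits, dim=1)
print(probs)
print(probs.sum(dim=1))  # tensor([1., 1.])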

Cross-Entropy Loss 

Loss functions are the objective functions used in any machine learning task to train the corresponding model. One of the most important loss functions used here is Cross-Entropy Loss, also known as logistic loss or log loss, used in classification tasks. The understanding of Cross-Entropy Loss is based on the Softmax activation function.
The softmax function returns a vector of C class probabilities, where each entry is the probability of the corresponding class. The cross-entropy loss measures the deviation of this vector from the true probability distribution.

Entropy

The entropy of a random variable X is defined as the level of disorder or randomness inherent in its possible outcomes.
For a probability distribution p(x), entropy is defined as:

H(X) = -\sum_{x} p(x) \log p(x)

Since p(x) <= 1, log(p(x)) <= 0; the negative sign is used so that the entropy comes out as a non-negative value.
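As a quick sanity check, a minimal NumPy sketch (the distributions below are illustrative) computing the entropy of a few distributions:

import numpy as np

def entropy(p):
    # H(X) = -sum p(x) * log(p(x)); entries with p(x) = 0 contribute 0
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))                # fair coin: log(2) ~ 0.693
print(entropy([1.0, 0.0]))                # deterministic outcome: 0.0
print(entropy([0.25, 0.25, 0.25, 0.25]))  # log(4) ~ 1.386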

Cross-Entropy

Mathematically, cross-entropy is defined as:

H(p, q) = -\sum_{i} p_i \log(q_i)

Here p_i is the true probability of a class, while q_i is the probability computed using the Softmax function.
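For example, with a one-hot true distribution p = [1, 0, 0] and predicted probabilities q = [0.7, 0.2, 0.1], only the true class contributes, so H(p, q) = -log(0.7) ≈ 0.357.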

Implementing Cross-Entropy Loss using Python and NumPy

Below, we discuss the implementation of Cross-Entropy Loss using Python and the NumPy library.
 

Example:

The below code implements the cross-entropy loss using Python and NumPy.




# The below code implements the cross entropy
# loss between the predicted values and the
# true values of class labels. The function:
# Inputs: Predicted values, True values
# Output: The cross entropy loss between them.


# Importing the required library
import numpy as np

# Softmax function (as defined earlier)
def softmax(values):
    exp_values = np.exp(values)
    return exp_values / np.sum(exp_values)

# Cross Entropy function.
def cross_entropy(y_pred, y_true):

    # computing softmax values for predicted values
    y_pred = softmax(y_pred)
    loss = 0
     
    # Doing cross entropy Loss
    for i in range(len(y_pred)):
 
        # Here, the loss is computed using the
        # above mathematical formulation.
        loss = loss + (-1 * y_true[i]*np.log(y_pred[i]))
 
    return loss
 
# y_true: True Probability Distribution
y_true = [1, 0, 0, 0, 0]
 
# y_pred: Predicted values for each class
y_pred = [10, 5, 3, 1, 4]
 
# Calling the cross_entropy function by passing
# the suitable values
cross_entropy_loss = cross_entropy(y_pred, y_true)
 
print("Cross Entropy Loss: ", cross_entropy_loss)

Output:

 

Implementing Cross-Entropy Loss using PyTorch:

For implementing Cross-Entropy Loss using PyTorch, we use the torch.nn module. The relevant class is described below.

torch.nn.CrossEntropyLoss(): 

This class implements the cross-entropy loss between the input and the target value.

Syntax: torch.nn.CrossEntropyLoss(weight=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

Parameters:

  • weight (optional) – A rescaling weight given to each class. Has to be a Tensor of size C, where C is the number of classes.
  • ignore_index (optional) – An integer value that specifies a target value to be ignored, so that it does not contribute to the gradient.
  • reduce (optional) – Boolean value. By default, the losses are averaged or summed over the observations of each minibatch. When set to False, it returns a loss per batch element. Default: True
  • reduction (optional) – A string which specifies the reduction applied to the output. Can be: 'none' | 'mean' | 'sum' (see the sketch after this list for a comparison).
    1. 'none': No reduction is applied.
    2. 'mean': The weighted mean of the output is taken.
    3. 'sum': The output is summed.
    Default: 'mean'
  • label_smoothing (optional) – A float value in the range [0.0, 1.0]. Specifies the amount of smoothing applied when computing the loss.
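A minimal sketch comparing the 'none' and 'mean' reduction modes (the logits and labels below are illustrative):

import torch
import torch.nn as nn

# A batch of two samples with three class scores each
y_pred = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
y_true = torch.tensor([0, 1])

# 'none' keeps one loss value per sample
loss_none = nn.CrossEntropyLoss(reduction='none')(y_pred, y_true)

# 'mean' (the default) averages the per-sample losses
loss_mean = nn.CrossEntropyLoss(reduction='mean')(y_pred, y_true)

print(loss_none)  # tensor of shape (2,)
print(loss_mean)  # scalar, equal to loss_none.mean()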

Below is the code:

The code below computes the cross-entropy loss between the input and the target value using PyTorch.




# The below code implements the cross
# entropy loss using the CrossEntropyLoss()
# class provided by torch.nn module in pytorch.
 
 
# Importing the required library
import torch.nn as nn
import torch
 
# Defining the object for this class.
loss = nn.CrossEntropyLoss()
 
# y_pred: Predicted values (raw scores/logits for each class)
y_pred = torch.tensor([[1.4, 0.4, 1.1, 0.1, 2.3]])
 
# y_true: True class label
y_true = torch.tensor([0])
 
# Passing these values to the loss object.
cross_entropy_loss = loss(y_pred, y_true)
 
# Printing the value of the loss.
print("Cross Entropy Loss: ", cross_entropy_loss.item())

Output:

 
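As a sanity check, the loss above should equal the negative log-softmax score of the true class, since CrossEntropyLoss applies log-softmax internally. A minimal sketch reusing the same values:

import torch
import torch.nn as nn
import torch.nn.functional as F

y_pred = torch.tensor([[1.4, 0.4, 1.1, 0.1, 2.3]])
y_true = torch.tensor([0])

# CrossEntropyLoss applies log-softmax internally, then picks
# the negative log-probability of the true class
loss = nn.CrossEntropyLoss()(y_pred, y_true)
manual = -F.log_softmax(y_pred, dim=1)[0, y_true[0]]

print(loss.item(), manual.item())  # both values are equal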

