Computing the Mean and Std of a Dataset in PyTorch
PyTorch provides built-in utilities for computing the descriptive statistics of a dataset, among them the mean and the standard deviation. The mean, denoted by μ, is a measure of central tendency: it is the average of the values in the dataset. The standard deviation, denoted by σ, is a measure of dispersion that indicates how far the values spread around the mean. For N values x_1, ..., x_N, the formulas are as follows:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
Note that std() in PyTorch divides by N - 1 instead of N by default (Bessel's correction), so it returns the sample standard deviation.
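To make the definitions concrete, here is a minimal sketch in plain Python that computes the mean and both common variants of the standard deviation by hand. The sample values and variable names are made up for illustration; dividing by N - 1 is the variant that matches the default behaviour of std() in PyTorch.

```python
import math

# Hypothetical sample values, chosen only so the arithmetic is easy to check.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(values)

# Mean: the sum of the values divided by their count.
mean = sum(values) / n

# Sum of squared deviations from the mean, shared by both variants below.
squared_dev = sum((x - mean) ** 2 for x in values)

# Population standard deviation: divide by N.
population_std = math.sqrt(squared_dev / n)

# Sample standard deviation: divide by N - 1 (Bessel's correction).
sample_std = math.sqrt(squared_dev / (n - 1))

print(mean)            # 5.0
print(population_std)  # 2.0
print(sample_std)      # about 2.138
```

The two variants differ noticeably on small datasets and converge as N grows.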
PyTorch can be installed with pip, like any other Python library:
pip install torch
Or, if you want to install it in a conda environment, you can use the following command:
conda install pytorch cudatoolkit=10.2 -c pytorch
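After either command, a quick way to confirm the installation is to import the package and print its version. This is only a sanity check; the second line reports whether a CUDA-capable GPU is usable and will simply print False on a CPU-only setup.

```python
import torch

# Print the installed PyTorch version string.
print(torch.__version__)

# True only if this build has CUDA support and a usable GPU is visible.
print(torch.cuda.is_available())
```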
Mean and Standard Deviation of 1-D Tensor:
Before computing the mean and standard deviation, let's prepare our dataset by generating a random tensor.
import torch
data = torch.rand(10)
Now that we have the data, we can find the mean and standard deviation by calling the mean() and std() methods.
mean_tensor = data.mean()
std_tensor = data.std()
print(mean_tensor)
print(std_tensor)
The above method works, but the values are returned as 0-dimensional tensors. To extract the Python number held inside such a tensor, call its item() method. Indexing does not work here, because a 0-dimensional tensor has no elements to index.
mean = data.mean().item()
std = data.std().item()
print(mean)
print(std)
Output:
tensor(0.3901)
tensor(0.2846)
0.39005300402641296
0.2846093773841858
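The 1-D steps can be collected into one runnable sketch. The seed value is arbitrary and is only there so that repeated runs generate the same data; torch.std_mean is an alternative that returns both statistics from a single call, which may be convenient when both are needed.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so repeated runs give the same data
data = torch.rand(10)

mean_tensor = data.mean()
std_tensor = data.std()

# Both results are 0-dimensional tensors, so their shape is empty.
print(mean_tensor.shape, std_tensor.shape)

# item() converts a one-element tensor into a plain Python number.
mean_value = mean_tensor.item()
std_value = std_tensor.item()
print(type(mean_value), mean_value, std_value)

# torch.std_mean returns a (std, mean) tuple, in that order.
std_both, mean_both = torch.std_mean(data)
print(std_both, mean_both)
```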
Mean and Standard Deviation of 2-D Tensors:
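For 2-D tensors, mean() and std() work the same way as for 1-D tensors, except that we can also pass an axis parameter to find the mean and std of the rows or the columns. (PyTorch's own name for this parameter is dim; axis is accepted as an alias.) Let's start by getting our data.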
import torch
data = torch.rand(5, 3)
When called without arguments, the mean() and std() methods return the mean and standard deviation of the whole dataset. If we pass an axis parameter, we get the statistics of the columns or rows instead. For axis = 0, we get a tensor holding the mean or std of each column. For axis = 1, we get a tensor holding the mean or std of each row.
total_mean = data.mean()
total_std = data.std()
print(total_mean)
print(total_std)

# Mean and std of columns
mean_col_wise = data.mean(axis=0)
std_col_wise = data.std(axis=0)
print(mean_col_wise)
print(std_col_wise)

# Mean and std of rows
mean_row_wise = data.mean(axis=1)
std_row_wise = data.std(axis=1)
print(mean_row_wise)
print(std_row_wise)
Output:
tensor(0.6483)
tensor(0.2797)
tensor([0.6783, 0.5986, 0.6679])
tensor([0.2548, 0.2711, 0.3614])
tensor([0.5315, 0.7770, 0.7785, 0.3403, 0.8142])
tensor([0.3749, 0.2340, 0.1397, 0.2432, 0.1386])
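Because random data is hard to verify by eye, the following sketch repeats the row and column reductions on a small fixed tensor whose statistics can be checked by hand. The values are illustrative, and the code uses dim, the parameter name from the PyTorch documentation, in place of axis. The keepdim=True call is an optional extra that keeps the reduced dimension with size 1.

```python
import torch

# Small hand-checkable tensor: 2 rows, 3 columns (illustrative values).
data = torch.tensor([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

# dim=0 reduces over the rows, leaving one value per column.
col_mean = data.mean(dim=0)  # values: 2.5, 3.5, 4.5
col_std = data.std(dim=0)    # each about 2.1213 (N - 1 in the denominator)

# dim=1 reduces over the columns, leaving one value per row.
row_mean = data.mean(dim=1)  # values: 2.0, 5.0
row_std = data.std(dim=1)    # values: 1.0, 1.0

# keepdim=True keeps the reduced dimension, giving shape (1, 3) here.
col_mean_2d = data.mean(dim=0, keepdim=True)

print(col_mean, col_std)
print(row_mean, row_std)
print(col_mean.shape, row_mean.shape, col_mean_2d.shape)
```

A result of shape (1, 3) broadcasts directly against the original (2, 3) tensor, which is handy for operations such as subtracting each column's mean.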