NumPy | Replace NaN values with average of columns

Last Updated : 09 Feb, 2024

Data visualization is one of the most important steps in machine learning and data analytics.

Before data can be visualized, it has to be cleaned and arranged. Data sets sometimes contain NaN (not a number) values, which are unusable for data visualization and also turn statistics such as the mean into NaN.

To solve this problem, one possible method is to replace each NaN value with the average of the remaining values in its column.
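To see why NaN values need special handling, here is a minimal sketch. The small array `data` is an illustrative assumption and is not one of the article's examples. An ordinary column mean is spoiled by a single NaN, while np.nanmean ignores it.

```python
import numpy as np

# Illustrative data: the second column contains one NaN
data = np.array([[1.0, np.nan],
                 [3.0, 4.0]])

# np.mean propagates NaN, so the second column's mean is nan
plain_mean = np.mean(data, axis=0)

# np.nanmean skips NaN entries and averages only the valid values
nan_aware_mean = np.nanmean(data, axis=0)

print("plain mean    ", plain_mean)
print("nan-aware mean", nan_aware_mean)
```

The NaN-aware mean is the value that the methods in this article write back into the empty cells.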

Given below are a few methods to solve this problem.

Using np.nanmean and np.take
Using np.ma and np.where
Using a naive loop and zip
Using list comprehension and built-in functions
Using zip()+lambda()

Let us understand them better with Python program examples:

Using np.nanmean and np.take

NumPy has no colmean() function. Instead, we use the nanmean() function of the NumPy library with axis = 0 to find the mean of each column while ignoring NaN values. We then locate the NaN values with where() and use the take() function to pick the matching column mean (average) for each of them.

Example:

Python3

# Python code to demonstrate 
# to replace nan values 
# with an average of columns 
  
import numpy as np 
  
# Initialising numpy array 
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],  
                      [2.6, 3.3, np.nan, 5.5], 
                      [2.1, 3.2, 5.4, 6.5]]) 
  
# printing initial array 
print ("initial array", ini_array) 
  
# column mean 
col_mean = np.nanmean(ini_array, axis = 0) 
  
# printing column mean 
print ("columns mean", str(col_mean)) 
  
# find indices where nan value is present 
inds = np.where(np.isnan(ini_array)) 
  
# replace inds with avg of column 
ini_array[inds] = np.take(col_mean, inds[1]) 
  
# printing final array 
print ("final array", ini_array) 

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
columns mean [ 2.   3.   4.5  6. ]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
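The example above modifies ini_array in place. If the original data should stay untouched, the same steps can be wrapped in a small helper that works on a copy. This is only a sketch: the function name fill_nan_with_col_mean is hypothetical, and it assumes a 2-D array in which every column has at least one non-NaN value.

```python
import numpy as np

def fill_nan_with_col_mean(arr):
    """Return a copy of a 2-D array with NaNs replaced by column means.

    Hypothetical helper; assumes no column consists only of NaN.
    """
    arr = np.asarray(arr, dtype=float)
    col_mean = np.nanmean(arr, axis=0)

    # work on a copy so the caller's array is left unchanged
    filled = arr.copy()
    rows, cols = np.where(np.isnan(filled))

    # col_mean[cols] selects the same values as np.take(col_mean, cols)
    filled[rows, cols] = col_mean[cols]
    return filled

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

final_array = fill_nan_with_col_mean(ini_array)
print("final array", final_array)
```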

Using np.ma and np.where

We use the np.ma (masked array) module to create a masked array in which the NaN values are masked out, so that mean(axis = 0) averages only the valid entries of each column. We then use the where() function to replace the NaN values with these column averages.

Example:

Python3

# Python code to demonstrate 
# to replace nan values 
# with average of columns 
  
import numpy as np 
  
# Initialising numpy array 
ini_array = np.array([[1.3, 2.5, 3.6, np.nan], 
                      [2.6, 3.3, np.nan, 5.5], 
                      [2.1, 3.2, 5.4, 6.5]]) 
  
# printing initial array 
print ("initial array", ini_array) 
  
# mask the nan entries and take the column means of the rest 
nan_mask = np.isnan(ini_array) 
col_mean = np.ma.array(ini_array, mask = nan_mask).mean(axis = 0) 
  
# replace nan with col means 
res = np.where(nan_mask, col_mean, ini_array) 
  
# printing final array 
print ("final array", res) 

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
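As a variation on the masked-array approach, np.ma.masked_invalid() builds the mask without a separate np.isnan() call. The sketch below is one possible way to write it, not the article's original method. Note that masked_invalid() masks infinite values as well as NaN, so it assumes that treating inf as missing is acceptable.

```python
import numpy as np

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# mask every NaN (and inf) entry
masked = np.ma.masked_invalid(ini_array)

# column means over the unmasked entries, converted to a plain ndarray
col_mean = masked.mean(axis=0).filled(np.nan)

# getmaskarray always returns a full boolean array, even if nothing is masked
res = np.where(np.ma.getmaskarray(masked), col_mean, ini_array)

print("final array", res)
```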

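Using a naive loop and zip

We use np.where() to get the row indices and column indices of the NaN values as two separate arrays. zip() pairs up the elements of these unpacked arrays, giving us a (row, column) pair for each NaN value in the array. We then loop over these pairs and replace each NaN value with the mean of the non-NaN entries in its column.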

Example:

Python3

# Python code to demonstrate 
# to replace nan values 
# with average of columns 
  
import numpy as np 
  
# Initialising numpy array 
ini_array = np.array([[1.3, 2.5, 3.6, np.nan], 
                      [2.6, 3.3, np.nan, 5.5], 
                      [2.1, 3.2, 5.4, 6.5]]) 
  
# printing initial array 
print ("initial array", ini_array) 
  
# indices where the value is nan in the array 
indices = np.where(np.isnan(ini_array)) 
  
# Iterating over numpy array to replace nan with values 
for row, col in zip(*indices): 
    ini_array[row, col] = np.mean(ini_array[ 
           ~np.isnan(ini_array[:, col]), col]) 
  
# printing final array 
print ("final array", ini_array) 

Output:

initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
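The loop above recomputes the column mean once per NaN value. A possible variant, sketched here as an alternative rather than as the article's method, loops over the columns instead, so each mean is computed only once. It skips columns that contain no NaN or nothing but NaN.

```python
import numpy as np

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

for col in range(ini_array.shape[1]):
    # basic slicing returns a view, so writing to it updates ini_array
    column = ini_array[:, col]
    missing = np.isnan(column)

    # only fill columns that have both missing and valid entries
    if missing.any() and not missing.all():
        column[missing] = column[~missing].mean()

print("final array", ini_array)
```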

Using list comprehension and built-in functions

This approach works on a plain Python list of lists and uses None, rather than NaN, to mark the missing values. It first computes the column means using a list comprehension with the help of the filter and zip functions. Then, it replaces the None values in each row with the corresponding column means using another list comprehension with the help of the enumerate function. Finally, it returns the list, which has been modified in place.

Algorithm:

1. Compute the column means, ignoring None values.
2. Replace the None values in the list with the corresponding column means using list comprehension and built-in functions.
3. Return the modified list.

Python3

def replace_nan_with_mean(arr): 
    # mean of each column, ignoring None 
    # (raises ZeroDivisionError if a column contains only None) 
    col_means = [sum(filter(lambda x: x is not None, col))/len(list(filter(lambda x: x is not None, col))) for col in zip(*arr)] 
    # replace None in every row with the mean of its column 
    for i in range(len(arr)): 
        arr[i] = [col_means[j] if x is None else x for j, x in enumerate(arr[i])] 
    return arr 
  
# None marks the missing values 
arr=[[1.3, 2.5, 3.6, None], 
     [2.6, 3.3, None, 5.5], 
     [2.1, 3.2, 5.4, 6.5]] 
print(replace_nan_with_mean(arr)) 

Output

[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]
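The function above only recognises None. If the list may also hold real NaN floats, a check with math.isnan() is needed, because NaN is a float and `x is None` does not catch it. The following is a minimal sketch under that assumption. The names is_missing and replace_missing_with_mean are hypothetical, and the function returns a new list instead of modifying its input.

```python
import math

def is_missing(x):
    # treat both None and float NaN as missing
    return x is None or (isinstance(x, float) and math.isnan(x))

def replace_missing_with_mean(rows):
    # mean of the present values in each column (None if a column is empty)
    col_means = []
    for col in zip(*rows):
        present = [x for x in col if not is_missing(x)]
        col_means.append(sum(present) / len(present) if present else None)

    # build new rows, substituting the column mean for missing entries
    return [[col_means[j] if is_missing(x) else x for j, x in enumerate(row)]
            for row in rows]

data = [[1.3, 2.5, 3.6, float("nan")],
        [2.6, 3.3, None, 5.5],
        [2.1, 3.2, 5.4, 6.5]]

result = replace_missing_with_mean(data)
print(result)
```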

Using zip()+lambda()

This approach also uses None to mark missing values in a plain Python list of lists. Compute the column means, excluding None values, using a loop over the transposed array zip(*arr). Replace None values with column means using map() and lambda functions.

Algorithm

1. Initialize an empty list means to store the column means.
2. Loop over the transposed array zip(*arr) to iterate over columns.
3. For each column, filter out None values and compute the mean of the remaining values. If there are no remaining values, set the mean to 0.
4. Append the mean to the means list.
5. Use map() and lambda functions to replace None values with the corresponding column mean in each row of the array arr.
6. Return the modified array arr.

Python3

# initial array 
arr = [[1.3, 2.5, 3.6, None], 
       [2.6, 3.3, None, 5.5], 
       [2.1, 3.2, 5.4, 6.5]] 
  
# compute column means 
means = [] 
for col in zip(*arr): 
    values = [x for x in col if x is not None] 
    means.append(sum(values)/len(values) if values else 0) 
  
# replace None values with column means 
arr = list(map(lambda row: [means[j] if x is None else x for j,x in enumerate(row)], arr)) 
  
# print final array 
print(arr) 

Output

[[1.3, 2.5, 3.6, 6.0], [2.6, 3.3, 4.5, 5.5], [2.1, 3.2, 5.4, 6.5]]
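The two list-based methods and the NumPy methods can be connected: when a list that uses None is converted to a NumPy array with dtype=float, the None entries become NaN, after which np.nanmean() and np.where() apply. The sketch below illustrates this bridge. It is an additional illustration, not one of the article's five methods.

```python
import numpy as np

arr = [[1.3, 2.5, 3.6, None],
       [2.6, 3.3, None, 5.5],
       [2.1, 3.2, 5.4, 6.5]]

# with dtype=float, None entries are converted to nan
as_array = np.array(arr, dtype=float)

# column means that ignore nan, broadcast across the rows by np.where
col_mean = np.nanmean(as_array, axis=0)
filled = np.where(np.isnan(as_array), col_mean, as_array)

# convert back to a nested Python list if needed
result = filled.tolist()
print(result)
```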
