Python | Replace NaN values with average of columns

In machine learning and data analytics, data visualization is one of the most important steps, and the data usually has to be cleaned and arranged first. Data sets sometimes contain NaN (Not a Number) values. These mark missing entries and cannot be used directly for visualization. They also propagate through calculations: for example, np.mean of an array that contains a NaN is itself NaN.

One way to solve this problem is to replace each NaN with the mean (average) of its column. A few methods for doing this with NumPy are given below.
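Before looking at the NumPy versions, the idea itself can be sketched in plain Python. This is a minimal illustration that assumes the data is a rectangular list of lists of floats, with float("nan") marking missing entries. The helper name fill_nan_with_col_mean is hypothetical, and leaving a column that has no valid values as NaN is an assumption of this sketch.

```python
import math

# Toy data as nested lists; float("nan") marks a missing value
rows = [[1.3, 2.5, 3.6, float("nan")],
        [2.6, 3.3, float("nan"), 5.5],
        [2.1, 3.2, 5.4, 6.5]]


def fill_nan_with_col_mean(rows):
    """Return a new list of rows where every NaN is replaced by its column's mean.

    Each column mean is taken over the non-NaN entries of that column only.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        present = [r[c] for r in rows if not math.isnan(r[c])]
        # Assumption: a column with no valid values keeps NaN as its "mean"
        means.append(sum(present) / len(present) if present else float("nan"))
    # Build new rows so the input lists are left unchanged
    return [[means[c] if math.isnan(v) else v for c, v in enumerate(r)]
            for r in rows]


filled = fill_nan_with_col_mean(rows)
print(filled)
```

The NaN in the last column becomes the mean of 5.5 and 6.5, and the NaN in the third column becomes the mean of 3.6 and 5.4.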

 
Method #1: Using np.nanmean and np.take

# Python code to demonstrate replacing NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# column means, ignoring NaN entries
col_mean = np.nanmean(ini_array, axis=0)

# printing column mean
print("columns mean", col_mean)

# find the (row, column) indices where a NaN value is present
inds = np.where(np.isnan(ini_array))

# replace each NaN with the mean of its column (inds[1] holds the column indices)
ini_array[inds] = np.take(col_mean, inds[1])

# printing final array
print("final array", ini_array)

Output:

initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
columns mean [2.  3.  4.5 6. ]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
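One case Method #1 does not cover is a column in which every value is NaN. For such a column, np.nanmean returns nan (and emits a RuntimeWarning), so the NaNs stay in place. The following is a minimal sketch of one way to handle this, assuming that a fallback of 0.0 is acceptable for such columns. The fallback value and the helper name fill_with_col_mean are illustrative choices and are not part of the method above.

```python
import warnings

import numpy as np

# Column 0 has one NaN; column 1 is entirely NaN
arr = np.array([[1.0, np.nan],
                [np.nan, np.nan],
                [3.0, np.nan]])


def fill_with_col_mean(arr, fallback=0.0):
    """Replace NaNs with column means; all-NaN columns get `fallback` (an assumption)."""
    with warnings.catch_warnings():
        # np.nanmean warns "Mean of empty slice" for an all-NaN column
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_mean = np.nanmean(arr, axis=0)
    # Columns whose mean is still NaN had no valid values at all
    col_mean = np.where(np.isnan(col_mean), fallback, col_mean)
    out = arr.copy()  # leave the input array unchanged
    inds = np.where(np.isnan(out))
    out[inds] = np.take(col_mean, inds[1])
    return out


result = fill_with_col_mean(arr)
print(result)
```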

Method #2: Using np.ma and np.where

# Python code to demonstrate replacing NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# boolean mask marking the NaN positions
mask = np.isnan(ini_array)

# the masked array ignores masked (NaN) entries, so .mean(axis=0) gives the column
# means; np.where takes the column mean where mask is True and the original elsewhere
res = np.where(mask, np.ma.array(ini_array, mask=mask).mean(axis=0), ini_array)

# printing final array
print("final array", res)

Output:

initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
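A related variant that is not one of the methods above fills the array in place with np.copyto. Its where= argument restricts the copy to the NaN positions, and the 1-D array of column means is broadcast across the rows. The following is a short sketch on the same data:

```python
import numpy as np

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# column means over the non-NaN entries, shape (4,)
col_mean = np.nanmean(ini_array, axis=0)

# col_mean is broadcast to shape (3, 4); values are copied only where the mask is True
np.copyto(ini_array, col_mean, where=np.isnan(ini_array))

print(ini_array)
```

Because np.copyto writes into ini_array directly, no index arrays are needed. Call .copy() first if the original array must be kept.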

Method #3: Using a naive loop and zip

# Python code to demonstrate replacing NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# (row, column) indices where the value is NaN
indices = np.where(np.isnan(ini_array))

# iterate over the NaN positions and replace each one with the
# mean of the non-NaN values in its column
for row, col in zip(*indices):
    col_values = ini_array[:, col]
    ini_array[row, col] = np.mean(col_values[~np.isnan(col_values)])

# printing final array
print("final array", ini_array)

Output:

initial array [[1.3 2.5 3.6 nan]
 [2.6 3.3 nan 5.5]
 [2.1 3.2 5.4 6.5]]
final array [[1.3 2.5 3.6 6. ]
 [2.6 3.3 4.5 5.5]
 [2.1 3.2 5.4 6.5]]
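All three methods should produce the same array. The sketch below wraps each method in a function (the names method1, method2, and method3 are hypothetical) and compares the results with np.allclose, because the masked-array mean may differ from np.nanmean in the last bits. Method #3 recomputes a column mean for every NaN, so it does more work than the vectorised versions when a column contains many NaNs. However, replacing a NaN with its column's current mean does not change that mean, so the in-place updates do not affect later replacements in the same column.

```python
import numpy as np


def make_array():
    # fresh copy of the example data for each method
    return np.array([[1.3, 2.5, 3.6, np.nan],
                     [2.6, 3.3, np.nan, 5.5],
                     [2.1, 3.2, 5.4, 6.5]])


def method1(a):
    # np.nanmean + np.take, on a copy
    a = a.copy()
    col_mean = np.nanmean(a, axis=0)
    inds = np.where(np.isnan(a))
    a[inds] = np.take(col_mean, inds[1])
    return a


def method2(a):
    # masked-array column means selected with np.where
    mask = np.isnan(a)
    return np.where(mask, np.ma.array(a, mask=mask).mean(axis=0), a)


def method3(a):
    # naive loop over NaN positions, on a copy
    a = a.copy()
    for row, col in zip(*np.where(np.isnan(a))):
        col_values = a[:, col]
        a[row, col] = np.mean(col_values[~np.isnan(col_values)])
    return a


r1, r2, r3 = (m(make_array()) for m in (method1, method2, method3))
print(np.allclose(r1, r2) and np.allclose(r1, r3))
```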

