Python | Replace NaN values with average of columns
In machine learning and data analytics, cleaning and arranging data is one of the most important steps before analysis or visualization. Data sets often contain NaN (not a number) values, which cannot be plotted and which turn ordinary aggregates such as the mean into NaN as well.
One possible way to handle this is to replace each NaN value with the average of its column, computed from the non-NaN entries. Given below are a few methods to do this with NumPy.
Method #1: Using np.nanmean and np.take
# Python code to demonstrate
# to replace nan values
# with an average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# column mean, ignoring nan values
col_mean = np.nanmean(ini_array, axis=0)

# printing column mean
print("columns mean", str(col_mean))

# find indices where nan value is present
inds = np.where(np.isnan(ini_array))

# replace inds with avg of column
ini_array[inds] = np.take(col_mean, inds[1])

# printing final array
print("final array", ini_array)
Output:
initial array [[ 1.3 2.5 3.6 nan]
 [ 2.6 3.3 nan 5.5]
 [ 2.1 3.2 5.4 6.5]]
columns mean [ 2. 3. 4.5 6. ]
final array [[ 1.3 2.5 3.6 6. ]
 [ 2.6 3.3 4.5 5.5]
 [ 2.1 3.2 5.4 6.5]]
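The same steps can be wrapped in a small helper that leaves the input array untouched. This is only a minimal sketch: the name fill_nan_with_col_mean is hypothetical (it is not a NumPy function), and it assumes a 2-D float array in which every column has at least one non-NaN value.

```python
import numpy as np


def fill_nan_with_col_mean(arr):
    """Return a copy of a 2-D float array with NaNs replaced by column means.

    Hypothetical helper for illustration; assumes no column is entirely NaN.
    """
    out = np.array(arr, dtype=float)           # copy, so the caller's array is not modified
    col_mean = np.nanmean(out, axis=0)         # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(out))       # row and column indices of the NaN entries
    out[rows, cols] = np.take(col_mean, cols)  # look up the mean of each NaN's column
    return out


data = np.array([[1.3, 2.5, 3.6, np.nan],
                 [2.6, 3.3, np.nan, 5.5],
                 [2.1, 3.2, 5.4, 6.5]])

filled = fill_nan_with_col_mean(data)
print(filled)
```

Unlike the code in Method #1, this version works on a copy, so data still contains its two NaN values afterwards.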
Method #2: Using np.ma and np.where
# Python code to demonstrate
# to replace nan values
# with average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# replace nan with col means
res = np.where(np.isnan(ini_array),
               np.ma.array(ini_array, mask=np.isnan(ini_array)).mean(axis=0),
               ini_array)

# printing final array
print("final array", res)
Output:
initial array [[ 1.3 2.5 3.6 nan]
 [ 2.6 3.3 nan 5.5]
 [ 2.1 3.2 5.4 6.5]]
final array [[ 1.3 2.5 3.6 6. ]
 [ 2.6 3.3 4.5 5.5]
 [ 2.1 3.2 5.4 6.5]]
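Method #2 relies on broadcasting: the row of column means has shape (4,) and np.where stretches it across the three rows of the (3, 4) array. The sketch below makes those shapes visible. As an assumption about what some readers may find simpler, it also uses np.nanmean in place of the masked array and checks that both give the same column means for this data.

```python
import numpy as np

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

mask = np.isnan(ini_array)

# Column means from a masked array, as in Method #2
masked_mean = np.ma.array(ini_array, mask=mask).mean(axis=0)

# Column means from np.nanmean, which skips NaN directly
nan_mean = np.nanmean(ini_array, axis=0)

print(mask.shape, nan_mean.shape)  # (3, 4) (4,)

# Both routes agree on this data
same_means = np.allclose(masked_mean.filled(np.nan), nan_mean)

# The (4,) row of means is broadcast against the (3, 4) mask and array
res = np.where(mask, nan_mean, ini_array)
print(res)
```

Because np.where builds a new array, ini_array itself is not modified here.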
Method #3: Using a naive loop and zip
# Python code to demonstrate
# to replace nan values
# with average of columns
import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# indices where values is nan in array
indices = np.where(np.isnan(ini_array))

# Iterating over numpy array to replace nan with values
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(ini_array[~np.isnan(ini_array[:, col]), col])

# printing final array
print("final array", ini_array)
Output:
initial array [[ 1.3 2.5 3.6 nan]
 [ 2.6 3.3 nan 5.5]
 [ 2.1 3.2 5.4 6.5]]
final array [[ 1.3 2.5 3.6 6. ]
 [ 2.6 3.3 4.5 5.5]
 [ 2.1 3.2 5.4 6.5]]
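The loop in Method #3 recomputes a column mean for every NaN it visits. A possible variation, sketched here under the assumption of a 2-D float array, iterates over columns instead so that each mean is computed once. It also skips columns that are entirely NaN, which is an added guard and not something the original code does.

```python
import numpy as np

ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

res = ini_array.copy()  # work on a copy; the original keeps its NaN values

for col in range(res.shape[1]):
    column = res[:, col]         # basic slice: a view, so writes go into res
    missing = np.isnan(column)   # True where this column holds NaN
    # Fill only when there is something to fill and something to average
    if missing.any() and not missing.all():
        column[missing] = column[~missing].mean()

print(res)
```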