ML | Handle Missing Data with Simple Imputer

  • Difficulty Level : Easy
  • Last Updated : 28 Sep, 2021

SimpleImputer is a scikit-learn class for handling missing data in a dataset before it is used to train a predictive model. It replaces missing values (NaN by default) with a value derived from a chosen strategy.
An imputer is created by calling the SimpleImputer() constructor, whose main arguments are:

missing_values : The placeholder that marks a value as missing and therefore to be imputed. The default is np.nan.
strategy : How the replacement value is computed. It can be 'mean' (default), 'median', 'most_frequent' or 'constant'. The 'mean' and 'median' strategies work only on numeric data.
fill_value : The value that replaces missing entries when strategy is 'constant'. If it is left unset, 0 is used for numeric data and the string "missing_value" for string or object data.
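The strategies listed above can be compared side by side. The following is a minimal sketch, assuming a small made-up array X (not part of the original example) with one missing entry per column; the variable names are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: each column has exactly one missing entry.
X = np.array([[1.0, 2.0],
              [1.0, np.nan],
              [4.0, 2.0],
              [10.0, 8.0],
              [np.nan, 12.0]])

results = {}

# Statistics-based strategies: the fill value is learned per column.
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    results[strategy] = imputer.fit_transform(X)

# Constant strategy: every missing entry gets the same fill_value.
constant_imputer = SimpleImputer(strategy="constant", fill_value=-1.0)
results["constant"] = constant_imputer.fit_transform(X)

for name, filled in results.items():
    # Show only the two entries that were originally missing.
    print(name, filled[4, 0], filled[1, 1])
```

With this data the observed values are 1, 1, 4, 10 in the first column and 2, 2, 8, 12 in the second, so the mean, median and most frequent value differ in each column, which makes the effect of each strategy easy to see.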

Code: Python code illustrating the use of SimpleImputer class.


import numpy as np
# Importing the SimpleImputer class
from sklearn.impute import SimpleImputer

# Imputer object that replaces np.nan with the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

data = [[12, np.nan, 34],
        [10, 32, np.nan],
        [np.nan, 11, 20]]
print("Original Data : \n", data)

# Fitting the imputer: learns the mean of each column
imputer = imputer.fit(data)

# Imputing the data: fills each NaN with its column mean
data = imputer.transform(data)
print("Imputed Data : \n", data)


Original Data : 

[[12, nan, 34]
[10, 32, nan]
[nan, 11, 20]]

Imputed Data : 

[[12, 21.5, 34]
[10, 32, 27]
[11, 11, 20]]

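Remember: The mean or median is computed column by column, so each missing value is replaced using only the observed values in its own column. For example, the missing entry in the first column becomes (12 + 10) / 2 = 11.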
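The column-wise behaviour can be checked directly. The sketch below reuses the data from the example and compares the values learned by the imputer (exposed through its statistics_ attribute) with NumPy's NaN-aware column means; the variable name column_means is just for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[12, np.nan, 34],
                 [10, 32, np.nan],
                 [np.nan, 11, 20]])

# Fit a mean imputer; the learned per-column fill values
# are stored in the statistics_ attribute.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean").fit(data)

# Mean of each column (axis=0), ignoring NaN entries.
# The per-column means are 11, 21.5 and 27.
column_means = np.nanmean(data, axis=0)

print("Column means      :", column_means)
print("Imputer statistics:", imputer.statistics_)
```

Both lines print the same three numbers, one per column, which confirms that the statistic is taken down each column rather than across each row.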
