ML | Handle Missing Data with Simple Imputer
SimpleImputer is a scikit-learn class (in the sklearn.impute module) that helps handle missing data in a dataset used for predictive modelling. It replaces missing values, NaN by default, with a value chosen according to a specified strategy.
It is used by creating a SimpleImputer() object, whose constructor takes, among others, the following arguments:
missing_values : The placeholder that marks a value as missing and therefore to be imputed. The default is NaN (np.nan).
strategy : The rule used to compute the replacement for the missing values. The strategy argument can take the values 'mean' (default), 'median', 'most_frequent' and 'constant'.
fill_value : The constant that replaces the missing values when strategy is 'constant'. If it is left as None, 0 is used for numeric data and the string "missing_value" for string or object data.
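As a minimal sketch of how missing_values and fill_value work together, the snippet below uses the 'constant' strategy on a small made-up array. The array readings and the choice of -1 as the missing-value marker are assumptions for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data in which -1 (rather than NaN) marks a missing entry.
readings = np.array([[7.0, -1.0],
                     [-1.0, 4.0]])

# Treat every -1 as missing and replace it with the constant 0.
constant_imputer = SimpleImputer(missing_values=-1,
                                 strategy="constant",
                                 fill_value=0)
filled = constant_imputer.fit_transform(readings)
print(filled)
```

With the 'constant' strategy no statistic is computed from the data; every entry equal to missing_values is simply overwritten with fill_value.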
Code: Python code illustrating the use of the SimpleImputer class.
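The listing itself is not shown here, so the following is a sketch of what such code could look like, assuming the default 'mean' strategy and the 3x3 array from the output shown in this article. The variable names data, imputer and imputed are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# 3x3 matrix with one missing value (NaN) in each column.
data = np.array([[12, np.nan, 34],
                 [10, 32, np.nan],
                 [np.nan, 11, 20]])
print("Original Data :\n", data)

# Replace each NaN with the mean of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform() learns the column means and fills the gaps in one step.
imputed = imputer.fit_transform(data)
print("Imputed Data :\n", imputed)
```

Here fit_transform() is shorthand for calling fit() and then transform() on the same array. When the imputer is trained on one dataset and applied to another, the two calls are made separately.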
Original Data :
[[12, nan, 34]
 [10, 32, nan]
 [nan, 11, 20]]
Imputed Data :
[[12, 21.5, 34]
 [10, 32, 27]
 [11, 11, 20]]
Remember: The mean, median or most frequent value is computed along each column of the matrix, that is, separately for every feature.
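One way to check the column-wise behaviour is to compare the values the imputer learned with NumPy's NaN-aware mean taken down the columns. The sketch below reuses the 3x3 array from the output above; the names column_means and row_means are illustrative.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[12, np.nan, 34],
                 [10, 32, np.nan],
                 [np.nan, 11, 20]])

imputer = SimpleImputer(strategy="mean").fit(data)

# statistics_ holds the fill value learned for each column.
print("Learned fill values :", imputer.statistics_)

# NaN-aware means down the columns (axis=0) and across the rows (axis=1).
column_means = np.nanmean(data, axis=0)
row_means = np.nanmean(data, axis=1)
print("Column means :", column_means)
print("Row means :", row_means)
```

The learned fill values agree with the column means (11, 21.5 and 27) and not with the row means (23, 21 and 15.5), which is why the missing entry in the first row is filled with 21.5.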