ML | Handle Missing Data with Simple Imputer
SimpleImputer is a scikit-learn class for handling missing data in a dataset used for predictive modeling. It replaces missing values (NaN by default) with a specified placeholder, such as a column statistic or a constant.
It is used by creating a SimpleImputer() object, whose constructor takes the following arguments:
missing_values : The placeholder that marks a value as missing and therefore to be imputed. The default is np.nan.
strategy : The method used to compute the replacement value. It can take the values 'mean' (default), 'median', 'most_frequent' and 'constant'.
fill_value : The constant that replaces the missing values when strategy is 'constant'.
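As a rough illustration of how these arguments interact, the sketch below applies the non-default strategies to a small made-up array. The array `X`, the sentinel value `-1`, and all variable names are assumptions chosen for this example only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one missing entry in each column.
X = np.array([[1.0, 2.0],
              [1.0, np.nan],
              [4.0, 6.0],
              [10.0, 6.0],
              [np.nan, 12.0]])

# 'median': fill with the median of the observed values in each column.
X_median = SimpleImputer(strategy="median").fit_transform(X)

# 'most_frequent': fill with the most common observed value in each column.
X_mode = SimpleImputer(strategy="most_frequent").fit_transform(X)

# 'constant': fill with fill_value, ignoring the observed data.
X_const = SimpleImputer(strategy="constant", fill_value=0).fit_transform(X)

# missing_values can name a sentinel other than NaN (here -1, an assumption).
X_sentinel = np.array([[1.0, -1.0],
                       [3.0, 4.0]])
X_sentinel_filled = SimpleImputer(missing_values=-1,
                                  strategy="mean").fit_transform(X_sentinel)

print("Median:\n", X_median)
print("Most frequent:\n", X_mode)
print("Constant:\n", X_const)
print("Sentinel -1 replaced by column mean:\n", X_sentinel_filled)
```

In this toy data the first column has observed values 1, 1, 4 and 10, so the three strategies fill its missing entry with 2.5, 1 and 0 respectively, which makes the difference between them easy to see.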
Code: Python code illustrating the use of SimpleImputer class.
Python3
import numpy as np
from sklearn.impute import SimpleImputer

# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

data = [[12, np.nan, 34],
        [10, 32, np.nan],
        [np.nan, 11, 20]]

print("Original Data : \n", data)

# Learn the column means, then fill in the missing entries
imputer = imputer.fit(data)
data = imputer.transform(data)

print("Imputed Data : \n", data)
Output
Original Data :
 [[12, nan, 34], [10, 32, nan], [nan, 11, 20]]
Imputed Data :
 [[12.  21.5 34. ]
 [10.  32.  27. ]
 [11.  11.  20. ]]
Remember: The mean, median or most frequent value is computed separately for each column (feature) of the matrix, using only the non-missing entries of that column.
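One way to check the column-wise behaviour is to inspect the fitted imputer's `statistics_` attribute, which holds one fill value per column. The sketch below reuses the data from the example above and compares the result with NumPy's `np.nanmean` along `axis=0`. The names `column_means`, `expected` and `new_rows` are chosen here for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[12, np.nan, 34],
                 [10, 32, np.nan],
                 [np.nan, 11, 20]])

imputer = SimpleImputer(strategy="mean").fit(data)

# One learned fill value per column.
column_means = imputer.statistics_

# The same column-wise means computed directly, ignoring NaN entries.
expected = np.nanmean(data, axis=0)

# Rows transformed later are filled with the statistics learned during fit.
new_rows = np.array([[np.nan, np.nan, np.nan]])
filled = imputer.transform(new_rows)

print("Learned column means:", column_means)
print("np.nanmean along axis 0:", expected)
print("Filled new row:", filled)
```

The learned values are 11, 21.5 and 27, the same numbers that fill the gaps in the imputed data above.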
Last Updated : 28 Sep, 2021