
# Python | Imputation using the KNNImputer()

Last Updated: 05 Sep, 2020

KNNImputer is a scikit-learn class used to fill in (impute) missing values in a dataset. It is a more useful method than the naive approach of filling every missing value with the column mean or median, because it builds on the basic idea of the k-nearest neighbors (KNN) algorithm. For each sample with a missing feature, it finds the K nearest samples that do have that feature, where K is set by the `n_neighbors` parameter, and fills the missing value with the mean of those neighbors' values. Distances between samples are computed only over the features that both samples have, so rows that contain missing values can still be compared.
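To make the procedure concrete, here is a minimal from-scratch sketch in NumPy. The helper names `nan_euclidean` and `knn_impute` are hypothetical and this is not scikit-learn's implementation: it assumes uniform weights, imputes only from the original (unfilled) data, and does not handle edge cases such as a column with no observed values the way KNNImputer does. The distance follows the "NaN-Euclidean" idea: square differences over the coordinates present in both rows, then scale up by (total coordinates / present coordinates) before taking the square root.

```python
import numpy as np


def nan_euclidean(a, b):
    """Hypothetical helper: Euclidean distance over the coordinates present
    in both rows, rescaled by (total coordinates / present coordinates)."""
    present = ~np.isnan(a) & ~np.isnan(b)
    if not present.any():
        return np.nan  # the two rows share no observed coordinate
    sq = np.sum((a[present] - b[present]) ** 2)
    return np.sqrt(sq * a.size / present.sum())


def knn_impute(X, k=2):
    """Hypothetical helper: replace each NaN with the unweighted mean of that
    column over the k nearest rows where the column is observed."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        # Candidate donors: other rows that have a value in column j
        donors = [r for r in range(len(X)) if r != i and not np.isnan(X[r, j])]
        dists = np.array([nan_euclidean(X[i], X[r]) for r in donors])
        nearest = np.argsort(dists)[:k]  # indices into `donors`
        out[i, j] = X[[donors[n] for n in nearest], j].mean()
    return out


# Same marks as the example below (columns: Maths, Chemistry, Physics, Biology)
marks = np.array([[80, 60, np.nan, 78],
                  [90, 65, 57, 83],
                  [np.nan, 56, 80, 67],
                  [95, np.nan, 78, np.nan]])
# Expected imputed values: 68.5 (Physics, row 0), 87.5 (Maths, row 2),
# 58.0 and 72.5 (Chemistry and Biology, row 3)
print(knn_impute(marks, k=2))
```

On this small dataset the sketch picks the same neighbors as `KNNImputer(n_neighbors=2)`, so it reproduces the same filled-in values.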

It is implemented by the `KNNImputer` class in `sklearn.impute`, whose constructor takes the following arguments (among others):

- n_neighbors: the number of neighboring samples used to impute each missing value. Default: 5.
- metric: the distance metric used to search for neighbors. Values: {'nan_euclidean', callable}. Default: 'nan_euclidean'.
- weights: how the neighbors' values are weighted when they are averaged. Values: {'uniform', 'distance', callable}. Default: 'uniform' (all neighbors count equally); 'distance' weights each neighbor by the inverse of its distance.
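As a small sketch of how `weights` changes the result, the snippet below imputes a toy array (an illustrative assumption, not data from this article) once with uniform and once with distance weights. With `n_neighbors=2`, row 0's two nearest donors for the last column hold the values 3 and 5; uniform weighting averages them to 4, while distance weighting favors the closer donor.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data: row 0 is missing its last value, row 2 its first value
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# Uniform weights (the default): every neighbor counts equally
uniform = KNNImputer(n_neighbors=2, weights="uniform").fit_transform(X)

# Distance weights: each neighbor is weighted by 1 / distance
weighted = KNNImputer(n_neighbors=2, weights="distance").fit_transform(X)

print(uniform[0, 2])   # plain mean of the two donors' values (3 and 5)
print(weighted[0, 2])  # pulled toward 3, the value of the closer donor
```

Row 0's donor with value 3 is half as far away as the donor with value 5, so distance weighting gives (2 * 3 + 1 * 5) / 3, roughly 3.67, instead of 4.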

Code: Python code to illustrate the KNNImputer class

```python
# import necessary libraries
import numpy as np
import pandas as pd

# import the KNNImputer class
from sklearn.impute import KNNImputer

# create dataset for marks of a student
# (named `data` to avoid shadowing the built-in `dict`)
data = {'Maths': [80, 90, np.nan, 95],
        'Chemistry': [60, 65, 56, np.nan],
        'Physics': [np.nan, 57, 80, 78],
        'Biology': [78, 83, 67, np.nan]}

# create a DataFrame from the dictionary
before_imputation = pd.DataFrame(data)
# print dataset before imputation
print("Data Before performing imputation\n", before_imputation)

# create a KNNImputer that averages the 2 nearest neighbors
imputer = KNNImputer(n_neighbors=2)
after_imputation = imputer.fit_transform(before_imputation)
# print dataset after performing the operation
print("\n\nAfter performing imputation\n", after_imputation)
```

Output:

```
Data Before performing imputation
    Maths  Chemistry  Physics  Biology
0   80.0       60.0      NaN     78.0
1   90.0       65.0     57.0     83.0
2    NaN       56.0     80.0     67.0
3   95.0        NaN     78.0      NaN

After performing imputation
 [[80.  60.  68.5 78. ]
 [90.  65.  57.  83. ]
 [87.5 56.  80.  67. ]
 [95.  58.  78.  72.5]]
```
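To check where the value 68.5 in the Physics column of row 0 comes from, the sketch below uses scikit-learn's `nan_euclidean_distances` (the function behind the default `'nan_euclidean'` metric) to rank the rows that have a Physics mark by their distance from row 0. The variable names are illustrative only.

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

# Same marks as above (columns: Maths, Chemistry, Physics, Biology)
marks = np.array([[80, 60, np.nan, 78],
                  [90, 65, 57, 83],
                  [np.nan, 56, 80, 67],
                  [95, np.nan, 78, np.nan]])

# Pairwise distances that skip missing coordinates and rescale the rest
dist = nan_euclidean_distances(marks)

# Row 0 is missing Physics (column 2); donors are rows that have it
donors = [r for r in range(len(marks)) if not np.isnan(marks[r, 2])]
nearest = sorted(donors, key=lambda r: dist[0, r])[:2]

print(nearest)                   # [1, 2]
print(marks[nearest, 2].mean())  # 68.5
```

Row 0 is about 14.1 away from row 1 and 16.6 away from row 2, but 30 away from row 3, so the two nearest donors contribute Physics marks of 57 and 80, whose mean is 68.5.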

Note: `fit_transform()` returns a NumPy array, so after the transformation the data is no longer a DataFrame and its column names and index are dropped.
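Because `fit_transform()` hands back a plain NumPy array, a common pattern is to wrap the result in a new DataFrame that reuses the original column names and index. A minimal sketch, assuming the same marks data as in the example above:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({'Maths': [80, 90, np.nan, 95],
                   'Chemistry': [60, 65, 56, np.nan],
                   'Physics': [np.nan, 57, 80, 78],
                   'Biology': [78, 83, 67, np.nan]})

imputer = KNNImputer(n_neighbors=2)

# Rebuild a DataFrame so the labels survive the transformation
after = pd.DataFrame(imputer.fit_transform(df),
                     columns=df.columns, index=df.index)
print(after)
```

In scikit-learn 1.2 and later, calling `imputer.set_output(transform="pandas")` before `fit_transform()` should return a DataFrame directly; on older versions, the manual wrapping above is the safe option.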
