Python | Replace NaN values with average of columns
Last Updated: 20 Mar, 2019

In machine learning and data analytics, data visualization is one of the most important steps, and the data usually has to be cleaned and arranged before it can be plotted. Data sets sometimes contain NaN (Not a Number) values. These cannot be plotted, and they propagate through most arithmetic: any sum or mean that includes a NaN is itself NaN.

One common way to solve this problem is to replace each NaN value with the average (mean) of its column, computed from the column's non-NaN entries. A few NumPy methods for doing this are given below.
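The important detail is that the column average has to be computed while ignoring the NaN entries. A minimal comparison on the same sample array used in the methods below: plain `np.mean` propagates NaN, whereas `np.nanmean` skips it.

```python
import numpy as np

# Same sample array as in the methods below
arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])

# np.mean propagates NaN: columns 2 and 3 each contain a NaN,
# so their means come out as NaN
plain_mean = np.mean(arr, axis=0)

# np.nanmean ignores NaN entries and averages only the real values
nan_aware_mean = np.nanmean(arr, axis=0)

print(plain_mean)
print(nan_aware_mean)
```

Only the NaN-aware means (2.0, 3.0, 4.5 and 6.0 for this array) are usable as replacement values.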

Method #1: Using `np.nanmean` and `np.take`

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# column mean, ignoring NaN values
col_mean = np.nanmean(ini_array, axis=0)

# printing column mean
print("columns mean", str(col_mean))

# find indices where NaN values are present
inds = np.where(np.isnan(ini_array))

# replace each NaN with the mean of its column
ini_array[inds] = np.take(col_mean, inds[1])

# printing final array
print("final array", ini_array)
```

Output:

```
initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
columns mean [ 2.   3.   4.5  6. ]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
```
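The two indexing steps of Method #1 can be unpacked as follows. This is a sketch on the same sample array; the names `arr` and `fill_values` are only for illustration. `np.where` on a boolean mask returns one index array per dimension, and `np.take` looks up the column mean for each NaN's column index.

```python
import numpy as np

arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])
col_mean = np.nanmean(arr, axis=0)

# inds[0] holds the row indices and inds[1] the column indices of the NaNs,
# in row-major order: rows [0, 1], columns [3, 2]
inds = np.where(np.isnan(arr))
print(inds)

# For every NaN, pick the mean of the column it lives in
# (mean of column 3, then mean of column 2)
fill_values = np.take(col_mean, inds[1])
print(fill_values)

# Fancy-indexed assignment writes each value into its (row, col) slot
arr[inds] = fill_values
```

`col_mean[inds[1]]` is an equivalent fancy-indexing spelling of the `np.take` call.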

Method #2: Using `np.ma` and `np.where`

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# replace NaN with column means; the masked array
# excludes the NaN entries from the mean
res = np.where(np.isnan(ini_array),
               np.ma.array(ini_array, mask=np.isnan(ini_array)).mean(axis=0),
               ini_array)

# printing final array
print("final array", res)
```

Output:

```
initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
```
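Method #2 works because the masked mean has shape `(4,)`, which broadcasts against the `(3, 4)` NaN mask inside `np.where`, so every NaN picks up the mean of its own column. The sketch below (variable names are illustrative) converts the masked result to a plain array with `.filled(np.nan)` and checks that `np.nanmean` gives the same result without masked arrays. Unlike Method #1, `np.where` returns a new array and leaves the input untouched.

```python
import numpy as np

arr = np.array([[1.3, 2.5, 3.6, np.nan],
                [2.6, 3.3, np.nan, 5.5],
                [2.1, 3.2, 5.4, 6.5]])
nan_mask = np.isnan(arr)

# Masked entries are skipped by reductions, giving the NaN-free column means.
# .filled(np.nan) returns a plain ndarray; only a column that is entirely
# NaN would be masked in the result and become NaN here.
masked_mean = np.ma.array(arr, mask=nan_mask).mean(axis=0).filled(np.nan)

# (4,) means broadcast against the (3, 4) mask
res_ma = np.where(nan_mask, masked_mean, arr)

# Equivalent result using np.nanmean instead of a masked array
res_nan = np.where(nan_mask, np.nanmean(arr, axis=0), arr)

print(np.allclose(res_ma, res_nan))  # prints True
```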

Method #3: Using a naive loop and `zip`

```python
# Python code to demonstrate
# how to replace NaN values
# with the average of their columns

import numpy as np

# Initialising numpy array
ini_array = np.array([[1.3, 2.5, 3.6, np.nan],
                      [2.6, 3.3, np.nan, 5.5],
                      [2.1, 3.2, 5.4, 6.5]])

# printing initial array
print("initial array", ini_array)

# indices where values are NaN in the array
indices = np.where(np.isnan(ini_array))

# Iterating over the NaN positions, replacing each with
# the mean of the non-NaN values in its column
for row, col in zip(*indices):
    ini_array[row, col] = np.mean(ini_array[
        ~np.isnan(ini_array[:, col]), col])

# printing final array
print("final array", ini_array)
```

Output:

```
initial array [[ 1.3  2.5  3.6  nan]
 [ 2.6  3.3  nan  5.5]
 [ 2.1  3.2  5.4  6.5]]
final array [[ 1.3  2.5  3.6  6. ]
 [ 2.6  3.3  4.5  5.5]
 [ 2.1  3.2  5.4  6.5]]
```
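All three methods assume that every column has at least one real value. If a column is entirely NaN, `np.nanmean` emits a `RuntimeWarning` ("Mean of empty slice") and returns NaN, so those entries stay NaN. A minimal sketch of a reusable helper that guards against this case is shown below. The function name `fill_nan_with_col_mean` and its `fallback` parameter are hypothetical, and filling all-NaN columns with a constant is an assumption you may want to change for your data.

```python
import warnings
import numpy as np

def fill_nan_with_col_mean(arr, fallback=0.0):
    """Hypothetical helper: return a copy of the 2-D array `arr` with each
    NaN replaced by the mean of its column. Columns that are entirely NaN
    are filled with `fallback` (an assumed policy, not a NumPy default)."""
    arr = np.asarray(arr, dtype=float)
    nan_mask = np.isnan(arr)
    with warnings.catch_warnings():
        # np.nanmean warns on all-NaN columns; that case is handled below
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_mean = np.nanmean(arr, axis=0)
    # All-NaN columns produce a NaN mean; substitute the fallback value
    col_mean = np.where(np.isnan(col_mean), fallback, col_mean)
    # Broadcast the per-column means over the rows; the input is not modified
    return np.where(nan_mask, col_mean, arr)

# Column 2 is entirely NaN, so it receives the fallback value
data = np.array([[1.0, np.nan, np.nan],
                 [3.0, 4.0, np.nan]])
result = fill_nan_with_col_mean(data)
print(result)
```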
