How to Drop Rows with NaN Values in Pandas DataFrame?

NaN stands for Not a Number and is one of the most common ways to represent a missing value in data. It is a special floating-point value, so a numeric column that contains NaN is normally stored as float unless a nullable dtype such as Int64 is used. Unhandled NaN values are a frequent source of problems in data analysis, and dealing with them is essential for reliable results. In this article, we will discuss how to drop rows with NaN values.

What are NaN values?

NaN (Not a Number) is a unique floating-point value that is frequently used to indicate missing, undefined or unrepresentable results in numerical computations.
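The properties described above can be checked directly. The following is a minimal sketch; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

nan_value = np.nan

# NaN is an ordinary Python float under the hood
is_float = isinstance(nan_value, float)

# NaN never compares equal to anything, including itself
equals_itself = nan_value == nan_value

# so pandas provides isna() to detect it reliably
detected = pd.isna(nan_value)

# An integer Series that contains a missing value is upcast to float64
s = pd.Series([1, 2, None])

print(is_float, equals_itself, detected, s.dtype)
```

Because NaN is not equal to itself, comparisons such as `df['A'] == np.nan` never match; `isna()` is the safer way to find missing values.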

Why remove NaN values?

Data integrity matters in any analysis, and NaN values can undermine it in several ways:

NaNs can disrupt data analysis and computations, for example by propagating through arithmetic.
Many algorithms do not accept NaNs in their input.
NaNs can affect data visualization.
They can lead to errors in machine learning model training.

How to remove NaN values in Python pandas?

There are various ways to handle NaN values in a dataset using Python pandas. The most popular techniques are:

dropna(): removes rows (or columns) that contain NaN values.
fillna(value): fills NaN values with the specified value.
interpolate(): fills NaN values with interpolated values.

Using dropna()

We can drop rows that contain NaN values from a pandas DataFrame by using the dropna() method:

df.dropna()

It is also possible to drop rows based on NaN values in particular columns only, using the following statement:

df.dropna(subset=['col1', 'col2'], inplace=True)

Here subset is a list of column names (col1 and col2 are placeholders), and every row with NaN in any of those columns is dropped. With inplace=True, df is modified directly and the call returns None, so the result should not be assigned back.
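As a small sketch of the subset parameter, the example below uses a made-up `orders` DataFrame (the column names are assumptions for illustration only). Rows are dropped only when `customer` or `amount` is missing; a missing `note` is tolerated.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'note' is optional, 'customer' and 'amount' are required
orders = pd.DataFrame({
    "customer": ["ann", "bob", None, "dee"],
    "amount": [10.0, np.nan, 7.5, 3.0],
    "note": [None, "gift", None, None],
})

# Only look at the required columns when deciding which rows to drop
cleaned = orders.dropna(subset=["customer", "amount"])

print(cleaned)
```

Rows 1 and 2 are removed because a required column is missing there, while rows 0 and 3 are kept even though their `note` is empty. The original `orders` frame is unchanged because inplace was not used.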

Let’s create our own DataFrame and remove the rows with NaN values to clean the data.

Python3

import pandas as pd
import numpy as np

data = pd.DataFrame({'A': [1, 2, np.nan, 4],
                     'B': [5, 6, 7, 8],
                     'C': [10, 11, 12, np.nan],
                     'D': [21, 22, 23, 24]})

print(data)

Output:

     A  B     C   D
0  1.0  5  10.0  21
1  2.0  6  11.0  22
2  NaN  7  12.0  23
3  4.0  8   NaN  24

Python3

data = data.dropna() # drop rows with nan values

print(data)

Output:

     A  B     C   D
0  1.0  5  10.0  21
1  2.0  6  11.0  22
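dropna() also accepts parameters that change what gets dropped. The sketch below rebuilds the same DataFrame as in the example above and tries `axis`, `how` and `thresh`; the result variable names are only illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [1, 2, np.nan, 4],
                     'B': [5, 6, 7, 8],
                     'C': [10, 11, 12, np.nan],
                     'D': [21, 22, 23, 24]})

# axis=1 drops columns (instead of rows) that contain any NaN
cols_dropped = data.dropna(axis=1)

# how='all' drops a row only if every value in it is NaN
all_missing_dropped = data.dropna(how="all")

# thresh=3 keeps rows that have at least 3 non-NaN values
mostly_complete = data.dropna(thresh=3)

print(cols_dropped)
print(len(all_missing_dropped), len(mostly_complete))
```

Here only columns B and D survive `axis=1`, while `how="all"` and `thresh=3` keep all four rows, since no row is entirely empty and every row has at least three valid values.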

Using fillna()

We can use the fillna() method to replace NaN values in a DataFrame with a value of our choice. The fill value must be supplied:

df = df.fillna(value)

Python3

import pandas as pd
import numpy as np

car = pd.DataFrame({'Year of Launch': [1999, np.nan, 1986, 2020, np.nan, 1991],
                    'Engine Number': [np.nan, 15, 22, 43, 44, np.nan],
                    'Chasis Unique Id': [4023, np.nan, 3115, 4522, 3643, 3774]})
car

Output:

   Year of Launch  Engine Number  Chasis Unique Id
0          1999.0            NaN            4023.0
1             NaN           15.0               NaN
2          1986.0           22.0            3115.0
3          2020.0           43.0            4522.0
4             NaN           44.0            3643.0
5          1991.0            NaN            3774.0

Python3

car_filled = car.fillna(0)
car_filled

Output:

   Year of Launch  Engine Number  Chasis Unique Id
0          1999.0            0.0            4023.0
1             0.0           15.0               0.0
2          1986.0           22.0            3115.0
3          2020.0           43.0            4522.0
4             0.0           44.0            3643.0
5          1991.0            0.0            3774.0

All NaN values have been replaced by 0.
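A single constant such as 0 is not always a sensible replacement (a launch year of 0, for instance, is misleading). As a hedged sketch, fillna() also accepts a dictionary that maps column names to fill values, so each column can get its own replacement. The `scores` DataFrame and its columns below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one numeric and one text column
scores = pd.DataFrame({
    "math": [90.0, np.nan, 70.0],
    "grade": ["A", None, "C"],
})

# Fill each column differently: the column mean for numbers,
# a placeholder label for text
filled = scores.fillna({
    "math": scores["math"].mean(),   # mean() skips NaN by default
    "grade": "unknown",
})

print(filled)
```

The missing `math` value becomes 80.0, the mean of 90 and 70, and the missing `grade` becomes "unknown". Whether the mean is an appropriate fill value depends on the data, so treat this as one option rather than a rule.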

Using interpolate()

interpolate() estimates and fills missing values by linearly interpolating between neighboring data points (the default method), which keeps the data smooth. It is particularly useful for time series data. df.interpolate() returns a new DataFrame with the NaN values replaced by interpolated values; assign the result back, or pass inplace=True, to update the original.

Python3

import pandas as pd
import numpy as np

dit = pd.DataFrame({'August': [32, 34, 4.85, 71.2, 1.1],
                    'September': [54, 68, 9.25, np.nan, 0.9],
                    'October': [5.8, 8.52, np.nan, 1.6, 11],
                    'November': [5.8, 50, 8.9, 77, 78]})
dit

Output:

   August  September  October  November
0   32.00      54.00     5.80       5.8
1   34.00      68.00     8.52      50.0
2    4.85       9.25      NaN       8.9
3   71.20        NaN     1.60      77.0
4    1.10       0.90    11.00      78.0

Python3

dit = dit.interpolate()
dit

Output:

   August  September  October  November
0   32.00     54.000     5.80       5.8
1   34.00     68.000     8.52      50.0
2    4.85      9.250     5.06       8.9
3   71.20      5.075     1.60      77.0
4    1.10      0.900    11.00      78.0
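In the output above, each NaN sits between two known values, so it is replaced by their midpoint (for example, (8.52 + 1.60) / 2 = 5.06 in October). A NaN at the very start of a column has no earlier neighbor and is left as NaN by default. The sketch below illustrates this on a small made-up Series and shows the `limit_direction` parameter as one way to fill it.

```python
import numpy as np
import pandas as pd

# Made-up series with a leading NaN and a NaN between two known values
s = pd.Series([np.nan, 1.0, np.nan, 3.0])

# Default behavior: fill forward only, so the leading NaN stays
forward = s.interpolate()

# limit_direction='both' also fills NaNs before the first valid value
both = s.interpolate(limit_direction="both")

print(forward)
print(both)
```

With the default settings the middle value becomes 2.0 and the first value remains NaN; with `limit_direction="both"` the first value is filled with the nearest valid value, 1.0.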

Conclusion

Dealing with NaN values is a crucial aspect of data analysis, as these values can significantly impact the integrity of analytical results. In this article, we discussed the concept of NaN (Not a Number) values, which are often used to indicate missing or undefined results in numerical computations, and three ways to handle them in pandas: dropping them with dropna(), replacing them with fillna(), and estimating them with interpolate().

Frequently Asked Questions (FAQs)

1. How to replace NaN with 'No Value' in pandas?

In pandas, use `df.fillna('No Value', inplace=True)` to replace NaN with 'No Value' in the DataFrame.

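2. How do I drop NaN values in a list?

In Python, after import math, use clean = [x for x in values if not (isinstance(x, float) and math.isnan(x))] to drop NaN values from a list named values. Avoid naming the variable list, since that shadows the built-in type.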

3. How to replace NaN in a NumPy array?

For a NumPy array, use np.nan_to_num(array) to replace NaN values with zeros.
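The list and NumPy answers above can be sketched together as follows. The sample values and variable names are illustrative only.

```python
import math
import numpy as np

# A plain Python list that mixes numbers and NaN values
values = [1.0, float("nan"), 3.5, np.nan, 2]

# Keep everything that is not a float NaN; the isinstance check
# avoids calling math.isnan on non-float items
clean = [x for x in values if not (isinstance(x, float) and math.isnan(x))]

# For NumPy arrays, nan_to_num replaces NaN with 0.0 by default
arr = np.array([1.0, np.nan, 3.0])
zeros = np.nan_to_num(arr)

# The nan keyword selects a different replacement value
custom = np.nan_to_num(arr, nan=-1.0)

print(clean)
print(zeros, custom)
```

Note that np.nan_to_num also replaces positive and negative infinity with very large finite numbers, which may or may not be wanted.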
