
How to Drop Rows with NaN Values in Pandas DataFrame?


NaN stands for Not a Number and is one of the most common ways to represent missing values in data. It is a special floating-point value that cannot be represented as an integer, which is why a pandas integer column containing NaN is stored as float by default. Missing values are a frequent source of problems in data analysis, and handling them is essential for getting reliable results. In this article, we will discuss how to drop rows with NaN values.

What are NaN values?

NaN (Not a Number) is a unique floating-point value that is frequently used to indicate missing, undefined or unrepresentable results in numerical computations.
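
As a small illustration of what "special floating-point value" means in practice, the sketch below checks the type of NaN and shows why it has to be detected with a dedicated function rather than with ==. The variable names are chosen only for this example.

```python
import math

import numpy as np
import pandas as pd

value = np.nan

# NaN is an ordinary Python float under the hood.
is_float = isinstance(value, float)

# NaN never compares equal to anything, including itself,
# so `==` cannot be used to find it.
equals_itself = (value == value)

# Dedicated checks detect it reliably.
detected_by_math = math.isnan(value)
detected_by_pandas = pd.isna(value)

print(is_float, equals_itself, detected_by_math, detected_by_pandas)
# True False True True
```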

Why remove NaN values?

Data integrity is paramount in any analysis, and NaNs can compromise it in several ways:

  • NaNs can disrupt data analysis and computations, because arithmetic involving NaN usually yields NaN.
  • Many algorithms do not accept input that contains NaN.
  • NaNs can affect data visualization.
  • They can lead to errors in machine learning model training.
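
A minimal sketch of the first point, using made-up readings: a single NaN turns a NumPy mean into NaN, while pandas skips NaN by default. Counting NaNs per column with isna().sum() is one common way to see how much data is affected before deciding what to do.

```python
import numpy as np
import pandas as pd

readings = [1.0, np.nan, 3.0]  # illustrative data, not from the article

# NumPy propagates NaN: one missing value makes the whole result NaN.
numpy_mean = np.mean(np.array(readings))

# pandas skips NaN by default (skipna=True), so only 1.0 and 3.0 are averaged.
pandas_mean = pd.Series(readings).mean()

# Count the NaN values in each column.
frame = pd.DataFrame({"x": readings, "y": [np.nan, np.nan, 6.0]})
nan_counts = {column: int(count) for column, count in frame.isna().sum().items()}

print(numpy_mean)   # nan
print(pandas_mean)  # 2.0
print(nan_counts)   # {'x': 1, 'y': 2}
```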

How to remove NaN values in Python pandas?

There are various ways to get rid of NaN values in a dataset using Python pandas. The most popular techniques are:

  • dropna(): removes rows (or, with axis=1, columns) that contain NaN values.
  • fillna(value): fills NaN values with the specified value.
  • interpolate(): fills NaN values with values interpolated from neighboring data points.

Using dropna()

We can drop rows having NaN values in a pandas DataFrame by using the dropna() method:

df.dropna()

By default, dropna() returns a new DataFrame and leaves df unchanged. It is also possible to drop rows based on NaN values in particular columns only:

df.dropna(subset=column_names, inplace=True)

Here subset is a list of column names to check for NaN, and inplace=True modifies df directly instead of returning a new DataFrame.

Let's build our own DataFrame and remove the rows with NaN values to clean the data.

Python3

import pandas as pd
import numpy as np

data = pd.DataFrame({'A': [1, 2, np.nan, 4],
                     'B': [5, 6, 7, 8],
                     'C': [10, 11, 12, np.nan],
                     'D': [21, 22, 23, 24]})
print(data)

                    

Output:

     A  B     C   D
0  1.0  5  10.0  21
1  2.0  6  11.0  22
2  NaN  7  12.0  23
3  4.0  8   NaN  24


Python3

data = data.dropna() # drop rows with nan values
print(data)

                    

Output:

     A  B     C   D
0  1.0  5  10.0  21
1  2.0  6  11.0  22
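
Beyond the default call, dropna() accepts a few parameters that control what gets dropped. The sketch below rebuilds the same DataFrame and tries subset, axis, how and thresh; the result variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [5, 6, 7, 8],
    "C": [10, 11, 12, np.nan],
    "D": [21, 22, 23, 24],
})

# Only look at column "A" when deciding which rows to drop.
rows_checked_on_a = data.dropna(subset=["A"])

# Drop columns (axis=1) that contain any NaN, instead of rows.
complete_columns = data.dropna(axis=1)

# Drop a row only if every value in it is NaN (none here, so nothing is dropped).
not_all_nan = data.dropna(how="all")

# Keep only rows that have at least 4 non-NaN values.
at_least_four = data.dropna(thresh=4)

print(rows_checked_on_a.index.tolist())   # [0, 1, 3]
print(complete_columns.columns.tolist())  # ['B', 'D']
print(len(not_all_nan))                   # 4
print(at_least_four.index.tolist())       # [0, 1]
```

None of these calls modifies data itself; each returns a new DataFrame.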

Using fillna()

We can use the fillna() method to replace NaN values in a DataFrame with a value of our choice:

df = df.fillna(value)

Python3

import pandas as pd
import numpy as np

car = pd.DataFrame({
    'Year of Launch': [1999, np.nan, 1986, 2020, np.nan, 1991],
    'Engine Number': [np.nan, 15, 22, 43, 44, np.nan],
    'Chasis Unique Id': [4023, np.nan, 3115, 4522, 3643, 3774],
})
car

                    

Output:

   Year of Launch  Engine Number  Chasis Unique Id
0          1999.0            NaN            4023.0
1             NaN           15.0               NaN
2          1986.0           22.0            3115.0
3          2020.0           43.0            4522.0
4             NaN           44.0            3643.0
5          1991.0            NaN            3774.0

Python3

car_filled = car.fillna(0)
car_filled

                    

Output:

   Year of Launch  Engine Number  Chasis Unique Id
0          1999.0            0.0            4023.0
1             0.0           15.0               0.0
2          1986.0           22.0            3115.0
3          2020.0           43.0            4522.0
4             0.0           44.0            3643.0
5          1991.0            0.0            3774.0

All NaN values have been replaced by 0.
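
A single constant such as 0 is not always a sensible replacement (a launch year of 0, for instance). As a hedged sketch of some alternatives, the example below uses two of the columns from above and fills them with per-column values, with column means, and by carrying the previous value forward. The replacement values 0 and -1 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

car = pd.DataFrame({
    "Year of Launch": [1999, np.nan, 1986, 2020, np.nan, 1991],
    "Engine Number": [np.nan, 15, 22, 43, 44, np.nan],
})

# A dict fills each column with its own replacement value.
per_column = car.fillna({"Year of Launch": 0, "Engine Number": -1})

# Filling with the column means leaves each column's average unchanged.
mean_filled = car.fillna(car.mean())

# ffill() carries the last valid value forward; a leading NaN stays NaN.
forward_filled = car.ffill()

print(per_column["Engine Number"].tolist())
# [-1.0, 15.0, 22.0, 43.0, 44.0, -1.0]
print(mean_filled["Year of Launch"].tolist())
# [1999.0, 1999.0, 1986.0, 2020.0, 1999.0, 1991.0]
print(forward_filled["Year of Launch"].tolist())
# [1999.0, 1999.0, 1986.0, 2020.0, 2020.0, 1991.0]
```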

Using interpolate()

interpolate() estimates and fills missing values by interpolating between neighboring data points; the default method is linear and works down each column. It is particularly useful for time series data. df.interpolate() returns a new DataFrame in which NaN values are replaced with interpolated values, so assign the result back to keep it.

Python3

import pandas as pd
import numpy as np
 
dit = pd.DataFrame({
    'August': [32, 34, 4.85, 71.2, 1.1],
    'September': [54, 68, 9.25, np.nan, 0.9],
    'October': [5.8, 8.52, np.nan, 1.6, 11],
    'November': [5.8, 50, 8.9, 77, 78],
})
dit

                    

Output:

   August  September  October  November
0   32.00      54.00     5.80       5.8
1   34.00      68.00     8.52      50.0
2    4.85       9.25      NaN       8.9
3   71.20        NaN     1.60      77.0
4    1.10       0.90    11.00      78.0

Python3

dit = dit.interpolate()
dit

                    

Output:

   August  September  October  November
0   32.00     54.000     5.80       5.8
1   34.00     68.000     8.52      50.0
2    4.85      9.250     5.06       8.9
3   71.20      5.075     1.60      77.0
4    1.10      0.900    11.00      78.0
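
In the output above, each gap is a single NaN, so the filled value is simply the midpoint of its two neighbors (for example, (9.25 + 0.9) / 2 = 5.075). The sketch below, on a made-up Series, shows how a longer gap is filled evenly and how the limit parameter caps the number of consecutive NaNs that get filled.

```python
import numpy as np
import pandas as pd

series = pd.Series([1.0, np.nan, np.nan, 4.0])  # illustrative data

# Default linear interpolation spreads values evenly across the gap.
linear = series.interpolate()

# limit=1 fills at most one consecutive NaN and leaves the rest missing.
limited = series.interpolate(limit=1)

print(linear.tolist())       # [1.0, 2.0, 3.0, 4.0]
print(limited.tolist())      # [1.0, 2.0, nan, 4.0]
print(series.isna().sum())   # 2  (the original Series is unchanged)
```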

Conclusion

Dealing with NaN values is a crucial aspect of data analysis, as these values can significantly impact the integrity of analytical results. In this article, we discussed what NaN (Not a Number) values are and how to handle them in pandas: dropping them with dropna(), replacing them with fillna(), or estimating them with interpolate().

Frequently Asked Questions(FAQs)

1. How to replace NaN with no value in pandas?

In pandas, use df.fillna('No Value', inplace=True) to replace NaN with 'No Value' in the DataFrame.

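2. How do I drop NaN values in a list?

In Python, use cleaned = [x for x in values if str(x) != 'nan'] to drop NaN values from a list named values. Note that this also removes the literal string 'nan'. Avoid naming the variable list, since that shadows the built-in type.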
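
As an alternative sketch that does not rely on string conversion, math.isnan() can be used to test each float explicitly. The helper name is_nan and the sample list are assumptions made for this example.

```python
import math

values = [1.5, float("nan"), 3.0, "nan", float("nan"), 7]  # illustrative data


def is_nan(item):
    """Return True only for float NaN; strings and ints are never NaN."""
    return isinstance(item, float) and math.isnan(item)


cleaned = [item for item in values if not is_nan(item)]

print(cleaned)  # [1.5, 3.0, 'nan', 7]
```

Unlike the string comparison, this keeps the text 'nan' in the list, because only real floating-point NaN values are removed.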

3. How to replace NaN in a NumPy array?

For a NumPy array, use np.nan_to_num(array) to replace NaN values with zeros.
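
A short sketch of this on a made-up array follows. It also shows the nan keyword, which selects a replacement other than zero, and boolean masking with np.isnan() for removing the NaN entries instead of replacing them.

```python
import numpy as np

array = np.array([1.0, np.nan, 3.0])  # illustrative data

# Default behavior: NaN becomes 0.0.
zeros_filled = np.nan_to_num(array)

# The nan keyword chooses a different replacement value.
custom_filled = np.nan_to_num(array, nan=-1.0)

# Boolean masking drops the NaN entries entirely.
dropped = array[~np.isnan(array)]

print(zeros_filled.tolist())   # [1.0, 0.0, 3.0]
print(custom_filled.tolist())  # [1.0, -1.0, 3.0]
print(dropped.tolist())        # [1.0, 3.0]
```

Keep in mind that np.nan_to_num() also replaces positive and negative infinity with very large finite numbers, which may or may not be what you want.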



Last Updated : 21 Dec, 2023