
Drop rows from Pandas dataframe with missing values or NaN in columns

Last Updated : 02 Jul, 2020

Pandas provides various data structures and operations for manipulating numerical data and time series. However, some of the data may be missing. In Pandas, missing data is represented by two values:

  • None: None is a Python singleton object that is often used for missing data in Python code.
  • NaN: NaN (an acronym for Not a Number) is a special floating-point value recognized by all systems that use the standard IEEE floating-point representation.

Pandas treats None and NaN as essentially interchangeable for indicating missing or null values. To drop null values from a DataFrame, we use the dropna() function, which can drop rows or columns containing null values in several different ways.
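As a quick illustration of that interchangeability, here is a minimal sketch (the column names 'a' and 'b' are made up for the example) where one column uses np.nan and the other uses None. isna() flags both, and dropna() removes every row containing either one.

```python
import numpy as np
import pandas as pd

# Column 'a' marks its missing value with np.nan,
# column 'b' (strings) marks its missing value with None
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': ['x', 'y', None]})

# isna() reports both kinds of missing value as True
print(df.isna())

# dropna() removes row 1 (NaN in 'a') and row 2 (None in 'b')
result = df.dropna()
print(result)
```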

Syntax:
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Parameters:
axis: determines whether rows or columns are dropped. Use 0 or 'index' to drop rows, and 1 or 'columns' to drop columns.
how: takes one of two string values, 'any' or 'all'. 'any' drops the row/column if ANY value is null, and 'all' drops it only if ALL values are null.
thresh: takes an integer giving the minimum number of non-null values a row/column must contain to be kept; rows/columns with fewer non-null values are dropped.
subset: a list of labels along the other axis that limits where null values are looked for; for example, when dropping rows, it names the columns to check.
inplace: a boolean; if True, the changes are made to the DataFrame itself and None is returned instead of a new DataFrame.
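The thresh and subset parameters are easiest to see on concrete data. The sketch below reuses the score table from Code #1 (the variable names kept_by_thresh and kept_by_subset are just illustrative) and prints which row labels survive each call.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

# thresh=3: keep only rows with at least 3 non-null values
# (rows 1 and 2 have just 2 non-null values, so they are dropped)
kept_by_thresh = df.dropna(thresh=3)
print(kept_by_thresh.index.tolist())  # [0, 3]

# subset: only look for NaN in 'First Score', ignoring the other columns
kept_by_subset = df.dropna(subset=['First Score'])
print(kept_by_subset.index.tolist())  # [0, 1, 3]
```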

Code #1: Dropping rows with at least 1 null value.




# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

print(df)



Now we drop the rows that contain at least one NaN (null) value:




# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function: drops every row with at least one NaN
print(df.dropna())


Output: only the row with index 3 remains, since it is the only row without any missing value.
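Note that dropna() does not change df unless told to. This small sketch, using the same data as Code #1, contrasts the default behaviour (a new DataFrame is returned) with inplace=True (df itself is modified and the call returns None). Reassigning, as in df = df.dropna(), is a common alternative to inplace=True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, 40, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# By default dropna() returns a new DataFrame and leaves df untouched
cleaned = df.dropna()
print(len(df), len(cleaned))  # 4 1

# With inplace=True, df itself is modified and the call returns None
returned = df.dropna(inplace=True)
print(returned, len(df))  # None 1
```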

Code #2: Dropping rows if all values in that row are missing.




# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

print(df)



Now we drop the rows in which every value is missing (NaN):




# importing pandas as pd
import pandas as pd
  
# importing numpy as np
import numpy as np
  
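# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function: drops only rows in which all values are NaN
print(df.dropna(how='all'))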


Output: only row 1, in which every value is NaN, is dropped; rows 0, 2 and 3 remain.
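To see the difference between the two how settings side by side, this sketch applies both to the Code #2 data and compares the row labels that are kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, np.nan, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, np.nan, 80, 98],
                   'Fourth Score': [np.nan, np.nan, np.nan, 65]})

# how='any' (the default): drop a row if any value in it is missing
any_rows = df.dropna(how='any').index.tolist()

# how='all': drop a row only if every value in it is missing
all_rows = df.dropna(how='all').index.tolist()

print(any_rows)  # [3]
print(all_rows)  # [0, 2, 3]
```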

Code #3: Dropping columns with at least 1 null value.




# importing pandas as pd
import pandas as pd
   
# importing numpy as np
import numpy as np
   
# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

print(df)



Now we drop the columns that have at least one missing value:




# importing pandas as pd
import pandas as pd
   
# importing numpy as np
import numpy as np
   
# dictionary of lists (named scores so the built-in dict is not shadowed)
scores = {'First Score': [100, np.nan, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, np.nan, 80, 98],
          'Fourth Score': [60, 67, 68, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function: axis=1 drops columns instead of rows
print(df.dropna(axis=1))


Output: only the Fourth Score column remains, because it is the only column with no missing values.
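The axis parameter also accepts the string 'columns', and when dropping columns the subset parameter refers to row labels instead of column names. The sketch below checks both on the Code #3 data; the choice of row 2 for subset is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, np.nan, np.nan, 95],
                   'Second Score': [30, np.nan, 45, 56],
                   'Third Score': [52, np.nan, 80, 98],
                   'Fourth Score': [60, 67, 68, 65]})

# axis=1 and axis='columns' are equivalent ways to drop columns
by_int = df.dropna(axis=1)
by_name = df.dropna(axis='columns')
print(by_int.columns.tolist())  # ['Fourth Score']
print(by_int.equals(by_name))   # True

# With axis=1, subset names row labels: only row 2 is checked,
# so just 'First Score' (NaN in row 2) is dropped
by_row2 = df.dropna(axis=1, subset=[2])
print(by_row2.columns.tolist())  # ['Second Score', 'Third Score', 'Fourth Score']
```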

 

Code #4: Dropping rows with at least 1 null value in a CSV file.

Note: This example reads a CSV file named employees.csv, which must be downloaded and placed in the working directory before running the code.
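If employees.csv is not at hand, the same workflow can be tried on a small in-memory CSV. This is a minimal sketch: the columns Name, Team and Salary and the four rows are hypothetical and are not taken from employees.csv. read_csv turns the empty fields into NaN, after which dropna() behaves as in the earlier examples; the second call uses subset so that only a missing Salary causes a row to be dropped.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for employees.csv;
# empty fields are read as NaN by read_csv
csv_text = """Name,Team,Salary
Alice,Sales,50000
Bob,,62000
Carol,Marketing,
Dan,IT,58000
"""

data = pd.read_csv(io.StringIO(csv_text))

# Drop rows that have at least one missing value
new_data = data.dropna(axis=0, how='any')
print(new_data)

# Drop a row only when its Salary is missing
salary_known = data.dropna(subset=['Salary'])
print(salary_known)
```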




# importing pandas module
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# making new data frame with dropped NA values
new_data = data.dropna(axis=0, how='any')

print(new_data)


Output:

Now we compare the sizes of the two data frames to find out how many rows had at least one null value:




print("Old data frame length:", len(data))
print("New data frame length:", len(new_data)) 
print("Number of rows with at least 1 NA value: ",
      (len(data)-len(new_data)))


Output:

Old data frame length: 1000
New data frame length: 764
Number of rows with at least 1 NA value:  236

Since the difference is 236, there were 236 rows which had at least 1 Null value in any column.
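The same count can also be obtained without building a second DataFrame: isna().any(axis=1) marks each row that has at least one missing value, and summing those booleans counts them. A minimal sketch on a small made-up frame (columns 'a' and 'b' are illustrative):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': [1, np.nan, 3, 4],
                     'b': [np.nan, 2, 3, 4]})

# True for each row that contains at least one missing value
has_na = data.isna().any(axis=1)
rows_with_na = int(has_na.sum())

# Matches the length difference before and after dropna()
print(rows_with_na)                    # 2
print(len(data) - len(data.dropna()))  # 2
```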


