Pandas – Strip whitespace from Entire DataFrame
  • Last Updated : 15 Mar, 2021

“We can have data without information, but we cannot have information without data.” Data is the backbone of a data scientist's work, and according to one survey, data scientists spend roughly 60% of their time cleaning and organizing data, so it is worth becoming familiar with the techniques that make this easier. In this article, we will look at different methods to strip extra whitespace from an entire DataFrame. The dataset used here is given below:

In the above figure, we can see that the data in the Name, Age, Blood Group, and Gender columns is irregular. In most cells of a given column, extra whitespace precedes the value. Our aim is to remove all of this extra whitespace and organize the data systematically. We will use the following methods to remove the extra spaces from the cells:

Using Strip() function
Using Skipinitialspace 
Using replace function 
Using Converters

Different methods to remove extra whitespace

Method 1: Using Strip() function : 

Pandas provides the predefined method “pandas.Series.str.strip()” to remove whitespace from strings. It strips leading and trailing whitespace from every string in a Series or Index. It optionally takes a set of characters to remove from the head and tail of each string; by default this is None, and if no characters are passed, leading and trailing whitespace is removed. It returns a new Series or Index of object rather than modifying the original, so the result must be assigned back to the column to take effect.



Syntax: pandas.Series.str.strip(to_strip = None)

Explanation: It takes a set of characters to remove from the head and tail of each string (the leading and trailing characters).

Parameter: to_strip is None by default; if no characters are passed, leading and trailing whitespace is removed. It returns a Series or Index of object.
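
As a small sketch of how the to_strip parameter behaves, the snippet below compares the default call with one that passes a character set. The sample values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate to_strip.
s = pd.Series(['  Sunny  ', '**Bunny**', ' *Ginny* '])

# Default (to_strip=None): only surrounding whitespace is removed.
default_stripped = s.str.strip()

# With a character set: any leading/trailing space or '*' is removed.
custom_stripped = s.str.strip(' *')

print(default_stripped.tolist())  # ['Sunny', '**Bunny**', '*Ginny*']
print(custom_stripped.tolist())   # ['Sunny', 'Bunny', 'Ginny']

# The original Series is left unchanged; strip() returns a new Series.
print(s.tolist()[0] == '  Sunny  ')  # True
```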

Example : 

Python3




# importing library
import pandas as pd
  
# Creating dataframe
df = pd.DataFrame({'Names' : [' Sunny','Bunny','Ginny ',' Binny ',' Chinni','Minni'], 
                    'Age' : [23,44,23,54,22,11],
                    'Blood Group' : [' A+',' B+','O+','O-',' A-','B-'],
                   'Gender' : [' M',' M','F','F','F',' F']
                  })
  
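# The dataset has a lot of extra spaces in its cells, so remove them using strip().
# str.strip() returns a new Series, so assign the result back to the column.
df['Names'] = df['Names'].str.strip()
df['Blood Group'] = df['Blood Group'].str.strip()
df['Gender'] = df['Gender'].str.strip()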
  
# Printing dataframe
print(df)

Output: 

Method 2: Using Skipinitialspace : 



This is not a method but one of the parameters of the read_csv() method in Pandas. When skipinitialspace is set, pandas.read_csv() skips the spaces that follow each delimiter, which removes the leading whitespace from every field in the file; trailing whitespace is left untouched. By default it is False; set it to True to remove the extra leading spaces.

Syntax : pandas.read_csv('path_of_csv_file', skipinitialspace = True)

 # By default value of skipinitialspace is False, make it True to use this parameter.
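
To make the effect concrete, here is a minimal sketch that reads the same text with and without skipinitialspace. It uses a hypothetical in-memory CSV through io.StringIO instead of a file on disk, so the column names and values are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "Name,Gender\n Sunny, M\nGinny ,F\n"

# Default behaviour: spaces after the delimiter are kept.
df_default = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True: leading spaces in each field are skipped,
# but trailing spaces (as in 'Ginny ') are not removed.
df_skipped = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(df_default['Name'].tolist())
print(df_skipped['Name'].tolist())
print(df_skipped['Gender'].tolist())
```

Because trailing spaces survive, this parameter is best suited to files where the extra whitespace only appears before the values.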

Example : 

Python3




# importing library
import pandas as pd
  
# reading the csv file and at the same time using the skipinitialspace parameter, which will remove extra leading space
df = pd.read_csv('\\student_data.csv', skipinitialspace = True)
  
# printing dataset
print(df)

Output: 

Method 3: Using replace function : 

We can also remove extra whitespace from the dataframe using the replace() function. Pandas provides the predefined method “pandas.Series.str.replace()” for this. The program is the same as the strip() program; the only difference is that replace() is used in place of strip(). Note that replacing ' ' with '' removes every space in the string, including spaces between words, not just the leading and trailing ones.

Syntax : pandas.Series.str.replace(' ', '')
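
The difference from strip() shows up when a value contains a space in the middle. The sketch below uses hypothetical sample values and also shows one possible regular-expression variant that trims only the edges; that pattern is an illustration, not part of the original method.

```python
import pandas as pd

# Hypothetical values; the first one has an interior space.
s = pd.Series([' Sunny Kumar ', 'Bunny', ' A+'])

# Literal replacement: removes every space, including the interior one.
no_spaces = s.str.replace(' ', '', regex=False)

# Regex variant (an assumption for illustration): trim only the
# whitespace at the start or end of each string.
edges_only = s.str.replace(r'^\s+|\s+$', '', regex=True)

print(no_spaces.tolist())   # ['SunnyKumar', 'Bunny', 'A+']
print(edges_only.tolist())  # ['Sunny Kumar', 'Bunny', 'A+']
```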

Example : 

Python3




# importing library
import pandas as pd
  
# Creating dataframe
df = pd.DataFrame({'Name' : [' Sunny','Bunny','Ginny ',' Binny ',' Chinni','Minni'], 
                    'Age' : [23,44,23,54,22,11],
                    'Blood Group' : [' A+',' B+','O+','O-',' A-','B-'],
                   'Gender' : [' M',' M','F','F','F',' F']
                  })
  
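# The dataset has a lot of extra spaces in its cells, so remove them using replace().
# str.replace() returns a new Series, so assign the result back to the column.
df['Name'] = df['Name'].str.replace(' ', '')
df['Blood Group'] = df['Blood Group'].str.replace(' ', '')
df['Gender'] = df['Gender'].str.replace(' ', '')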
  
# Printing dataframe
print(df)

Output: 

Method 4: Using Converters :

Like skipinitialspace, converters is one of the parameters of the Pandas predefined method “read_csv”. It is used to apply functions to particular columns. We pass the functions in a dictionary keyed by column name. Here we pass the str.strip function itself (without calling it), which removes the extra space from each value while the csv file is being read.

Syntax : pd.read_csv('path_of_file', converters={'column_names': function_name})

# Pass dict of functions and column names, where column names act as unique keys and function as value.    
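
A minimal sketch of the converters dictionary follows, again using a hypothetical in-memory CSV so it can run without a file. Columns that are not listed in the dictionary are parsed as usual.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "Name,Age,Gender\n Sunny ,23, M\nGinny ,44,F \n"

# Map column names to functions; str.strip is passed uncalled, and
# pandas applies it to each raw string value in those columns.
df = pd.read_csv(io.StringIO(csv_text),
                 converters={'Name': str.strip, 'Gender': str.strip})

print(df['Name'].tolist())
print(df['Gender'].tolist())
print(df['Age'].tolist())
```

Unlike skipinitialspace, this removes both leading and trailing whitespace, but only for the columns named in the dictionary.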

Example : 

Python3




# importing library
import pandas as pd
  
# reading the csv file and at the same time using the converters parameter, which will remove extra space
# (pass the function str.strip itself, not the result of calling it)
df = pd.read_csv('\\student_data.csv', converters={'Name': str.strip,
                                                'Blood Group' : str.strip,
                                                'Gender' : str.strip } )
  
# printing dataset
print(df)

Output: 

Removing Extra Whitespace from the Whole DataFrame with a Custom Function : 

Python3




# Importing required libraries
import pandas as pd
  
# Creating a DataFrame with 4 columns whose
# data contains irregular whitespace.
df = pd.DataFrame({'Names': [' Sunny', 'Bunny', 'Ginny ',
                             ' Binny ', ' Chinni', 'Minni'],
                     
                   'Age': [23, 44, 23, 54, 22, 11],
                     
                   'Blood_Group': [' A+', ' B+', 'O+', 'O-',
                                   ' A-', 'B-'],
                     
                   'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
                   })
  
  
# Creating a function which will remove extra leading
# and trailing whitespace from the data.
# pass dataframe as a parameter here
def whitespace_remover(dataframe):
    
    # iterating over the columns
    for i in dataframe.columns:
          
        # checking datatype of each columns
        if dataframe[i].dtype == 'object':
              
            # applying strip function on column
            dataframe[i] = dataframe[i].map(str.strip)
        else:
              
            # if condn. is False then it will do nothing.
            pass
  
# applying whitespace_remover function on dataframe
whitespace_remover(df)
  
# printing dataframe
print(df)

In the above code snippet, the first line imports the required library; pandas is used to read, write, and perform many other operations on data. We then create a DataFrame with 4 columns: ‘Names’, ‘Age’, ‘Blood_Group’, and ‘Gender’. Almost all of the columns contain irregular data. The main part begins with the function whitespace_remover(), which removes extra leading and trailing whitespace from the data. It takes a DataFrame as a parameter, checks the datatype of each column, and if the datatype is ‘object’, applies Python's built-in str.strip function to every value in that column through map(); otherwise it does nothing. Finally, we call whitespace_remover() on the DataFrame, which modifies it in place and removes the extra whitespace from the columns.

Output: 
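
One caveat with mapping str.strip directly is that it raises a TypeError if an object column holds a non-string value such as a missing entry (NaN). The following is a hedged variant, not the article's original function: the name strip_string_cells and the sample data are hypothetical, and it checks each value rather than each column's datatype, returning a cleaned copy instead of modifying the input.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a missing value in a text column.
df = pd.DataFrame({'Names': [' Sunny', 'Ginny ', np.nan],
                   'Age': [23, 44, 23],
                   'Gender': [' M', 'F ', ' F']})


def strip_string_cells(dataframe):
    """Return a copy with whitespace stripped from string cells only."""
    cleaned = dataframe.copy()
    for column in cleaned.columns:
        # Strip only values that are strings; leave numbers and NaN as-is.
        cleaned[column] = cleaned[column].map(
            lambda value: value.strip() if isinstance(value, str) else value)
    return cleaned


cleaned_df = strip_string_cells(df)
print(cleaned_df)
```

Checking every value is slower than a column-level datatype check on large frames, so this trade-off mainly makes sense when missing or mixed values are expected.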
