
Pandas – Strip whitespace from Entire DataFrame


“We can have data without information, but we cannot have information without data.” Data is the backbone of a data scientist's work, and according to a survey, data scientists spend approximately 60% of their time cleaning and organizing data, so it is worth becoming familiar with different techniques for organizing data.

In this article, we will learn different methods to strip extra whitespace from an entire DataFrame. The dataset used here is given below:

Remove Whitespace in Pandas

Creating Sample Pandas DataFrame.

Python3




# importing library
import pandas as pd

# Creating dataframe
df = pd.DataFrame({'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
                   'Age': [23, 44, 23, 54, 22, 11],
                   'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
                   'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
                   })
print(df)


Output:

In this DataFrame, the values in the Names, Blood Group, and Gender columns are irregular: many cells contain extra whitespace at the start of the value, and some in the Names column also contain it at the end. The Age column is numeric and is not affected.
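Printed DataFrames hide this kind of padding, so it can help to count the affected cells first. The following is a minimal sketch that rebuilds the sample DataFrame and counts, for each text column, the values that change when stripped; the names `text_columns` and `padded_counts` are illustrative and not part of the original article.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F'],
})

# Text columns of the sample data (listed by hand for clarity)
text_columns = ['Names', 'Blood Group', 'Gender']

# A value carries extra whitespace if stripping it changes the value
padded_counts = {
    col: int((df[col] != df[col].str.strip()).sum())
    for col in text_columns
}
print(padded_counts)

# Converting to a list shows the quotes, which makes the spaces visible
print(df['Names'].tolist())
```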

Pandas – Strip whitespace from Entire DataFrame

Our aim is to remove all the extra whitespace and organize the data in a systematic way. We will use different methods to remove the extra spaces from the cells. The methods are:

  • Using strip() function
  • Using skipinitialspace
  • Using replace function
  • Using converters

Strip whitespace from Entire DataFrame Using strip() function

Pandas provides the predefined method “pandas.Series.str.strip()” to remove whitespace from strings. It removes leading and trailing characters from each string in a Series or Index, and it returns a new Series or Index rather than modifying the original.

Syntax: pandas.Series.str.strip(to_strip = None)

Parameter: to_strip is the set of characters to remove from the head and tail of each string. By default it is None, in which case leading and trailing whitespace is removed.

Returns: a Series or Index of object.
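As a small illustration of the parameter described above, the sketch below compares str.strip() with its one-sided relatives str.lstrip() and str.rstrip(), and then passes a character set through to_strip. The `codes` Series with asterisks is made-up sample data, not part of the article's dataset.

```python
import pandas as pd

names = pd.Series([' Sunny', 'Ginny ', ' Binny '])

# Default behaviour: whitespace is removed from both ends
stripped = names.str.strip().tolist()

# One-sided variants remove whitespace from the left or right only
left_only = names.str.lstrip().tolist()
right_only = names.str.rstrip().tolist()

# Hypothetical data: pass the characters to remove via to_strip
codes = pd.Series(['**A+**', '*B-'])
codes_stripped = codes.str.strip('*').tolist()

print(stripped)
print(left_only)
print(right_only)
print(codes_stripped)
```

None of these calls change `names` or `codes` themselves; each returns a new Series.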

In this example, the code creates a pandas DataFrame named ‘df’ with columns ‘Names’, ‘Age’, ‘Blood Group’, and ‘Gender’. It removes leading and trailing spaces from the ‘Names’, ‘Blood Group’, and ‘Gender’ columns using the strip() function. Because str.strip() returns a new Series instead of changing the DataFrame in place, the stripped values are assigned back to the respective columns, like df['Names'] = df['Names'].str.strip().

Python3

# importing library
import pandas as pd

# Creating dataframe
df = pd.DataFrame({'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
                   'Age': [23, 44, 23, 54, 22, 11],
                   'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
                   'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
                   })

# The dataset has extra spaces in many cells, so remove them with strip()
# and assign the result back, since strip() returns a new Series
df['Names'] = df['Names'].str.strip()
df['Blood Group'] = df['Blood Group'].str.strip()
df['Gender'] = df['Gender'].str.strip()

# Printing dataframe
print(df)


 Output: 

Remove Space from Columns in Pandas Using skipinitialspace

skipinitialspace is not a method but one of the parameters of the pandas.read_csv() method. When it is True, spaces that follow a delimiter are skipped while the file is read, so leading spaces are removed from the fields of the whole DataFrame. Trailing spaces are not affected. By default it is False; set it to True to remove the leading spaces.

Syntax : pandas.read_csv(‘path_of_csv_file’, skipinitialspace = True)

 # By default value of skipinitialspace is False, make it True to use this parameter. 
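The CSV file used in this article is not included, so the following sketch substitutes a small in-memory CSV through io.StringIO; the contents of `csv_text` are hypothetical. It shows that skipinitialspace=True drops the spaces after each comma, in the header as well as in the data, while a space before a comma survives.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = (
    "Names, Age, Blood Group, Gender\n"
    "Sunny, 23, A+, M\n"
    "Ginny , 23, O+, F\n"
)

df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Header names and values no longer start with a space
print(df.columns.tolist())
print(df['Blood Group'].tolist())

# The trailing space in 'Ginny ' is still there
print(df['Names'].tolist())
```

If trailing spaces also need to go, skipinitialspace can be combined with str.strip() on the affected columns after loading.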

In this example, we will use skipinitialspace to remove leading whitespace from the entire DataFrame. The code uses the pandas library to read a CSV file named ‘student_data.csv’ and passes the skipinitialspace=True parameter to eliminate leading spaces in the data while loading it into a DataFrame. Finally, it prints the contents of the DataFrame.

Python3




# importing library
import pandas as pd
 
# reading csv file with skipinitialspace=True, which removes the leading spaces while loading
df = pd.read_csv('\\student_data.csv', skipinitialspace=True)
 
# printing dataset
print(df)


Output: 

Strip whitespace from Entire DataFrame Using replace function

The replace() function can also remove extra whitespace from the DataFrame. Pandas provides the predefined method “pandas.Series.str.replace()” for this. The program is the same as the strip() program, except that replace() is used in place of strip(). Note that replacing ' ' with '' removes every space in a string, including spaces between words, not only the leading and trailing ones.

Syntax: pandas.Series.str.replace(' ', '')
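To show how replace() differs from strip(), the sketch below uses a made-up value with a space between two words. The plain replacement removes that inner space too, while a regular expression anchored to the start and end of the string (an assumption of this sketch, not something the article uses) touches only the edges.

```python
import pandas as pd

# Hypothetical values; the first has a space between two words
names = pd.Series([' Mary Ann ', 'Bunny '])

# Literal replacement removes every space, inner ones included
all_removed = names.str.replace(' ', '', regex=False).tolist()

# Regex anchored to both ends removes only leading and trailing whitespace
edges_removed = names.str.replace(r'^\s+|\s+$', '', regex=True).tolist()

# For comparison, strip() gives the same result as the anchored regex
stripped = names.str.strip().tolist()

print(all_removed)
print(edges_removed)
print(stripped)
```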

In this example, we use the replace() function to remove whitespace from the DataFrame. The code removes spaces within the ‘Names’, ‘Blood Group’, and ‘Gender’ columns of a pandas DataFrame named ‘df’ using the str.replace(' ', '') method. Because str.replace() returns a new Series and does not modify the original DataFrame, the modified values are assigned back to the respective columns, like df['Names'] = df['Names'].str.replace(' ', '').

Python3

# importing library
import pandas as pd

# Creating dataframe
df = pd.DataFrame({'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
                   'Age': [23, 44, 23, 54, 22, 11],
                   'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
                   'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
                   })

# The dataset has extra spaces in many cells, so remove them with replace()
# and assign the result back, since replace() returns a new Series
df['Names'] = df['Names'].str.replace(' ', '')
df['Blood Group'] = df['Blood Group'].str.replace(' ', '')
df['Gender'] = df['Gender'].str.replace(' ', '')

# Printing dataframe
print(df)


Output: 

Remove Space from Columns in Pandas Using Converters

Like skipinitialspace, converters is one of the parameters of the pandas predefined method “read_csv”. It is used to apply functions to particular columns while the file is read. The functions are passed in a dictionary keyed by column name. Here we pass the str.strip function itself (without calling it), which removes the extra leading and trailing space while reading the csv file.

Syntax : pd.read_csv(“path_of_file”, converters={‘column_names’: function_name})

# Pass dict of functions and column names, where column names act as unique keys and function as value.    
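As a self-contained sketch of the converters parameter, the example below reads a hypothetical in-memory CSV through io.StringIO instead of a file on disk and builds the dictionary with a comprehension. The sample data and the names `csv_text`, `text_columns`, and `converters` are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = (
    "Names,Age,Blood Group,Gender\n"
    " Sunny ,23, A+, M\n"
    "Ginny ,23,O+,F \n"
)

# Map each text column to the str.strip function (not a call to it)
text_columns = ['Names', 'Blood Group', 'Gender']
converters = {column: str.strip for column in text_columns}

df = pd.read_csv(io.StringIO(csv_text), converters=converters)

# Both leading and trailing spaces are gone from the converted columns
print(df['Names'].tolist())
print(df['Gender'].tolist())
```

Unlike skipinitialspace, this approach also removes trailing spaces, but only from the columns named in the dictionary.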

In this example, we are using converters. The code reads a CSV file named ‘student_data.csv’ into a pandas DataFrame and uses the converters parameter to apply the str.strip function, which removes leading and trailing spaces, to the ‘Name’, ‘Blood Group’, and ‘Gender’ columns while loading the data. Finally, it prints the contents of the DataFrame.

Python3




# importing library
import pandas as pd
 
# reading csv file with the converters parameter, which strips extra space while loading
# note: pass the function str.strip itself, not the call str.strip()
df = pd.read_csv('\\student_data.csv', converters={'Name': str.strip,
                                                   'Blood Group': str.strip,
                                                   'Gender': str.strip})
 
# printing dataset
print(df)


Output: 

Removing Extra Whitespace from Whole DataFrame

The code defines a pandas DataFrame named ‘df’ with columns ‘Names’, ‘Age’, ‘Blood_Group’, and ‘Gender’. It also includes a function called whitespace_remover that iterates over the columns of a given DataFrame, checks whether the data type is ‘object’, and applies the strip function to remove leading and trailing whitespace. Finally, the function is called on the DataFrame ‘df’, and the modified DataFrame is printed. Note that map(str.strip) expects every value in the column to be a string; a non-string value such as a missing value (NaN) raises a TypeError.

Python3




# Importing required libraries
import pandas as pd
 
# Creating a DataFrame with 4 columns whose
# text values contain irregular whitespace.
df = pd.DataFrame({'Names': [' Sunny', 'Bunny', 'Ginny ',
                             ' Binny ', ' Chinni', 'Minni'],
 
                   'Age': [23, 44, 23, 54, 22, 11],
 
                   'Blood_Group': [' A+', ' B+', 'O+', 'O-',
                                   ' A-', 'B-'],
 
                   'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
                   })
 
# Creating a function which will remove extra leading
# and trailing whitespace from the data.
# pass dataframe as a parameter here
 
 
def whitespace_remover(dataframe):
 
    # iterating over the columns
    for i in dataframe.columns:
 
        # checking datatype of each columns
        if dataframe[i].dtype == 'object':
 
            # applying strip function on column
            dataframe[i] = dataframe[i].map(str.strip)
        else:
 
            # if condn. is False then it will do nothing.
            pass
 
 
# applying whitespace_remover function on dataframe
whitespace_remover(df)
 
# printing dataframe
print(df)


Output
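A possible variation on the function above is sketched here. It returns a cleaned copy instead of modifying the DataFrame that was passed in, and it detects text columns with pandas.api.types.is_string_dtype rather than comparing against ‘object’. The function name `strip_text_columns` is hypothetical, and the sketch assumes the text columns hold only strings.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def strip_text_columns(frame):
    """Return a copy of frame with whitespace stripped from string columns."""
    result = frame.copy()
    for column in result.columns:
        # Only string columns are touched; numeric columns are left alone
        if is_string_dtype(result[column]):
            result[column] = result[column].str.strip()
    return result


df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood_Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F'],
})

clean = strip_text_columns(df)
print(clean)
```

Returning a copy keeps the raw data available for comparison, at the cost of holding two DataFrames in memory.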



Last Updated : 01 Dec, 2023