Pandas remove rows with special characters

Last Updated : 21 Mar, 2024

In this article we will learn how to remove the rows with special characters i.e; if a row contains any value which contains special characters like @, %, &, $, #, +, -, *, /, etc. then drop such row and modify the data. To drop such types of rows, first, we have to search rows having special characters per column and then drop. To search we use regular expression either [@#&$%+-/*] or [^0-9a-zA-Z]. Let’s discuss the whole procedure with some examples :

Example 1:

Python3

# importing package 
import pandas as pd 
  
# load dataset 
df = pd.read_csv("data1.csv") 
  
# view dataset 
print(df)

Output:

Select rows with columns having special characters value

Python3

# select the rows  
# if Name column 
# has special characters 
print(df[df.Name.str.contains(r'[@#&$%+-/*]')])

Output:

Python3

# select the rows  
# if Grade column 
# has special characters 
print(df[df.Grade.str.contains(r'[^0-9a-zA-Z]')])

Output:

Merging of selected rows

Python3

# merge the selected rows 
# by using or 
print(df[df.Name.str.contains(r'[^0-9a-zA-Z]') 
         | df.Grade.str.contains(r'[@#&$%+-/*]')]) 

Output:

Remove the merged selected rows

Python3

# drop the merged selected rows 
print(df.drop(df[df.Name.str.contains(r'[^0-9a-zA-Z]') 
                 | df.Grade.str.contains(r'[^0-9a-zA-Z]')].index)) 

Output:

Example 2: This example uses a dataframe. shown below :

Python3

# importing package 
import pandas as pd 
  
# load dataset 
df = pd.read_csv("data2.csv") 
  
# view dataset 
print(df) 
  
# select and then merge rows  
# with special characters 
print(df[df.ID.str.contains(r'[^0-9a-zA-Z]') |  
         df.Name.str.contains(r'[^0-9a-zA-Z]') |  
         df.Age.str.contains(r'[^0-9a-zA-Z]') |  
         df.Country.str.contains(r'[^0-9a-zA-Z]')]) 
  
# drop the rows 
print(df.drop(df[df.ID.str.contains(r'[^0-9a-zA-Z]') |  
                 df.Name.str.contains(r'[^0-9a-zA-Z]') |  
                 df.Age.str.contains(r'[^0-9a-zA-Z]') |  
                 df.Country.str.contains(r'[^0-9a-zA-Z]')].index))