
How to Drop Rows that Contain a Specific String in Pandas?

Last Updated : 03 Dec, 2023

In Pandas, we can drop rows from a DataFrame that contain a specific string in a particular column. In this article, we are going to see how to do this with the str.contains() function and boolean filtering.

Eliminating Rows Containing a Specific String

The str.contains() function searches each value of the given column for the string and returns a boolean Series that is True for the rows where it is found. To drop those rows, we filter the data frame with the negated result, which gives a new data frame containing only the rows that do not match.

Syntax:

df[df["column"].str.contains("someString") == False]
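To make the filtering step concrete, here is a minimal sketch of what the syntax above does in two stages. The Series `teams` and the variable names `mask` and `kept` are made up for this illustration.

```python
import pandas as pd

# A small illustrative Series (not part of the article's sample data)
teams = pd.Series(['Team 1', 'Team 2', 'Team 3'])

# Stage 1: str.contains() returns one boolean per row
mask = teams.str.contains('Team 1')
print(mask.tolist())   # [True, False, False]

# Stage 2: keep only the rows where the mask is False
kept = teams[mask == False]
print(kept.tolist())   # ['Team 2', 'Team 3']
```

The original Series is not modified; the filter returns a new object, which is why the examples assign the result back to a variable.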

Creating a Sample Pandas DataFrame

Here, we will create a sample DataFrame that we will use in further examples.

Python3




# Importing the library
import pandas as pd
 
# Dataframe
df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})
 
# display
df


Output:

     team  Subject  points
0  Team 1     Math      10
1  Team 1  Science       8
2  Team 2  Science      10
3  Team 3     Math       6
4  Team 2  Science       6
5  Team 3     Math       5


Drop Rows that Contain a Specific String in Pandas

Below are the ways by which we can drop rows that contain a specific string in Pandas:

  • Dropping the rows that contain a specific string
  • Dropping the rows with more than one string
  • Drop rows with the given partial string

Dropping the Rows that Contain a Specific String

In this method, we use the str.contains() function to find the rows. It takes each string from the series and checks whether it contains the given string, returning True or False for every row. Comparing that result with False selects the rows that do not match, so the matching rows are dropped and the remaining rows are kept.

Syntax: df[df["column_name"].str.contains("string") == False]

In the following example, we are going to select all the teams except “Team 1”.

Python3




# importing the library
import pandas as pd
 
# Dataframe
df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})
 
# Dropping the team 1
df = df[df["team"].str.contains("Team 1") == False]
 
df


Output:

     team  Subject  points
2  Team 2  Science      10
3  Team 3     Math       6
4  Team 2  Science       6
5  Team 3     Math       5
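As a variation on the example above, the `~` operator is a common way to negate a boolean mask instead of comparing it with False. The sketch below also assumes the column may hold missing values. The `None` entry is added here purely for illustration and is not in the article's data. `na` and `regex` are existing parameters of str.contains().

```python
import pandas as pd

# Illustrative data with one missing team name (an assumption for this sketch)
df = pd.DataFrame({'team': ['Team 1', 'Team 2', None, 'Team 3'],
                   'points': [10, 8, 6, 5]})

# na=False counts a missing value as "no match", so that row is kept
# regex=False treats "Team 1" as a literal substring, not a pattern
mask = df['team'].str.contains('Team 1', na=False, regex=False)

# ~ flips True and False, keeping the rows that do not match
result = df[~mask]
print(result)
```

Without `na=False`, the mask would contain a missing value for that row, and using it as a filter can raise an error.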


Dropping the Rows with More Than One String

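We follow the same steps as in the first method, but separate the strings with the | character. str.contains() treats its argument as a regular expression by default, so | acts as the regex "or" (alternation) and a row matches if it contains any of the strings.

Syntax: df = df[df["column_name"].str.contains("string1|string2") == False]

In the following program, we are going to drop the rows that contain “Team 1” or “Team 2”.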

Python3




# importing the library
import pandas as pd
 
# Dataframe
df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})
 
# Dropping the rows of team 1 and team 2
df = df[df["team"].str.contains("Team 1|Team 2") == False]
 
# display
df


Output:

     team  Subject  points
3  Team 3     Math       6
5  Team 3     Math       5
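Because the pattern is read as a regular expression, strings that contain characters such as ( or . may not match literally. One way to handle this, sketched below, is to escape each string with Python's `re.escape` before joining them with |. The team name 'Team (A)' and the list `to_drop` are hypothetical and used only for illustration.

```python
import re
import pandas as pd

# Illustrative data; 'Team (A)' contains regex special characters
df = pd.DataFrame({'team': ['Team 1', 'Team (A)', 'Team 2', 'Team 3'],
                   'points': [10, 8, 6, 5]})

# Strings to drop, matched literally
to_drop = ['Team 1', 'Team (A)']

# Escape each string, then join with the regex "or"
pattern = '|'.join(re.escape(s) for s in to_drop)

result = df[~df['team'].str.contains(pattern, na=False)]
print(result)
```

If whole values should be matched rather than substrings, `df[~df['team'].isin(to_drop)]` is a simpler alternative that does not use regular expressions.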

Drop Rows With the Given Partial String

Here we use the same function, but build the search pattern by joining a list of partial strings with |. str.contains() matches substrings, so only part of the word is needed.

Syntax: df[~df.column_name.str.contains('|'.join(["string"]))]

This program differs from the two cases above in that we drop the rows matching a given partial string rather than a full value. For example, we are going to drop the rows with “Sci” in the Subject column.

Python3




# importing the library
import pandas as pd
 
# Dataframe
df = pd.DataFrame({'team': ['Team 1', 'Team 1', 'Team 2',
                            'Team 3', 'Team 2', 'Team 3'],
                   'Subject': ['Math', 'Science', 'Science',
                               'Math', 'Science', 'Math'],
                   'points': [10, 8, 10, 6, 6, 5]})
 
# Dropping the rows with "Sci"
# identify partial string
discard = ["Sci"]
 
# drop rows that contain the partial string "Sci"
df = df[~df.Subject.str.contains('|'.join(discard))]
 
# display
df


Output:

     team  Subject  points
0  Team 1     Math      10
3  Team 3     Math       6
5  Team 3     Math       5
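Partial matches are case-sensitive by default, and the filtered frame keeps its original index labels (0, 3, 5 above). The following sketch shows one possible way to handle both, assuming mixed-case values in the column. The lowercase 'science' entry is made up for illustration. `case` is an existing parameter of str.contains(), and reset_index(drop=True) renumbers the rows.

```python
import pandas as pd

# Illustrative data with mixed-case subject names (an assumption for this sketch)
df = pd.DataFrame({'Subject': ['Math', 'Science', 'science', 'Math'],
                   'points': [10, 8, 6, 5]})

# case=False makes "sci" match both "Science" and "science"
mask = df['Subject'].str.contains('sci', case=False, na=False)

# Drop the matching rows and renumber the remaining ones from 0
result = df[~mask].reset_index(drop=True)
print(result)
```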


