Check For A Substring In A Pandas Dataframe Column

Pandas is a data analysis library for Python that has grown rapidly in popularity over the past several years. It provides in-memory, table-like data structures with SQL-like constructs, basic statistical and analytic support, and graphing capability. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.

In this article, we will explore the ways by which we can check for a substring in a Pandas DataFrame column.

Check for a Substring in a DataFrame Column

Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:

Using str.contains() method
Using Regular Expressions
Using apply() function
Using List Comprehension with ‘in’ Operator

Check For a Substring in a Pandas Dataframe using str.contains() method

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘an’ is present in each ‘Name’ entry using the str.contains method.

Python3

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}

df = pd.DataFrame(data)

# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)

# Keep only the rows where the substring was found
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID   Name Department  Salary  NameContainsSubstring
0         101   Aman         HR   60000                   True
3         104  Rohan  Marketing   65000                   True
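
By default, str.contains is case-sensitive, and depending on the pandas version and column dtype a missing value can propagate into the result instead of becoming False. The following is a minimal sketch of the `case` and `na` parameters; the `names` Series and its values are hypothetical and are not part of the example above.

```python
import pandas as pd

# Hypothetical data: mixed capitalization plus one missing name
names = pd.Series(['Aman', 'ANITA', None, 'Rohan', 'Madhav'])

# case=False ignores capitalization; na=False treats missing values as "no match"
mask = names.str.contains('an', case=False, na=False)

# The mask is purely boolean, so it can be used directly for filtering
print(names[mask].tolist())
```

Passing na=False keeps the mask free of missing values, which is what boolean indexing expects.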

Check For A Substring In A Pandas Dataframe Using Regular Expressions

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsPattern’, is added, indicating whether each ‘Name’ entry matches the regular expression ‘ma(?!$)’.

The str.contains method is called with regex=True (which is also its default) so that the pattern is interpreted as a regular expression. The negative lookahead (?!$) ensures that ‘ma’ matches only when it is not immediately followed by the end of the string, so a name that merely ends in ‘ma’ would not be selected.

Python3

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}

df = pd.DataFrame(data)

# regular expression pattern with negative lookahead
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)

# Keep only the rows where the pattern matched
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  NameContainsPattern
0         101    aman         HR   60000                 True
2         103  madhav    Finance   90000                 True
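
Because str.contains treats its argument as a regular expression by default, a search string that contains metacharacters such as parentheses may not match literally. The sketch below shows two ways to force a literal match, regex=False and re.escape; the `departments` Series and the ‘(East)’ search string are hypothetical and chosen only for illustration.

```python
import re
import pandas as pd

# Hypothetical data: one department name contains literal parentheses
departments = pd.Series(['HR', 'IT', 'R&D (East)', 'R&D East'])

substring = '(East)'

# regex=False treats the parentheses as plain characters, not as a regex group
literal_mask = departments.str.contains(substring, regex=False)

# Alternative: escape the metacharacters and keep regex matching enabled
escaped_mask = departments.str.contains(re.escape(substring))

print(literal_mask.tolist())
print(escaped_mask.tolist())
```

Both masks select only the entry that actually contains the parentheses, whereas an unescaped pattern would treat them as grouping syntax.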

Check For A Substring In A Pandas Dataframe Using apply() function

In this example, a pandas DataFrame is created with employee information, including ‘EmployeeID’, ‘Name’, ‘Department’, and ‘Salary’. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘av’ is present in each ‘Name’ entry using the apply() method with a lambda function.

Python3

import pandas as pd

# Creating a relevant 4-column DataFrame
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}

df = pd.DataFrame(data)

# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)

# Keep only the rows where the substring was found
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  NameContainsSubstring
1         102  Bhavna         IT   75000                   True
2         103  Madhav    Finance   90000                   True
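
The lambda above assumes that every entry is a string; a missing value (None or NaN) would make `substring in x` raise a TypeError. One possible guard, sketched here with a hypothetical `names` Series that contains a missing entry, is to check the type first.

```python
import pandas as pd

# Hypothetical data: the second name is missing
names = pd.Series(['Aman', None, 'Madhav'])

substring = 'av'

# isinstance() short-circuits, so `in` is never evaluated on a non-string value
mask = names.apply(lambda x: isinstance(x, str) and substring in x)

print(mask.tolist())
```

Missing entries simply evaluate to False here, so the resulting mask can be used for filtering without further cleanup.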

Check For A Substring In A Pandas Dataframe Using List Comprehension with ‘in’ Operator

In this example, a list comprehension with the ‘in’ operator checks whether the substring ‘Finance’ is present in each ‘Department’ value.

Python3

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}

df = pd.DataFrame(data)

# Checking for substring 'Finance' in the 'Department' column
substring = 'Finance'
df['DepartmentContainsSubstring'] = [substring in dept for dept in df['Department']]

# Keep only the rows where the substring was found
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  DepartmentContainsSubstring
2         103  Madhav    Finance   90000                         True
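
For a plain substring with no missing values, the four approaches should produce the same boolean mask. The following sketch compares them on the ‘Name’ values used in this article; the variable names are illustrative only.

```python
import pandas as pd

names = pd.Series(['Aman', 'Bhavna', 'Madhav', 'Rohan'])
substring = 'av'

# 1. str.contains with literal matching
with_contains = names.str.contains(substring, regex=False)

# 2. str.contains with the substring used as a regular expression
with_regex = names.str.contains(r'av')

# 3. apply() with a lambda
with_apply = names.apply(lambda x: substring in x)

# 4. list comprehension, wrapped in a Series so it shares the same index
with_listcomp = pd.Series([substring in name for name in names], index=names.index)

# Each approach flags the same rows
print(with_contains.equals(with_regex))
print(with_contains.equals(with_apply))
print(with_contains.equals(with_listcomp))
```

Which one to use is largely a matter of context: str.contains handles missing values and case options through parameters, while apply() and list comprehensions leave that handling to the caller.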
