
Check For A Substring In A Pandas Dataframe Column

Last Updated : 12 Feb, 2024

Pandas is a data analysis library for Python that has grown rapidly in popularity in recent years. It provides in-memory tabular data structures with SQL-like operations, basic statistical and analytical support, and plotting capabilities. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.

In this article, we will explore the ways by which we can check for a substring in a Pandas DataFrame column.

Check for a Substring in a DataFrame Column

Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:

  • Using str.contains() method
  • Using Regular Expressions
  • Using the apply() function
  • List Comprehension with ‘in’ Operator
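As a quick orientation, the four approaches listed above can be sketched side by side. This is a minimal sketch, assuming a small hypothetical 'Name' column and the substring 'an'; the variable names are illustrative only. Each approach should produce the same boolean mask for a plain literal substring.

```python
import pandas as pd

# Hypothetical sample data used only for this comparison
df = pd.DataFrame({'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan']})
substring = 'an'

# 1. str.contains() treating the substring as a literal
mask_contains = df['Name'].str.contains(substring, regex=False)

# 2. str.contains() treating the substring as a regular expression
mask_regex = df['Name'].str.contains(r'an', regex=True)

# 3. apply() with a lambda that uses Python's 'in' operator
mask_apply = df['Name'].apply(lambda name: substring in name)

# 4. List comprehension with the 'in' operator
mask_listcomp = pd.Series([substring in name for name in df['Name']],
                          index=df.index)

# All four masks should agree for a simple literal substring
masks = [mask_contains, mask_regex, mask_apply, mask_listcomp]
all_equal = all(m.tolist() == masks[0].tolist() for m in masks)
print(all_equal)
```

The sections below look at each approach separately.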

Check For a Substring in a Pandas Dataframe using str.contains() method

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘an’ is present in each ‘Name’ entry using the str.contains method.

Python3




import pandas as pd
 
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)


Output:

   EmployeeID   Name Department  Salary  NameContainsSubstring
0         101   Aman         HR   60000                   True
3         104  Rohan  Marketing   65000                   True
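By default str.contains() is case-sensitive, treats the search string as a regular expression, and propagates missing values. The following is a small sketch of the case, na, and regex parameters, assuming a hypothetical Series that includes a missing value and a name with regex metacharacters.

```python
import pandas as pd

# Hypothetical data: mixed case, a missing value, and regex metacharacters
names = pd.Series(['Aman', 'ANAND', None, 'C++ Dev', 'Rohan'])

# case=False ignores letter case; na=False treats missing values as "no match"
mask_ci = names.str.contains('an', case=False, na=False)

# regex=False searches for the literal text; as a regex, 'C++' would be invalid
mask_literal = names.str.contains('C++', regex=False, na=False)

print(names[mask_ci].tolist())
print(names[mask_literal].tolist())
```

Passing na=False is useful when the result is used directly as a filter, since a mask that contains missing values cannot be used for boolean indexing.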

Check For A Substring In A Pandas Dataframe Using Regular Expressions

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsPattern’, is added, indicating whether the regular expression pattern ‘ma(?!$)’ matches anywhere in each ‘Name’ entry.

The str.contains method is called with regex=True (which is also its default), so the pattern is interpreted as a regular expression. The negative lookahead (?!$) ensures that ‘ma’ is not immediately followed by the end of the string, so a name that merely ends in ‘ma’ would not match.

Python3




import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# regular expression pattern with negative lookahead
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)


Output:

   EmployeeID    Name Department  Salary  NameContainsPattern
0         101    aman         HR   60000                 True
2         103  madhav    Finance   90000                 True
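Regular expressions also allow anchors, alternation, and flags. The sketch below is illustrative only: the patterns and the terms list are assumptions chosen for this example, and re.escape is used to show how literal search terms can be combined safely into one pattern.

```python
import re
import pandas as pd

names = pd.Series(['aman', 'bhavna', 'madhav', 'rohan'])

# Anchors with alternation: names that start with 'ma' OR end with 'an'
mask_anchor = names.str.contains(r'^ma|an$')

# Flags: case-insensitive search through re.IGNORECASE
mask_flags = names.str.contains('MA', flags=re.IGNORECASE)

# Combining literal terms: escape each one so '.' is not a wildcard
terms = ['dh', 'a.n']  # hypothetical search terms
escaped_pattern = '|'.join(re.escape(term) for term in terms)
mask_escaped = names.str.contains(escaped_pattern)

# Without escaping, 'a.n' also matches 'avn' inside 'bhavna'
mask_unescaped = names.str.contains('|'.join(terms))

print(names[mask_anchor].tolist())
print(names[mask_escaped].tolist())
print(names[mask_unescaped].tolist())
```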

Check For A Substring In A Pandas Dataframe Using apply() function

In this example, a pandas DataFrame is created with employee information, including ‘EmployeeID’, ‘Name’, ‘Department’, and ‘Salary’. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘av’ is present in each ‘Name’ entry using the apply() method with a lambda function.

Python3




import pandas as pd
 
# Creating a relevant 4-column DataFrame
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)


Output:

   EmployeeID    Name Department  Salary  NameContainsSubstring
1         102  Bhavna         IT   75000                   True
2         103  Madhav    Finance   90000                   True
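One caveat with apply() is that the ‘in’ operator raises a TypeError when a value is not a string, for example a missing value. A possible guard is sketched below; the helper name contains_substring and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical column with a missing value and a non-string entry
names = pd.Series(['Aman', None, 'Madhav', 42])
substring = 'av'

def contains_substring(value, needle=substring):
    """Return True only for string values that contain the needle."""
    return isinstance(value, str) and needle in value

# Non-string values are reported as False instead of raising an error
mask = names.apply(contains_substring)
print(mask.tolist())
```

Because apply() calls a Python function once per row, it is generally slower than str.contains() on large columns, but it allows arbitrary matching logic.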

Check For A Substring In A Pandas Dataframe Using List Comprehension with ‘in’ Operator

In this example, a list comprehension with Python's ‘in’ operator checks whether the substring ‘Finance’ is present in each value of the ‘Department’ column.

Python3




import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for the substring in each 'Department' value
substring = 'Finance'
df['DepartmentContainsSubstring'] = [substring in dept for dept in df['Department']]
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)


Output:

   EmployeeID    Name Department  Salary  DepartmentContainsSubstring
2         103  Madhav    Finance   90000                         True
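A list comprehension can also search more than one column at once. The following is a minimal sketch, assuming the same sample names and departments and the substring ‘an’; zip() pairs up the values of the two columns row by row.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
})
substring = 'an'

# True when the substring appears in either the name or the department
mask = [substring in name or substring in dept
        for name, dept in zip(df['Name'], df['Department'])]

# A plain list of booleans can be used directly as a row filter
filtered = df[mask]
print(filtered)
```

Here ‘Madhav’ is kept because ‘Finance’ contains ‘an’, even though the name itself does not.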
