
Check For A Substring In A Pandas Dataframe Column

Pandas is a data analysis library for Python that has grown rapidly in popularity in recent years. It provides in-memory, labeled data structures with SQL-like operations such as filtering, grouping, and joining, along with basic statistical and analytical support and plotting capabilities. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.

In this article, we will explore the ways by which we can check for a substring in a Pandas DataFrame column.



Check for a Substring in a DataFrame Column

Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:

Check For a Substring in a Pandas Dataframe using str.contains() method

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘an’ is present in each ‘Name’ entry using the str.contains method.






import pandas as pd
 
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID   Name Department  Salary  NameContainsSubstring
0         101   Aman         HR   60000                   True
3         104  Rohan  Marketing   65000                   True
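
The match above is case-sensitive, and str.contains returns a missing value rather than False for missing entries, which can break boolean filtering. The following is a minimal sketch of how the case and na parameters of str.contains address this; the sample names and the missing value are hypothetical and not part of the employee data above.

```python
import pandas as pd

# Hypothetical sample data with mixed case and a missing value
names = pd.Series(['Aman', 'ANJALI', None, 'Rohan'])

# case=False ignores letter case; na=False treats missing values as "no match"
mask = names.str.contains('an', case=False, na=False)
print(mask.tolist())
```

Passing na=False keeps the mask strictly boolean, so it can be used directly to filter rows.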

Check For A Substring In A Pandas Dataframe Using Regular Expressions

In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsPattern’, is added, indicating whether the regular expression pattern ‘ma(?!$)’ matches anywhere in each ‘Name’ entry.

The str.contains method is called with regex=True (which is also the default) so that the pattern is interpreted as a regular expression. The negative lookahead (?!$) ensures that ‘ma’ is not immediately followed by the end of the string, so a name that merely ends in ‘ma’ would not match.




import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# regular expression pattern with negative lookahead
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  NameContainsPattern
0         101    aman         HR   60000                 True
2         103  madhav    Finance   90000                 True
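
Because str.contains treats its pattern as a regular expression by default, characters such as ‘.’ or ‘+’ have special meaning. The sketch below, using hypothetical sample values, contrasts the default behaviour with regex=False, which matches the pattern as plain text.

```python
import pandas as pd

# Hypothetical values containing regex metacharacters
codes = pd.Series(['a.b', 'axb', 'a+b'])

# Default regex=True: '.' matches any single character
as_regex = codes.str.contains('a.b')

# regex=False: the pattern is matched literally, character by character
as_literal = codes.str.contains('a.b', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

With regex=False only the first value matches, since it is the only one containing the literal text ‘a.b’.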

Check For A Substring In A Pandas Dataframe Using apply() function

In this example, a pandas DataFrame is created with employee information, including ‘EmployeeID’, ‘Name’, ‘Department’, and ‘Salary’. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘av’ is present in each ‘Name’ entry using the apply() method with a lambda function.




import pandas as pd
 
# Creating a relevant 4-column DataFrame
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  NameContainsSubstring
1         102  Bhavna         IT   75000                   True
2         103  Madhav    Finance   90000                   True
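
One caveat with the apply() approach is that the in operator raises a TypeError when an entry is missing (None or NaN). The following is a minimal sketch of one way to guard against that, assuming a hypothetical ‘Name’ column that contains a missing value.

```python
import pandas as pd

# Hypothetical names, one of which is missing
names = pd.Series(['Bhavna', None, 'Madhav', 'Rohan'])
substring = 'av'

# Check the type first so that non-string entries yield False
# instead of raising a TypeError
mask = names.apply(lambda x: isinstance(x, str) and substring in x)
print(mask.tolist())
```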

Check For A Substring In A Pandas Dataframe Using List Comprehension with ‘in’ Operator

In this example, a list comprehension with the in operator checks whether the substring ‘Finance’ is present in each ‘Department’ entry.




import pandas as pd
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
 
df = pd.DataFrame(data)
 
# Checking for the substring 'Finance' in the 'Department' column
substring = 'Finance'
df['DepartmentContainsSubstring'] = [substring in dept for dept in df['Department']]
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)

Output:

   EmployeeID    Name Department  Salary  DepartmentContainsSubstring
2         103  Madhav    Finance   90000                         True
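
The same list-comprehension idea can be extended to test for any of several substrings at once. The sketch below uses the built-in any() function; the search terms are hypothetical and chosen only for illustration.

```python
import pandas as pd

departments = pd.Series(['HR', 'IT', 'Finance', 'Marketing'])

# Hypothetical search terms
substrings = ['Fin', 'Mark']

# True where at least one of the substrings occurs in the entry
mask = [any(s in dept for s in substrings) for dept in departments]
print(departments[mask].tolist())
```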

