Python | Creating a Pandas dataframe column based on a given condition

Last Updated : 03 Dec, 2023

While working with data, we often need to add a column whose values depend on some condition. Pandas has no single function dedicated to this task, but several general-purpose tools (list comprehensions, apply(), map(), numpy.where(), assign() and loc[]) can each do the job. In this article, we will see how to create a Pandas dataframe column based on a given condition in Python.

Problem: Given a Dataframe containing the data of a cultural event, add a column called ‘Price’ which contains the ticket price for a particular day based on the type of event that will be conducted on that particular day.

Creating a Pandas Dataframe Column Based on a condition

There are several ways to create a Pandas dataframe column based on a given condition. This article explains the following commonly used methods:

Using List Comprehension
Using DataFrame.apply() Function
Using DataFrame.map() Function
Using numpy.where() Function
Using Assign() function
Using DataFrame.loc[] function
Using Lambda function

Creating a DataFrame

Here we are creating the data frame to solve the given problem.

Python3

# importing pandas as pd
import pandas as pd
 
# Creating the dataframe
df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
 
# Print the dataframe
print(df)

Output

         Date   Event
0   11/8/2011   Music
1   11/9/2011  Poetry
2  11/10/2011   Music
3  11/11/2011  Comedy
4  11/12/2011  Poetry

Using List Comprehension

We can use Python’s list comprehension technique to achieve this task. For a simple condition like this one it is often faster than apply(), because it does not call a Python function for every value, although vectorized options such as numpy.where() usually scale better on large data.
Now we will add a new column called ‘Price’ to the dataframe using a list comprehension. Set the price to 1500 if the ‘Event’ is ‘Music’, else 800.

Python3

# Add a new column named 'Price'
df['Price'] = [1500 if x == 'Music' else 800 for x in df['Event']]
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy    800
4  11/12/2011  Poetry    800

As we can see in the output, we have successfully added a new column to the dataframe based on some condition.
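
The speed of each approach depends on the data size and the pandas version, so it is worth measuring rather than assuming. The following is a minimal sketch, assuming a hypothetical larger frame named big built by repeating the article's five events; the function names are illustrative only, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: the article's five events repeated 20,000 times
events = ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry'] * 20_000
big = pd.DataFrame({'Event': events})


def with_list_comprehension():
    return [1500 if x == 'Music' else 800 for x in big['Event']]


def with_apply():
    return big['Event'].apply(lambda x: 1500 if x == 'Music' else 800)


def with_numpy_where():
    return np.where(big['Event'] == 'Music', 1500, 800)


# The three approaches must agree before their timings are worth comparing
same_result = (with_list_comprehension()
               == list(with_apply())
               == list(with_numpy_where()))

# Time five runs of each approach and report the total
for func in (with_list_comprehension, with_apply, with_numpy_where):
    seconds = timeit.timeit(func, number=5)
    print(f'{func.__name__}: {seconds:.4f} s for 5 runs')
```

If the three timings are close, readability is a reasonable tie-breaker.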

Using DataFrame.apply() Function

We can use the apply() function to achieve the goal. Because df[‘Event’] is a single column (a Series), the call below is strictly Series.apply(), which invokes a function once for every value in the column. There could be instances when we have more than two values; in that case, we can use a dictionary to map each key to its new value. This provides a lot of flexibility when there is a larger number of categories to which we want to assign different values in the newly added column.

Now we will add a new column called ‘Price’ to the dataframe using apply(). Set the price to 1500 if the ‘Event’ is ‘Music’, 1200 if the ‘Event’ is ‘Comedy’ and 800 if the ‘Event’ is ‘Poetry’.

Python3

# Define a function that looks up the value for one event name.
# apply() passes each element of the column as the first argument.
def set_value(event, assigned_value):
    return assigned_value[event]
 
 
# Create the dictionary
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}
 
# Add a new column named 'Price'
df['Price'] = df['Event'].apply(set_value, args=(event_dictionary, ))
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy   1200
4  11/12/2011  Poetry    800

As we can see in the output, we have successfully added a new column to the dataframe based on some condition.
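
Note that a plain dictionary lookup raises a KeyError if the column contains an event that is missing from the dictionary. One possible way around this, sketched below, is to look the value up with dict.get() and a fallback. The helper name set_value_or_default, the event ‘Dance’ and the fallback price of 0 are assumptions made for illustration and are not part of the original example.

```python
import pandas as pd

# 'Dance' is a made-up event that is absent from the dictionary
df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Dance']})
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}


def set_value_or_default(event, assigned_value, default):
    # dict.get returns `default` instead of raising KeyError
    return assigned_value.get(event, default)


# Extra positional arguments are forwarded to the function through args
df['Price'] = df['Event'].apply(set_value_or_default,
                                args=(event_dictionary, 0))

print(df)
```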

Using DataFrame.map() Function

We can use the map() function to achieve the goal. As with apply() above, it is called on the single column df[‘Event’], so it is strictly Series.map(). It is a very straightforward method where we use a dictionary to map each value of the column to the value of the newly added column. Any value that is not a key of the dictionary becomes NaN.

Now we will add a new column called ‘Price’ to the dataframe using map(). Set the price to 1500 if the ‘Event’ is ‘Music’, 1200 if the ‘Event’ is ‘Comedy’ and 800 if the ‘Event’ is ‘Poetry’.

Python3

# Create the dictionary
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}
 
# Add a new column named 'Price'
df['Price'] = df['Event'].map(event_dictionary)
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy   1200
4  11/12/2011  Poetry    800
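
When the column may contain values that the dictionary does not cover, map() leaves NaN in those rows. The sketch below shows this and one possible way to substitute a default with fillna(). The event ‘Dance’, the column name ‘PriceFilled’ and the default of 0 are assumptions chosen for illustration.

```python
import pandas as pd

# 'Dance' is a made-up event that is absent from the dictionary
df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Dance']})
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}

# Unmapped values become NaN, which also turns the column into floats
df['Price'] = df['Event'].map(event_dictionary)

# Replace the missing prices with a default and restore an integer dtype
df['PriceFilled'] = df['Event'].map(event_dictionary).fillna(0).astype(int)

print(df)
```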

Using numpy.where() Function

We can use the numpy.where() function to achieve the goal. It is a very straightforward method: we pass a condition, and it picks one value for the rows where the condition is true and another value for the rows where it is false.

Now we will add a new column called ‘Price’ to the dataframe. Set the price to 1500 if the ‘Event’ is ‘Music’ and to 800 for all other events.

Python3

# importing numpy as np
import numpy as np

# np.where(condition, value if condition is true,
#          value if condition is false)
df['Price'] = np.where(df['Event'] == 'Music', 1500, 800)

print(df)

Output

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy    800
4  11/12/2011  Poetry    800
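
numpy.where() handles a single two-way condition. For more than two outcomes, a related NumPy function, numpy.select(), accepts a list of conditions and a matching list of values. It is not used in the original example; the sketch below applies it, under that assumption, to the three-price problem from the earlier sections.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Conditions are checked in order; the first one that is true decides the value
conditions = [df['Event'] == 'Music', df['Event'] == 'Comedy']
choices = [1500, 1200]

# Rows that match no condition receive the default
df['Price'] = np.select(conditions, choices, default=800)

print(df)
```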

Using Assign() function

We use assign() function in Pandas to assign new columns to a DataFrame. It returns a new DataFrame with the new columns added. You can use this method to create a new column based on a given condition.

Let’s use the above DataFrame and modify the code to create a new column ‘Category’ based on the ‘Event’ column. In this example, we’ll create a ‘Category’ column with values ‘Entertainment’ for ‘Music’ and ‘Comedy’ events, and ‘Literature’ for ‘Poetry’ events:

Python3

# numpy is needed to choose a value row by row
import numpy as np

# 'Entertainment' for 'Music' and 'Comedy', 'Literature' for everything else.
# The lambda must return one value per row; a single string would be
# copied into every row.
df = df.assign(Category=lambda x: np.where(
    x['Event'].isin(['Music', 'Comedy']), 'Entertainment', 'Literature'))
 
# Display the updated DataFrame
print(df)

Output :

         Date   Event       Category
0   11/8/2011   Music  Entertainment
1   11/9/2011  Poetry     Literature
2  11/10/2011   Music  Entertainment
3  11/11/2011  Comedy  Entertainment
4  11/12/2011  Poetry     Literature
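
Two details of assign() are easy to miss: a single scalar is copied into every row, and the original DataFrame is left untouched because a new one is returned. The following minimal sketch illustrates both points; the variable names is_entertainment, broadcast and per_row are assumptions used only for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Comedy']})
is_entertainment = df['Event'].isin(['Music', 'Comedy'])

# A scalar result is broadcast: every row receives the same label
broadcast = df.assign(
    Category='Entertainment' if is_entertainment.any() else 'Literature')

# An array result is matched to the rows one by one
per_row = df.assign(
    Category=np.where(is_entertainment, 'Entertainment', 'Literature'))

print(broadcast)
print(per_row)

# assign() returned new frames, so df itself has no 'Category' column
print('Category' in df.columns)
```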

Using DataFrame.loc[] function

We use the DataFrame.loc[] indexer in pandas to access a group of rows and columns by labels or a boolean array. It’s a powerful tool for conditional indexing and updating values in a DataFrame. Let’s use it to create a new column based on a given condition.

Starting from the above DataFrame, the code below creates a new column called ‘Genre’ based on the ‘Event’ column. If the ‘Event’ is ‘Music’, the ‘Genre’ will be set to ‘Rock’; if it’s ‘Poetry’, the ‘Genre’ will be set to ‘Literary’; otherwise, it will be set to ‘Other’.

Python3

# Creating a new column 'Genre' based on a condition
df['Genre'] = 'Other'  # Initialize the 'Genre' column with 'Other' as the default value
 
# Using DataFrame.loc[] to set values based on the condition
df.loc[df['Event'] == 'Music', 'Genre'] = 'Rock'
df.loc[df['Event'] == 'Poetry', 'Genre'] = 'Literary'
 
# Displaying the modified dataframe
print(df)

Output :

         Date   Event     Genre
0   11/8/2011   Music      Rock
1   11/9/2011  Poetry  Literary
2  11/10/2011   Music      Rock
3  11/11/2011  Comedy     Other
4  11/12/2011  Poetry  Literary
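
The boolean mask passed to loc[] can also combine several conditions with & (and) and | (or), as long as each comparison is wrapped in parentheses. The sketch below is an illustration under assumed names: the ‘Tier’ column, its labels and the 1000 threshold are hypothetical and do not come from the original example.

```python
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry'],
                   'Price': [1500, 800, 1500, 1200, 800]})

# Start with a default value for every row
df['Tier'] = 'Standard'

# Parentheses are required because | binds tighter than == and >
df.loc[(df['Event'] == 'Music') | (df['Price'] > 1000), 'Tier'] = 'Premium'

print(df)
```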

Using lambda function

A lambda function in Python is a concise, anonymous function created with the lambda keyword, typically used for short operations.

In this example, a new column ‘Priority’ is created based on the condition: if the ‘Event’ is ‘Music’, the priority is set to ‘High’; otherwise, it’s set to ‘Low’. The apply() function is used together with a lambda function to achieve this.

Python3

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011', '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
 
# Adding a new column 'Priority' based on a condition using lambda function
df['Priority'] = df['Event'].apply(lambda x: 'High' if x == 'Music' else 'Low')
 
# Print the modified dataframe
print(df)

Output :

         Date   Event Priority
0   11/8/2011   Music     High
1   11/9/2011  Poetry      Low
2  11/10/2011   Music     High
3  11/11/2011  Comedy      Low
4  11/12/2011  Poetry      Low
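
When the condition depends on more than one column, the lambda can receive a whole row instead of a single value by calling apply() on the DataFrame with axis=1. The sketch below assumes a ‘Price’ column like the one built in the earlier sections; the rule itself (‘Comedy’ events or prices of at least 1500 count as ‘High’) is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry'],
                   'Price': [1500, 800, 1500, 1200, 800]})

# With axis=1 the lambda is called once per row, and row['...'] reads a cell
df['Priority'] = df.apply(
    lambda row: 'High'
    if row['Event'] == 'Comedy' or row['Price'] >= 1500 else 'Low',
    axis=1)

print(df)
```

Row-wise apply() runs a Python function for every row, so for large frames a vectorized alternative such as numpy.where() or loc[] is usually preferable.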
