
Python | Creating a Pandas dataframe column based on a given condition

While working with data, we often need to add a column whose values depend on some condition. pandas has no single function dedicated to this task, but there are several idiomatic ways to achieve it. In this article, we will see how to create a Pandas DataFrame column based on a given condition in Python.

Problem: Given a DataFrame containing the data of a cultural event, add a column called ‘Price’ that contains the ticket price for each day, based on the type of event held on that day.

Creating a Pandas DataFrame Column Based on a Condition

There are various ways to create a Pandas DataFrame column based on a given condition. Here we explain some of the most commonly used methods.

Creating a DataFrame

First, we create the DataFrame used to solve the given problem.

Python3

# importing pandas as pd
import pandas as pd
 
# Creating the dataframe
df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
 
# Print the dataframe
print(df)

Output

         Date   Event
0   11/8/2011   Music
1   11/9/2011  Poetry
2  11/10/2011   Music
3  11/11/2011  Comedy
4  11/12/2011  Poetry

Using List Comprehension

We can use Python’s list comprehension to achieve this task. A list comprehension is concise and is often faster than calling Series.apply() with a Python function, although vectorized methods such as numpy.where() generally scale better on large DataFrames.
Now we will add a new column called ‘Price’ to the DataFrame using a list comprehension. Set the price to 1500 if the ‘Event’ is ‘Music’, else 800.

Python3

# Add a new column named 'Price'
df['Price'] = [1500 if x == 'Music' else 800 for x in df['Event']]
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy    800
4  11/12/2011  Poetry    800

As we can see in the output, we have successfully added a new column to the dataframe based on some condition.  
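The same list comprehension extends to more than two outcomes. Below is a minimal sketch, assuming the ‘Comedy’ price of 1200 used in the later sections: a dictionary holds the explicit prices, and dict.get() supplies 800 for any event that is not listed.

```python
import pandas as pd

# Same sample data as in the article
df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Explicit prices for some events; anything not listed falls back to 800
prices = {'Music': 1500, 'Comedy': 1200}

# dict.get(key, default) returns the default when the key is missing
df['Price'] = [prices.get(event, 800) for event in df['Event']]
print(df)
```

The fallback value plays the role of the final `else` branch, so new categories only require a new dictionary entry.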

Using Series.apply() Function

We can use the Series.apply() function to achieve the goal. When there are more than two possible values, a dictionary can map each category to its new value. This provides a lot of flexibility when many categories need different values in the newly added column.

Now we will add a new column called ‘Price’ to the DataFrame by applying a function to the ‘Event’ column. Set the price to 1500 if the ‘Event’ is ‘Music’, 1200 if it is ‘Comedy’, and 800 if it is ‘Poetry’.

Python3

# Define a function that looks up the price for a single event value
def set_value(event, price_map):
    return price_map[event]
 
 
# Create the dictionary
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}
 
# Add a new column named 'Price'
df['Price'] = df['Event'].apply(set_value, args=(event_dictionary, ))
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy   1200
4  11/12/2011  Poetry    800

 As we can see in the output, we have successfully added a new column to the dataframe based on some condition.   
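Series.apply() sees only one column at a time. When the condition depends on several columns, DataFrame.apply() with axis=1 passes each row to the function instead. Below is a minimal sketch; the surcharge rule (100 extra for events on 11/12/2011) is purely hypothetical and only shows a condition that uses two columns.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}

def row_price(row):
    # row is a Series indexed by column name ('Date', 'Event')
    base = event_dictionary[row['Event']]
    # Hypothetical rule for illustration: 100 surcharge on 11/12/2011
    return base + 100 if row['Date'] == '11/12/2011' else base

# axis=1 calls row_price once per row rather than once per column
df['Price'] = df.apply(row_price, axis=1)
print(df)
```

Row-wise apply() runs a Python function for every row, so it is convenient but usually slower than vectorized alternatives on large DataFrames.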

Using Series.map() Function

We can use the Series.map() function to achieve the goal. It is a very straightforward method: a dictionary maps each key found in the ‘Event’ column to a value in the newly added column. Values that have no matching key become NaN.

Now we will add a new column called ‘Price’ to the DataFrame using Series.map(). Set the price to 1500 if the ‘Event’ is ‘Music’, 1200 if it is ‘Comedy’, and 800 if it is ‘Poetry’.

Python3

# Create the dictionary
event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}
 
# Add a new column named 'Price'
df['Price'] = df['Event'].map(event_dictionary)
 
# Print the DataFrame
print(df)

Output :

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy   1200
4  11/12/2011  Poetry    800
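Because unmatched keys become NaN, it is worth handling events that are missing from the dictionary. Below is a minimal sketch, assuming a hypothetical ‘Dance’ event that has no price entry and a hypothetical default price of 800.

```python
import pandas as pd

# Hypothetical variant of the data with a 'Dance' event missing from the dictionary
df = pd.DataFrame({'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Dance']})

event_dictionary = {'Music': 1500, 'Poetry': 800, 'Comedy': 1200}

# 'Dance' has no entry, so map() yields NaN there and the column becomes float
df['Price'] = df['Event'].map(event_dictionary)
missing = df['Price'].isna().sum()
print(missing)  # 1

# Fill unmatched events with the default price and restore an integer dtype
df['Price'] = df['Price'].fillna(800).astype(int)
print(df)
```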

Using numpy.where() Function

We can use the numpy.where() function to achieve the goal. It is a straightforward, vectorized method: numpy.where(condition, x, y) returns x wherever the condition is True and y everywhere else.

Now we will add a new column called ‘Price’ to the DataFrame. Set the price to 1500 if the ‘Event’ is ‘Music’ and 800 for all other events.

Python3

# importing numpy as np
import numpy as np

# np.where(condition, value if condition
# is true, value if condition is false)
df['Price'] = np.where(df['Event'] == 'Music', 1500, 800)

print(df)

Output

         Date   Event  Price
0   11/8/2011   Music   1500
1   11/9/2011  Poetry    800
2  11/10/2011   Music   1500
3  11/11/2011  Comedy    800
4  11/12/2011  Poetry    800
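numpy.where() chooses between exactly two values. For three or more outcomes, the related NumPy function numpy.select() (swapped in here, not used in the original method) takes a list of conditions and a matching list of choices; the first condition that is True decides the value, and default covers every other row. This sketch reproduces the three-price scheme from the earlier sections.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Conditions are checked in order; the first True one picks the matching choice
conditions = [df['Event'] == 'Music', df['Event'] == 'Comedy']
choices = [1500, 1200]

# Rows that match no condition (here, every 'Poetry' event) get the default
df['Price'] = np.select(conditions, choices, default=800)
print(df)
```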

Using DataFrame.assign() Function

The DataFrame.assign() method adds new columns to a DataFrame. It returns a new DataFrame with the new columns added and leaves the original unchanged, so it can be used to create a new column based on a given condition.

Using the same Date and Event data, we will create a new column ‘Category’ based on the ‘Event’ column: ‘Entertainment’ for ‘Music’ and ‘Comedy’ events, and ‘Literature’ for ‘Poetry’ events.

Python3

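import numpy as np
import pandas as pd

# Start again from the original two-column DataFrame
df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# 'Entertainment' where the event is Music or Comedy, else 'Literature'
condition = df['Event'].isin(['Music', 'Comedy'])
df = df.assign(Category=np.where(condition, 'Entertainment', 'Literature'))

# Display the updated DataFrame
print(df)
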
Output :

         Date   Event       Category
0   11/8/2011   Music  Entertainment
1   11/9/2011  Poetry     Literature
2  11/10/2011   Music  Entertainment
3  11/11/2011  Comedy  Entertainment
4  11/12/2011  Poetry     Literature
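Because assign() accepts callables and several keyword arguments at once, it fits naturally into method chains. Below is a minimal sketch that adds two conditional columns in one call and confirms that the original DataFrame is not modified; the Price rule reuses the numpy.where() condition from the previous section.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Each keyword becomes a column; a callable receives the new DataFrame being built
result = df.assign(
    Category=lambda d: np.where(d['Event'].isin(['Music', 'Comedy']),
                                'Entertainment', 'Literature'),
    Price=lambda d: np.where(d['Event'] == 'Music', 1500, 800),
)

# assign() returned a new object; the original df still has only two columns
print(list(df.columns))
print(result)
```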

Using the DataFrame.loc[] Indexer

The DataFrame.loc[] indexer accesses a group of rows and columns by labels or by a boolean array. It is a powerful tool for conditional selection and for updating values in a DataFrame. Let’s use it to create a new column based on a given condition.

Using the same Date and Event data, we will create a new column called ‘Genre’ based on the ‘Event’ column. If the ‘Event’ is ‘Music’, the ‘Genre’ is set to ‘Rock’; if it is ‘Poetry’, it is set to ‘Literary’; otherwise, it is set to ‘Other’.

Python3

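import pandas as pd

# Start again from the original two-column DataFrame
df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Initialize the 'Genre' column with 'Other' as the default value
df['Genre'] = 'Other'

# Using DataFrame.loc[] to overwrite values where the condition holds
df.loc[df['Event'] == 'Music', 'Genre'] = 'Rock'
df.loc[df['Event'] == 'Poetry', 'Genre'] = 'Literary'

# Displaying the modified DataFrame
print(df)
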
Output :

         Date   Event     Genre
0   11/8/2011   Music      Rock
1   11/9/2011  Poetry  Literary
2  11/10/2011   Music      Rock
3  11/11/2011  Comedy     Other
4  11/12/2011  Poetry  Literary
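The boolean mask passed to loc[] can combine several conditions with & (and), | (or) and ~ (not). Each comparison must be wrapped in parentheses, because these operators bind more tightly than == or >= in Python. Below is a minimal sketch; the ‘Status’ column and the 11/10/2011 cutoff are hypothetical, chosen only to show a compound mask.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Parse the 'Date' strings so they can be compared chronologically
dates = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Hypothetical rule for illustration: Music or Comedy events on or after
# 11/10/2011 are 'Featured'; every other row keeps the default 'Regular'
df['Status'] = 'Regular'
is_show = df['Event'].isin(['Music', 'Comedy'])
df.loc[is_show & (dates >= pd.Timestamp('2011-11-10')), 'Status'] = 'Featured'
print(df)
```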

Using lambda function

A lambda function in Python is a concise, anonymous function created with the lambda keyword, typically used for short operations.

In this example, a new column ‘Priority’ is created based on a condition: if the ‘Event’ is ‘Music’, the priority is set to ‘High’; otherwise, it is set to ‘Low’. Series.apply() is used together with a lambda function to achieve this.

Python3

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011', '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})
 
# Adding a new column 'Priority' based on a condition using lambda function
df['Priority'] = df['Event'].apply(lambda x: 'High' if x == 'Music' else 'Low')
 
# Print the modified dataframe
print(df)

Output :

         Date   Event  Priority
0   11/8/2011   Music      High
1   11/9/2011  Poetry       Low
2  11/10/2011   Music      High
3  11/11/2011  Comedy       Low
4  11/12/2011  Poetry       Low
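A lambda is limited to a single expression, but conditional expressions can be chained inside it to handle a third outcome. Below is a minimal sketch; the ‘Medium’ level for ‘Comedy’ is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['11/8/2011', '11/9/2011', '11/10/2011',
                            '11/11/2011', '11/12/2011'],
                   'Event': ['Music', 'Poetry', 'Music', 'Comedy', 'Poetry']})

# Chained conditional expression: 'High' for Music, 'Medium' for Comedy, else 'Low'
df['Priority'] = df['Event'].apply(
    lambda x: 'High' if x == 'Music' else ('Medium' if x == 'Comedy' else 'Low'))
print(df)
```

Beyond two or three branches, a dictionary with Series.map() is usually easier to read than a long chain of conditional expressions.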


Last Updated : 03 Dec, 2023