How to do Conditional Mutate in R

Last Updated : 29 Jan, 2024

In the R programming language, mutate() is a function from the dplyr package used to create, modify, and delete columns in a dataset. New columns are typically computed as functions of existing variables.

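R mutate() function syntax:

mutate(.data, ...)

Parameters:

.data: a data frame or tibble

...: name-value pairs, where each name is a column to create or modify and each value is an expression computed from existing columns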

Here we create a simple dataset and apply a basic mutate() operation to see how the function works: a new column, value_squared, holds the square of each entry in the value column.

R




# Load necessary package
library(dplyr)
 
# Create a sample dataset
data <- data.frame(
  id = 1:5,
  value = c(10, 15, 20, 25, 30)
)
 
# Use mutate to create a new variable
data_mutated <- data %>%
  mutate(value_squared = value^2)
 
# Print the original and mutated datasets
print("Original Dataset:")
print(data)
 
print("Mutated Dataset:")
print(data_mutated)


Output:

[1] "Original Dataset:"
  id value
1  1    10
2  2    15
3  3    20
4  4    25
5  5    30
[1] "Mutated Dataset:"
  id value value_squared
1  1    10           100
2  2    15           225
3  3    20           400
4  4    25           625
5  5    30           900
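
The same function can also overwrite an existing column or drop one by assigning NULL to it. The following is a minimal sketch that reuses the sample data above; the name data_changed is ours, chosen only for illustration.

```r
library(dplyr)

# Same sample dataset as above
data <- data.frame(
  id = 1:5,
  value = c(10, 15, 20, 25, 30)
)

data_changed <- data %>%
  mutate(
    value_squared = value^2,  # create a new column from the original values
    value = value / 10,       # modify an existing column
    id = NULL                 # delete a column
  )

print(data_changed)
```

The expressions are evaluated in the order they are written, so value_squared is computed from the original values before value is overwritten.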

Before learning about conditional mutate in R, we should know the relational operators available in R.

Operator    Is TRUE if
A < B       A is less than B
A <= B      A is less than or equal to B
A > B       A is greater than B
A >= B      A is greater than or equal to B
A == B      A is equal to B
A != B      A is not equal to B
A %in% B    A is an element of B
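
To see what these operators return, here is a small sketch in base R. The vector ages and the result names are ours, for illustration only. Each comparison is applied element by element and yields a logical vector, which is what the conditions in the following sections are built from.

```r
# A small numeric vector to compare against
ages <- c(5, 18, 30)

below     <- ages < 18            # TRUE FALSE FALSE
at_most   <- ages <= 18           # TRUE TRUE FALSE
not_equal <- ages != 18           # TRUE FALSE TRUE
is_member <- ages %in% c(18, 30)  # FALSE TRUE TRUE

print(below)
print(at_most)
print(not_equal)
print(is_member)
```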

Conditional Mutate in R

With mutate(), we can create and modify the columns of a dataset by applying conditions to its existing columns. There are two common ways to do conditional mutate in R:

  • case_when()
  • ifelse()

case_when() function in mutate()

case_when() is a function used inside mutate() to create and modify the columns of a dataset using conditions, for example to categorize values or to replace them. It has a simple syntax.

Syntax:

case_when(X ~ Y)

Parameters:

X: condition to be applied

~: tilde, separating the condition from the value

Y: value to be set

Here X is the condition tested against the data, and Y, to the right of the tilde, is the value inserted in the column wherever X is TRUE. Several X ~ Y pairs can be supplied, separated by commas.
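
As a quick illustration of the syntax, case_when() can be tried on a plain vector before using it inside mutate(). This is a minimal sketch; the names score and grade and the cut-off values are hypothetical.

```r
library(dplyr)

# Hypothetical scores to categorize
score <- c(45, 72, 90)

# Each element takes the value of the first condition it satisfies
grade <- case_when(
  score >= 80 ~ "High",
  score >= 50 ~ "Medium",
  score < 50 ~ "Low"
)

print(grade)  # "Low" "Medium" "High"
```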

Let's learn about case_when() in detail with some examples.

Install Necessary Libraries

  • tibble: the tibble package is used to create and work with tibbles, a modern form of data frame.
  • dplyr: dplyr package includes the mutate() function which we are using in the next sections.

R




# Install and load required packages
if (!requireNamespace("tibble", quietly = TRUE)) {
  install.packages("tibble")
}
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
 
# Load the installed packages
library(tibble)
library(dplyr)


Create a simple Dataset

Here, we create a simple dataset to use in the conditional mutate examples. It includes the ID, Name, Age, Gender, and Education of 10 people, male and female. Some Age values are deliberately set to NA so that we can see how to handle missing values with mutate().

R




# Create a tibble with 5 columns and 10 rows
my_data <- tibble(
  ID = 1:10,
  Name = c("Alice", "Bob", "Charlie", "David", "Eva", "Frank", "Grace", "Hank",
           "Ivy", "Jack"),
  Age = c(25, 18, 22, NA, 35, 16, 24, NA, 27, 33),
  Gender = c("Female", "Male", "Male", "Male", "Female", "Male", "Female", "Male",
             "Female", "Male"),
  Education = c("Bachelor's", "High School", "Bachelor's", "PhD", "Master's",
                "High School", "PhD", "Master's", "Bachelor's", "PhD")
)
 
# Print the tibble
print(my_data)


Output:

A tibble: 10 × 5
      ID Name      Age Gender Education  
   <int> <chr>   <dbl> <chr>  <chr>      
 1     1 Alice      25 Female Bachelor's 
 2     2 Bob        18 Male   High School
 3     3 Charlie    22 Male   Bachelor's 
 4     4 David      NA Male   PhD        
 5     5 Eva        35 Female Master's   
 6     6 Frank      16 Male   High School
 7     7 Grace      24 Female PhD        
 8     8 Hank       NA Male   Master's   
 9     9 Ivy        27 Female Bachelor's 
10    10 Jack       33 Male   PhD 

Select a column and mutate using case_when()

We select the Age column from the dataset with the select() function and save it in a separate variable, age_data, so that the original dataset is left unchanged and the examples are easier to follow. We then create a new column ‘Age_Group’ using mutate() and apply conditions with case_when(): people aged 18 or below are labeled ‘Child’, and people above 18 are labeled ‘Adult’.

R




# Select the Age column and save it in another variable
age_data <- my_data %>% select(Age)
 
# Create a new column "Age_Group" using mutate and case_when
age_data <- age_data %>%
  mutate(Age_Group = case_when(
    Age <= 18 ~ "Child",
    Age > 18 ~ "Adult"
  ))
 
# Print the modified tibble
print(age_data)


Output:

A tibble: 10 × 2
     Age Age_Group
   <dbl> <chr>    
 1    25 Adult    
 2    18 Child    
 3    22 Adult    
 4    NA NA       
 5    35 Adult    
 6    16 Child    
 7    24 Adult    
 8    NA NA       
 9    27 Adult    
10    33 Adult 

Here, rows with NA ages get NA in Age_Group, because comparing NA with a number yields NA and therefore neither condition matches. People aged 18 and below are labeled Child, and those above 18 are labeled Adult. We will handle NA values in the next sections.
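
The NA result can be checked directly on a standalone vector. This is a minimal sketch; the names age and group are ours, for illustration only.

```r
library(dplyr)

age <- c(16, NA, 40)

# Comparisons involving NA return NA rather than TRUE or FALSE
print(age <= 18)  # TRUE NA FALSE
print(age > 18)   # FALSE NA TRUE

# No condition is TRUE for the NA element, so case_when() returns NA for it
group <- case_when(
  age <= 18 ~ "Child",
  age > 18 ~ "Adult"
)

print(group)  # "Child" NA "Adult"
```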

The TRUE default argument

TRUE is used in the case_when() function as the default case. Written as a condition, TRUE always holds, so it applies to every row for which none of the earlier conditions is TRUE.

  • Here we have created a new column ‘Is_Child’ based on a condition where people with an age less than or equal to 18 are considered children, and the remaining are considered not children. We applied this condition using the TRUE argument.

R




# Create a new column "Is_Child" using mutate and case_when
age_data <- mutate(age_data,
                  Is_Child = case_when(
                    Age <= 18 ~ "Child",
                    TRUE ~ "Not Child"  # Default case: Age above 18 or missing (NA)
                  ))
 
# Print the modified tibble
print(age_data)


Output:

A tibble: 10 × 3
     Age Age_Group Is_Child 
   <dbl> <chr>     <chr>    
 1    25 Adult     Not Child
 2    18 Child     Child    
 3    22 Adult     Not Child
 4    NA NA        Not Child
 5    35 Adult     Not Child
 6    16 Child     Child    
 7    24 Adult     Not Child
 8    NA NA        Not Child
 9    27 Adult     Not Child
10    33 Adult     Not Child

Here we used the TRUE condition. People aged 18 or below are labeled Child; everyone else, including the rows with NA ages, falls through to the TRUE case and is labeled Not Child.

  • The TRUE condition must come after all the other conditions in the case_when() function; otherwise, every element in the output takes the value set in the TRUE condition.
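
The effect of misplacing TRUE can be sketched on a small vector. The names age, wrong_order, and right_order are ours, for illustration only.

```r
library(dplyr)

age <- c(25, 18, 16)

# TRUE placed first: it matches every element, so the later condition is never reached
wrong_order <- case_when(
  TRUE ~ "Not Child",
  age <= 18 ~ "Child"
)

# TRUE placed last: it only catches what the earlier condition did not match
right_order <- case_when(
  age <= 18 ~ "Child",
  TRUE ~ "Not Child"
)

print(wrong_order)  # "Not Child" "Not Child" "Not Child"
print(right_order)  # "Not Child" "Child" "Child"
```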

Matching NAs with is.na()

We can add a separate condition for NA values in the case_when() function using the is.na() function. Here, we create a new column ‘New_Age_Group’ based on three conditions: people aged 18 or below are labeled ‘Child’, those above 18 are labeled ‘Adult’, and NA values are labeled ‘Age Missing’.

R




# Create a new column "New_Age_Group" using mutate and case_when
age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    Age <= 18 ~ "Child",              # Matching ages less than or equal to 18
    Age > 18 ~ "Adult",               # Matching ages greater than 18
    is.na(Age) ~ "Age Missing"        # Matching NA values in the Age column
  ))
 
# Print the modified tibble
print(age_data)


Output:

A tibble: 10 × 4
     Age Age_Group Is_Child  New_Age_Group
   <dbl> <chr>     <chr>     <chr>        
 1    25 Adult     Not Child Adult        
 2    18 Child     Child     Child        
 3    22 Adult     Not Child Adult        
 4    NA NA        Not Child Age Missing  
 5    35 Adult     Not Child Adult        
 6    16 Child     Child     Child        
 7    24 Adult     Not Child Adult        
 8    NA NA        Not Child Age Missing  
 9    27 Adult     Not Child Adult        
10    33 Adult     Not Child Adult 

Here you can observe that NA values are labeled Age Missing, while the remaining rows follow the other two conditions.

Keeping default values of a variable

We can keep the original values of a column and modify only specific elements by using the TRUE condition. Here, we first create a new column ‘Education_Level’ from the Education column with case_when(), labeling Master's and PhD as ‘Post Graduate’, without yet handling the remaining values.

R




# Create a new column "Education_Level" using mutate and case_when
my_data <- my_data %>%
  mutate(Education_Level = case_when(
    Education %in% c("Master's", "PhD") ~ "Post Graduate"
  ))
 
# Print the modified tibble
print(my_data)


Output:

A tibble: 10 × 6
      ID Name      Age Gender Education   Education_Level
   <int> <chr>   <dbl> <chr>  <chr>       <chr>          
 1     1 Alice      25 Female Bachelor's  NA             
 2     2 Bob        18 Male   High School NA             
 3     3 Charlie    22 Male   Bachelor's  NA             
 4     4 David      NA Male   PhD         Post Graduate  
 5     5 Eva        35 Female Master's    Post Graduate  
 6     6 Frank      16 Male   High School NA             
 7     7 Grace      24 Female PhD         Post Graduate  
 8     8 Hank       NA Male   Master's    Post Graduate  
 9     9 Ivy        27 Female Bachelor's  NA             
10    10 Jack       33 Male   PhD         Post Graduate 

In the above example, both Master’s and PhD were categorized as ‘Post Graduate’, while the remaining values became NA because no condition matched them and we had not used the TRUE condition yet.

  • Here is an example of using the TRUE condition to keep the original values of a column. We put the Education variable on the right-hand side of the TRUE condition, so every remaining row takes its value from the Education column.

R




# Create a new column "Education_Level" using mutate and case_when
my_data <- my_data %>%
  mutate(Education_Level = case_when(
    Education %in% c("Master's", "PhD") ~ "Post Graduate",  # Matching Master's and PhD
    TRUE ~ Education                                        # Keep the original value otherwise
  ))
 
# Print the modified tibble
print(my_data)


Output:

A tibble: 10 × 6
      ID Name      Age Gender Education   Education_Level
   <int> <chr>   <dbl> <chr>  <chr>       <chr>          
 1     1 Alice      25 Female Bachelor's  Bachelor's     
 2     2 Bob        18 Male   High School High School    
 3     3 Charlie    22 Male   Bachelor's  Bachelor's     
 4     4 David      NA Male   PhD         Post Graduate  
 5     5 Eva        35 Female Master's    Post Graduate  
 6     6 Frank      16 Male   High School High School    
 7     7 Grace      24 Female PhD         Post Graduate  
 8     8 Hank       NA Male   Master's    Post Graduate  
 9     9 Ivy        27 Female Bachelor's  Bachelor's     
10    10 Jack       33 Male   PhD         Post Graduate  

Here you can observe that all the remaining rows keep their original values from the Education column.
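
Recent versions of dplyr also accept a .default argument in case_when() as another way to write the fallback. The sketch below assumes dplyr 1.1.0 or later is installed; the vector names education and level are ours, for illustration only.

```r
library(dplyr)

# Assumption: dplyr >= 1.1.0, where case_when() has a .default argument
education <- c("Bachelor's", "PhD", "Master's", "High School")

level <- case_when(
  education %in% c("Master's", "PhD") ~ "Post Graduate",
  .default = education  # keep the original value when no condition matches
)

print(level)  # "Bachelor's" "Post Graduate" "Post Graduate" "High School"
```

Because .default is a named argument rather than a condition, its position among the conditions does not affect the result.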

Multiple conditions, Multiple variables

Here, we apply multiple conditions to multiple variables (columns) using the case_when() function. The conditions combine the ‘Education’ and ‘Gender’ variables: males with a Master's or PhD are categorized as ‘Recruit to Male Category’, females with a Master's or PhD as ‘Recruit to Female Category’, and the default TRUE condition is set to ‘Not Recruited’.

R




# Add Recruitment_Category variable using mutate
my_data <- my_data %>%
  mutate(
    Recruitment_Category = case_when(
      Education %in% c("Master's", "PhD") & Gender == "Male" ~ "Recruit to Male Category",
       
      Education %in% c("Master's", "PhD") & Gender == "Female" ~ "Recruit to Female Category",
      TRUE ~ "Not Recruited"  # Default case for other combinations
    )
  )
 
# Print the modified tibble
print(my_data)


Output:

A tibble: 10 × 7
      ID Name      Age Gender Education   Education_Level Recruitment_Category      
   <int> <chr>   <dbl> <chr>  <chr>       <chr>           <chr>                     
 1     1 Alice      25 Female Bachelor's  Bachelor's      Not Recruited             
 2     2 Bob        18 Male   High School High School     Not Recruited             
 3     3 Charlie    22 Male   Bachelor's  Bachelor's      Not Recruited             
 4     4 David      NA Male   PhD         Post Graduate   Recruit to Male Category  
 5     5 Eva        35 Female Master's    Post Graduate   Recruit to Female Category
 6     6 Frank      16 Male   High School High School     Not Recruited             
 7     7 Grace      24 Female PhD         Post Graduate   Recruit to Female Category
 8     8 Hank       NA Male   Master's    Post Graduate   Recruit to Male Category  
 9     9 Ivy        27 Female Bachelor's  Bachelor's      Not Recruited             
10    10 Jack       33 Male   PhD         Post Graduate   Recruit to Male Category 

Order of priority of conditions

In the case_when() function, the order of the conditions is crucial, because each row takes the value of the first condition it satisfies. To illustrate, consider recreating the column ‘New_Age_Group’ with conditions based on the ‘Age’ column. The priority order is as follows: ages of 18 or below are categorized as ‘Child’, ages below 30 as ‘Young Adult’, ages up to 100 as ‘Older Adult’, and any missing values are labeled ‘Age Missing’.

The conditions are checked from top to bottom, in this hierarchical order.

R




# Recreate the column "New_Age_Group" using mutate and case_when
age_data <- age_data %>%
  mutate(New_Age_Group = case_when(
    Age <= 18 ~ "Child",
    Age < 30 ~ "Young Adult",
    Age <= 100 ~ "Older Adult",
    is.na(Age) ~ "Age Missing"
  ))
 
# Print the modified tibble
print(age_data)


Output:

A tibble: 10 × 4
     Age Age_Group Is_Child  New_Age_Group
   <dbl> <chr>     <chr>     <chr>        
 1    25 Adult     Not Child Young Adult  
 2    18 Child     Child     Child        
 3    22 Adult     Not Child Young Adult  
 4    NA NA        Not Child Age Missing  
 5    35 Adult     Not Child Older Adult  
 6    16 Child     Child     Child        
 7    24 Adult     Not Child Young Adult  
 8    NA NA        Not Child Age Missing  
 9    27 Adult     Not Child Young Adult  
10    33 Adult     Not Child Older Adult

If we alter the order and place the Age <= 100 condition at the top, the output changes significantly. Because every non-missing age satisfies that first condition, all values in the new column, except the NA values, are categorized as ‘Older Adult’, and the later conditions are never reached. To avoid this faulty case:

  • Order the conditions from the most specific to the most general.
  • Use bounds on both sides of a range (for example, Age > 18 & Age < 30) so that the conditions do not overlap.
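
Both points can be sketched on a small vector. The names age, broad_first, and bounded are ours, for illustration only.

```r
library(dplyr)

age <- c(16, 25, 35, NA)

# Broadest condition first: it captures every non-missing age
broad_first <- case_when(
  age <= 100 ~ "Older Adult",
  age <= 18 ~ "Child",
  age < 30 ~ "Young Adult",
  is.na(age) ~ "Age Missing"
)

# Bounds on both sides: the conditions no longer overlap, so their order does not matter
bounded <- case_when(
  age >= 30 & age <= 100 ~ "Older Adult",
  age > 18 & age < 30 ~ "Young Adult",
  age <= 18 ~ "Child",
  is.na(age) ~ "Age Missing"
)

print(broad_first)  # "Older Adult" "Older Adult" "Older Adult" "Age Missing"
print(bounded)      # "Child" "Young Adult" "Older Adult" "Age Missing"
```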

Note: The TRUE condition should always be placed last, after all the other conditions.

ifelse() function in mutate()

ifelse() is similar to case_when(), except that it also takes a value for the case where the condition is FALSE. It is used inside the mutate() function to create and modify columns based on a condition: where the condition is TRUE, the column is set to one value; otherwise, it is set to another.

Syntax:

ifelse(test, yes, no)

Parameters:

test: condition to be checked

yes: value to be returned if the condition is TRUE

no: value to be returned if the condition is FALSE
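
Before using it inside mutate(), ifelse() can be tried on a plain vector in base R. This is a minimal sketch; the names marks and result and the pass mark of 40 are hypothetical.

```r
# Hypothetical marks, including one missing value
marks <- c(35, 60, NA)

# Where the condition is NA, ifelse() returns NA
result <- ifelse(marks >= 40, "Pass", "Fail")

print(result)  # "Fail" "Pass" NA
```

The NA in the result is why missing values are usually checked first, for example with is.na(), before the main condition is applied.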

Here, we are creating a new column ‘New_Education’ using nested ifelse() calls on the Age column. If Age is missing, the value is set to ‘Age Missing’; otherwise, ages of 18 or below are set to ‘High School’ and all other ages to ‘College or Higher’.

R




# Example of using ifelse() in mutate() to create a new column
my_data <- my_data %>%
  mutate(New_Education = ifelse(is.na(Age), "Age Missing",
                                ifelse(Age <= 18, "High School", "College or Higher")))
 
# Print the modified tibble
print(my_data)


Output:

A tibble: 10 × 8
      ID Name      Age Gender Education   Education_Level Recruitment_Category      New_Education
   <int> <chr>   <dbl> <chr>  <chr>       <chr>           <chr>                     <chr>        
 1     1 Alice      25 Female Bachelor's  Bachelor's      Not Recruited             College or H…
 2     2 Bob        18 Male   High School High School     Not Recruited             High School  
 3     3 Charlie    22 Male   Bachelor's  Bachelor's      Not Recruited             College or H…
 4     4 David      NA Male   PhD         Post Graduate   Recruit to Male Category  Age Missing  
 5     5 Eva        35 Female Master's    Post Graduate   Recruit to Female Catego… College or H…
 6     6 Frank      16 Male   High School High School     Not Recruited             High School  
 7     7 Grace      24 Female PhD         Post Graduate   Recruit to Female Catego… College or H…
 8     8 Hank       NA Male   Master's    Post Graduate   Recruit to Male Category  Age Missing  
 9     9 Ivy        27 Female Bachelor's  Bachelor's      Not Recruited             College or H…
10    10 Jack       33 Male   PhD         Post Graduate   Recruit to Male Category  College or H…
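
As an alternative to nesting ifelse() for missing values, dplyr provides if_else(), which has a missing argument. This is a minimal sketch on a standalone vector; the names age and stage are ours, for illustration only.

```r
library(dplyr)

age <- c(25, 18, NA)

# The missing argument supplies the value used where the condition is NA
stage <- if_else(
  age <= 18,
  "High School",
  "College or Higher",
  missing = "Age Missing"
)

print(stage)  # "College or Higher" "High School" "Age Missing"
```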

Conclusion

In conclusion, there are two functions commonly used for conditional mutate in R: case_when() and ifelse(). Both create and modify columns based on the provided conditions. case_when() sets a value for each condition that is TRUE and returns NA when nothing matches, while ifelse() takes an explicit value for the FALSE case. We learned how to use case_when() and ifelse() inside mutate(), how to apply multiple conditions to a single variable or to several variables, and why the order of the conditions matters. The TRUE condition should always be given last.


