Dummy Variables in R Programming

R is widely used for data mining and data visualization, and it can run many machine learning algorithms, such as regression and classification. Dummy coding is used in regression analysis to represent a categorical variable numerically. A dummy variable in R is a variable that records whether an observation has a particular characteristic. It takes the value 1 or 0; which of the two stands for True and which for False is the user's choice, although 1 conventionally means the characteristic is present. For example, a person is either male or female, or discipline is either good or bad. For a gender column, new columns are created accordingly: gender_m holds a binary value specifying whether the person is male, and gender_f holds a binary value specifying whether the person is female.
[Figures: the original dataframe, and the dataframe after creating dummy variables]
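
As a minimal sketch of the idea, assuming a small hypothetical vector named gender (it is not part of any dataset used in this article), the same characteristic can be stored either as a logical flag or as its 0/1 equivalent:

```r
# Hypothetical data, for illustration only
gender <- c("m", "f", "m")

# Logical flag: TRUE where the person is male
is_male <- gender == "m"

# The same information stored as a 0/1 dummy variable
gender_m <- as.integer(is_male)

print(is_male)
print(gender_m)
```

R treats TRUE as 1 and FALSE as 0 in arithmetic, so sum(is_male) would count the males in this vector.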

This article discusses two ways to create dummy variables in R: the ifelse() function and the dummy_cols() function.

Using ifelse() function

The ifelse() function evaluates a test and, depending on the result, returns the value supplied for the true case or the value supplied for the false case. A dummy variable can be created with it by testing for a category and returning 1 or 0.

Syntax:
ifelse(test, yes, no)

Parameters:
test: the condition to test
yes: the value returned where the test condition is satisfied
no: the value returned where the test condition is not satisfied
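
A small sketch of how the three arguments interact, using a made-up numeric vector x (an assumption for illustration). ifelse() is vectorized, so the test is evaluated element by element, and a missing value in the test stays missing in the result:

```r
# Hypothetical input with one missing value
x <- c(3, 8, NA, 5)

# 1 where x is greater than 4, 0 otherwise; NA is propagated
flag <- ifelse(x > 4, 1, 0)

print(flag)
```

If missing values should be coded as 0 instead, the test can be written as !is.na(x) & x > 4.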



Example 1:

# Using PlantGrowth dataset
pg <- PlantGrowth
  
# Print
cat("Original dataset:\n")
head(pg, 20)
  
# Create dummy variable
pg$group_ctr1 <- ifelse(pg$group == "ctrl", 1, 0)
  
# Print
cat("After creating dummy variable:\n")
head(pg, 20)


Output:

Original dataset:
   weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
6    4.61  ctrl
7    5.17  ctrl
8    4.53  ctrl
9    5.33  ctrl
10   5.14  ctrl
11   4.81  trt1
12   4.17  trt1
13   4.41  trt1
14   3.59  trt1
15   5.87  trt1
16   3.83  trt1
17   6.03  trt1
18   4.89  trt1
19   4.32  trt1
20   4.69  trt1

After creating dummy variable:
   weight group group_ctr1
1    4.17  ctrl          1
2    5.58  ctrl          1
3    5.18  ctrl          1
4    6.11  ctrl          1
5    4.50  ctrl          1
6    4.61  ctrl          1
7    5.17  ctrl          1
8    4.53  ctrl          1
9    5.33  ctrl          1
10   5.14  ctrl          1
11   4.81  trt1          0
12   4.17  trt1          0
13   4.41  trt1          0
14   3.59  trt1          0
15   5.87  trt1          0
16   3.83  trt1          0
17   6.03  trt1          0
18   4.89  trt1          0
19   4.32  trt1          0
20   4.69  trt1          0
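
Since dummy coding is mainly used in regression, here is a hedged sketch of what such a column does inside a model. It reuses the column name group_ctr1 from the example above; the model itself is an illustration and not part of the original example. With a single 0/1 regressor, the intercept is the mean weight of the rows coded 0, and the slope is the difference between the mean of the rows coded 1 and that value:

```r
pg <- PlantGrowth

# Same dummy variable as in the example above
pg$group_ctr1 <- ifelse(pg$group == "ctrl", 1, 0)

# Regress weight on the dummy variable
fit <- lm(weight ~ group_ctr1, data = pg)
print(coef(fit))

# Group means for comparison with the coefficients
mean_other <- mean(pg$weight[pg$group != "ctrl"])
mean_ctrl  <- mean(pg$weight[pg$group == "ctrl"])
print(c(mean_other, mean_ctrl - mean_other))
```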

Example 2:

# Create a dataframe
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai",
                          "Delhi"))
  
# Print original dataset
print(df)
  
# Create dummy variable
df$gender_m <- ifelse(df$gender == "m", 1, 0)
df$gender_f <- ifelse(df$gender == "f", 1, 0)
  
# Print resultant
print(df)


Output:

  gender age   city
1      m  19  Delhi
2      f  20 Mumbai
3      m  20  Delhi

  gender age   city gender_m gender_f
1      m  19  Delhi        1        0
2      f  20 Mumbai        0        1
3      m  20  Delhi        1        0
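
Writing one ifelse() call per category becomes tedious when a column has many values. One possible sketch, assuming the same small dataframe as above, loops over the distinct values of a column and builds one dummy column for each. The city_ prefix is a naming choice made here for illustration:

```r
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai", "Delhi"))

# One dummy column per distinct city
# as.character() keeps this working if city is stored as a factor
for (value in unique(as.character(df$city))) {
  df[[paste0("city_", value)]] <- ifelse(df$city == value, 1, 0)
}

print(df)
```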

Using dummy_cols() function

The dummy_cols() function is provided by the fastDummies package. It creates dummy variables according to the parameters passed to it. If no columns are selected in the function call, dummy variables are created for all character and factor columns in the dataframe.

Syntax:
dummy_cols(.data, select_columns = NULL)

Parameters:
.data: the object for which dummy columns have to be created
select_columns: the columns for which dummy variables have to be created

Example 1:

# Install the required package (needed only once)
install.packages("fastDummies")
  
# Load the library
library(fastDummies)
  
# Using PlantGrowth dataset
data <- PlantGrowth
  
# Create dummy variable
data <- dummy_cols(data, 
                   select_columns = "group")
  
# Print
print(data)


Output:

   weight group group_ctrl group_trt1 group_trt2
1    4.17  ctrl          1          0          0
2    5.58  ctrl          1          0          0
3    5.18  ctrl          1          0          0
4    6.11  ctrl          1          0          0
5    4.50  ctrl          1          0          0
6    4.61  ctrl          1          0          0
7    5.17  ctrl          1          0          0
8    4.53  ctrl          1          0          0
9    5.33  ctrl          1          0          0
10   5.14  ctrl          1          0          0
11   4.81  trt1          0          1          0
12   4.17  trt1          0          1          0
13   4.41  trt1          0          1          0
14   3.59  trt1          0          1          0
15   5.87  trt1          0          1          0
16   3.83  trt1          0          1          0
17   6.03  trt1          0          1          0
18   4.89  trt1          0          1          0
19   4.32  trt1          0          1          0
20   4.69  trt1          0          1          0
21   6.31  trt2          0          0          1
22   5.12  trt2          0          0          1
23   5.54  trt2          0          0          1
24   5.50  trt2          0          0          1
25   5.37  trt2          0          0          1
26   5.29  trt2          0          0          1
27   4.92  trt2          0          0          1
28   6.15  trt2          0          0          1
29   5.80  trt2          0          0          1
30   5.26  trt2          0          0          1
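
The output above has one dummy column for each of the three groups. In a regression with an intercept, one category is usually left out as the reference level. As a sketch for comparison only, base R's model.matrix() shows this coding, assuming R's default treatment contrasts are in effect. This is not something dummy_cols() does in the example above:

```r
# Design matrix for a model with group as the only predictor
mm <- model.matrix(~ group, data = PlantGrowth)

# "ctrl" is the reference level, so it gets no column of its own
print(colnames(mm))
print(head(mm, 3))
```

Here the intercept column stands in for the ctrl group, and the remaining two columns are the dummy variables for trt1 and trt2.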

Example 2:

# Create a dataframe
df <- data.frame(gender = c("m", "f", "m"),
                 age = c(19, 20, 20),
                 city = c("Delhi", "Mumbai",
                          "Delhi"))
  
# Create dummy variables
# select_columns = NULL uses all 
# character and factor columns
# to create dummy variable
df <- dummy_cols(df)
  
# Print
print(df)


Output:

  gender age   city gender_f gender_m city_Delhi city_Mumbai
1      m  19  Delhi        0        1          1           0
2      f  20 Mumbai        1        0          0           1
3      m  20  Delhi        0        1          1           0


