How to Create Categorical Variables in R?

Last Updated : 19 Dec, 2021

In this article, we will learn how to create categorical variables in the R programming language.

In statistics, variables fall into two broad types: categorical and quantitative. A quantitative variable holds numerical values that measure a quantity. A categorical variable takes one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
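
The distinction can be sketched in R as follows. The vectors height_cm and blood_type are hypothetical and used only for illustration; they do not appear in the examples below.

```r
# Hypothetical data for illustration only
height_cm  <- c(172.5, 160.1, 181.0, 168.4)    # quantitative: numeric measurements
blood_type <- factor(c("A", "O", "O", "AB"))   # categorical: a fixed set of groups

is.numeric(height_cm)    # TRUE
is.factor(blood_type)    # TRUE
levels(blood_type)       # "A" "AB" "O"
nlevels(blood_type)      # 3
```

In R, a categorical variable is normally stored as a factor, and its possible values are called levels.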

Method 1: Categorical Variable from Scratch

To create a categorical variable from scratch, i.e., by supplying a value manually for each row of the data, we pass a vector of those values to the factor() function. factor() encodes the vector as a factor: identical values are grouped together, and each distinct value becomes one level of the categorical variable.

Syntax:

df$categorical_variable <- factor( categorical_vector )

where

  • df: the data frame.
  • categorical_variable: the name of the new column that will hold the categorical data.
  • categorical_vector: the vector to be converted into a factor.

Example:

Here is a basic data frame to which a new column, group, is added as a categorical variable.

R




# create sample data frame
df <- data.frame(x=c(10, 23, 13, 41, 15),
                 y=c(71, 17, 28, 32, 12))
  
# create categorical vector
group_vector <- c('A','B','C','D','E')
  
# Add categorical variable to the data frame
df$group <- factor(group_vector)
  
# print data frame
df


Output:

   x  y group
1 10 71     A
2 23 17     B
3 13 28     C
4 41 32     D
5 15 12     E
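
In the example above every row has its own level. A minimal sketch of the more typical case, where values repeat and the order of the levels matters, is shown below. The size_vector labels are hypothetical and chosen only for illustration; levels is a standard argument of factor().

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))

# Hypothetical labels with repeats, to show how values are grouped into levels
size_vector <- c("small", "large", "small", "medium", "large")

# By default the levels are the sorted unique values
df$size <- factor(size_vector)
levels(df$size)          # "large" "medium" "small"

# Passing levels explicitly fixes their order
df$size <- factor(size_vector, levels = c("small", "medium", "large"))
levels(df$size)          # "small" "medium" "large"
table(df$size)           # counts per level: small 2, medium 1, large 2
as.integer(df$size)      # underlying integer codes: 1 3 1 2 3
```

Setting the level order explicitly is useful because tables, plots, and model summaries generally follow the order of the levels rather than the order of the rows.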

Method 2: Categorical Variable from the Existing column using two values

To create a categorical variable from an existing column, we call the vectorized ifelse() function inside as.factor(). Each row gets one value if a condition is true and another value otherwise.

Syntax:

df$categorical_variable <- as.factor( ifelse(condition, val1, val2) )

where

  • df: the data frame.
  • categorical_variable: the name of the new column that will hold the categorical data.
  • condition: the condition checked for each row; rows where it is true get val1, and all other rows get val2.

Example:

Here is a basic data frame to which a new column, group, is added as a categorical variable using an ifelse() condition.

R




# create sample data frame
df <- data.frame(x=c(10, 23, 13, 41, 15),
                 y=c(71, 17, 28, 32, 12))
  
# Add categorical variable to the data frame
df$group <- as.factor(ifelse(df$x > 20, 'A', 'B'))
  
# print data frame
df


Output:

   x  y group
1 10 71     B
2 23 17     A
3 13 28     B
4 41 32     A
5 15 12     B
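
Two details of this approach are worth checking in practice: the order of the resulting levels and the treatment of missing values. The sketch below reuses the condition from the example above; the explicit levels argument and the na_demo vector are additions for illustration and are not part of the original example.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))

# Same condition as the example, with the level order stated explicitly
df$group <- factor(ifelse(df$x > 20, "A", "B"), levels = c("A", "B"))

levels(df$group)   # "A" "B"
table(df$group)    # A: 2, B: 3

# ifelse() propagates missing values: an NA input gives an NA result
na_demo <- ifelse(c(10, NA, 41) > 20, "A", "B")
na_demo            # "B" NA "A"
```

With as.factor() the levels come out in sorted order, which here happens to be the same as the order given explicitly.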

Method 3: Categorical Variable from the Existing column using multiple values

To create a categorical variable with more than two values from an existing column, we nest several ifelse() calls inside as.factor(). The conditions are checked in order, and the first one that is true supplies the value; if none of the conditions is true, the else value of the innermost ifelse() is used.

Syntax:

df$categorical_variable <- as.factor( ifelse(condition, val, ifelse(condition, val, ifelse(condition, val, ifelse(condition, val, val_else)))))

where

  • df: the data frame.
  • categorical_variable: the name of the new column that will hold the categorical data.
  • condition: a condition checked for each row; if it is true, the corresponding val is used.
  • val_else: the value used when no condition is true.

Example:

Here is a basic data frame to which a new column, group, is added as a categorical variable using multiple nested ifelse() conditions.

R




# create sample data frame
df <- data.frame(x=c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y=c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))
  
# Add categorical variable to the data frame
df$group <- as.factor(ifelse(df$x<20, 'A',
                     ifelse(df$x<30, 'B',
                     ifelse(df$x<50, 'C',
                     ifelse(df$x<90, 'D', 'E')))))
  
# print data frame
df


Output:

    x  y group
1  10 71     A
2  23 17     B
3  13 28     A
4  41 32     C
5  15 12     A
6  11 13     A
7  23 41     B
8  45 15     C
9  95 11     E
10 23 23     B
11 75 34     D
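
When the conditions are consecutive numeric ranges, as they are here, base R's cut() function is an alternative to nested ifelse() calls. This is a different technique from the one shown above, sketched under the assumption that the same break points (20, 30, 50, 90) are wanted; the column name group_cut is hypothetical.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))

# right = FALSE makes each interval closed on the left: [-Inf,20), [20,30), ...
# so a value of exactly 20 falls into 'B', matching the x < 20 test
df$group_cut <- cut(df$x,
                    breaks = c(-Inf, 20, 30, 50, 90, Inf),
                    labels = c("A", "B", "C", "D", "E"),
                    right  = FALSE)

# Nested ifelse() version from the example, for comparison
nested <- ifelse(df$x < 20, "A",
          ifelse(df$x < 30, "B",
          ifelse(df$x < 50, "C",
          ifelse(df$x < 90, "D", "E"))))

all(as.character(df$group_cut) == nested)   # TRUE
```

cut() returns a factor directly, with the levels in the order given by labels, so no separate as.factor() call is needed.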
