How to Create Categorical Variables in R?
Last Updated: 19 Dec, 2021
In this article, we will learn how to create categorical variables in the R programming language.
In statistics, variables can be divided into two types: quantitative variables and categorical variables. Quantitative variables hold numerical values that can be measured or counted. A categorical variable, on the other hand, can take only one of a limited, and usually fixed, number of possible values; it assigns each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property.
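R stores categorical variables as factors. As a minimal sketch of what a "limited, fixed number of possible values" means in practice, the example below declares the allowed categories up front with the levels argument of factor(). The vector name size and its values are hypothetical, chosen only for illustration.

```r
# Hypothetical example: sizes restricted to three declared categories
size <- factor(c("small", "large", "small", "huge"),
               levels = c("small", "medium", "large"))

levels(size)   # the fixed set of allowed values
size           # "huge" is not a declared level, so it is stored as NA
table(size)    # counts per level; "medium" is listed with a count of 0
```

A value outside the declared levels becomes NA instead of silently creating a new category, and an unused level such as medium is still kept and counted.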
Method 1: Categorical Variable from Scratch
To create a categorical variable from scratch, that is, by supplying a value for each row manually, we pass a vector of those values to the factor() function and assign the result to a new column of the data frame. factor() converts a character or numeric vector into a categorical variable (a factor) by collecting its distinct values as levels, so rows with the same value fall into the same group.
Syntax:
df$categorical_variable <- factor( categorical_vector )
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- categorical_vector: the vector to convert, holding one value for each row of df.
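The grouping behaviour described above is easy to see on a small numeric vector. This is a sketch only, and the name codes is hypothetical. factor() keeps the distinct values as sorted levels and stores each element internally as an integer index into those levels, which is why converting a numeric factor back to its original numbers needs as.character() first.

```r
# Hypothetical numeric vector with repeated values
codes <- factor(c(30, 10, 30, 20, 10))

levels(codes)                     # the distinct values, sorted: "10" "20" "30"
as.integer(codes)                 # internal level indices: 3 1 3 2 1
as.numeric(as.character(codes))   # the original numbers: 30 10 30 20 10
```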
Example:
Here is a basic data frame to which a new column, group, is added as a categorical variable.
R
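# Build a small data frame with two numeric columns
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
# One label per row, converted to a factor and stored as a new column
group_vector <- c('A', 'B', 'C', 'D', 'E')
df$group <- factor(group_vector)
df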
Output:
x y group
1 10 71 A
2 23 17 B
3 13 28 C
4 41 32 D
5 15 12 E
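To confirm that the new column really is categorical, you can inspect it with is.factor(), levels(), or str(). The sketch below rebuilds the example and also shows an alternative for character columns: since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE has to be requested explicitly. The name df2 is illustrative only.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
df$group <- factor(c("A", "B", "C", "D", "E"))

is.factor(df$group)   # TRUE
levels(df$group)      # "A" "B" "C" "D" "E"
str(df)               # the group column is reported as a Factor

# Alternative: let data.frame() convert character columns to factors.
# The default has been stringsAsFactors = FALSE since R 4.0.0.
df2 <- data.frame(x = c(10, 23, 13, 41, 15),
                  group = c("A", "B", "C", "D", "E"),
                  stringsAsFactors = TRUE)
is.factor(df2$group)  # TRUE
```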
Method 2: Categorical Variable from an Existing Column Using Two Values
To create a categorical variable from an existing column, we wrap the vectorized ifelse() function in as.factor() (factor() works the same way here): each row gets one value if a condition on the existing column is true and another value otherwise.
Syntax:
df$categorical_variable <- as.factor( ifelse(condition, val1, val2) )
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- condition: a logical test on an existing column, evaluated for each row.
- val1: the value assigned where the condition is TRUE.
- val2: the value assigned where the condition is FALSE.
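Because ifelse() is vectorized, it evaluates the condition for every element at once and returns a vector of the same length. A minimal sketch on a hypothetical vector x that contains a missing value shows one consequence worth knowing: where the condition is NA, the result is NA rather than val2, and as.factor() does not count NA as a level.

```r
# Hypothetical input with a missing value in the third position
x <- c(10, 23, NA, 41)

# The condition is tested element by element
grp <- ifelse(x > 20, "A", "B")
grp             # "B" "A" NA "A": a missing x gives a missing group

# Converting to a factor; NA is kept as a missing value, not a level
grp_f <- as.factor(grp)
levels(grp_f)   # "A" "B"
```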
Example:
Here is a basic data frame to which a new column, group, is added as a categorical variable based on an if-else condition: rows where x is greater than 20 are labelled 'A', and all other rows 'B'.
R
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
# Label rows with x > 20 as 'A' and all other rows as 'B'
df$group <- as.factor(ifelse(df$x > 20, 'A', 'B'))
df
Output:
x y group
1 10 71 B
2 23 17 A
3 13 28 B
4 41 32 A
5 15 12 B
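Once the column is a factor, table() gives the number of rows in each category. as.factor() orders the levels alphabetically; if a different order is wanted (for example in summary tables or plots), factor() with an explicit levels argument sets it. This sketch rebuilds the data frame from the example above.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15),
                 y = c(71, 17, 28, 32, 12))
df$group <- as.factor(ifelse(df$x > 20, "A", "B"))
table(df$group)   # A: 2 rows, B: 3 rows

# Same labels, but with 'B' declared as the first level
df$group <- factor(ifelse(df$x > 20, "A", "B"), levels = c("B", "A"))
levels(df$group)  # "B" "A"
table(df$group)   # B: 3 rows, A: 2 rows
```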
Method 3: Categorical Variable from an Existing Column Using Multiple Values
To assign more than two categories, we nest several ifelse() calls inside as.factor(). The conditions are tested in order, and each row receives the value of the first condition it satisfies; if none of the conditions is true, it receives the final else value of the innermost ifelse().
Syntax:
df$categorical_variable <- as.factor( ifelse(condition1, val1, ifelse(condition2, val2, ifelse(condition3, val3, ifelse(condition4, val4, val_else)))) )
where
- df: the data frame.
- categorical_variable: the name of the new column that will hold the categorical data.
- condition1, condition2, ...: the conditions to check, in order; the first one that is true decides the value.
- val1, val2, ...: the value assigned when the corresponding condition is true.
- val_else: the value assigned when no condition is true.
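To see how nested conditions treat values that sit exactly on a threshold, the sketch below applies the same thresholds as the example that follows to a hypothetical vector x. Because the outermost ifelse() takes priority, each value receives the label of the first condition it satisfies, so 20 is labelled 'B' and 90 is labelled 'E'.

```r
# Hypothetical values chosen to sit on or near each threshold
x <- c(19, 20, 29, 30, 90, 95)

# The first condition that holds (reading left to right) decides the label
grp <- ifelse(x < 20, "A",
       ifelse(x < 30, "B",
       ifelse(x < 50, "C",
       ifelse(x < 90, "D", "E"))))
grp   # "A" "B" "B" "C" "E" "E"
```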
Example:
Here is a basic data frame to which a new column, group, is added as a categorical variable based on multiple nested if-else conditions.
R
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))
# x < 20 -> 'A'; 20 <= x < 30 -> 'B'; 30 <= x < 50 -> 'C';
# 50 <= x < 90 -> 'D'; x >= 90 -> 'E'
df$group <- as.factor(ifelse(df$x < 20, 'A',
                      ifelse(df$x < 30, 'B',
                      ifelse(df$x < 50, 'C',
                      ifelse(df$x < 90, 'D', 'E')))))
df
Output:
x y group
1 10 71 A
2 23 17 B
3 13 28 A
4 41 32 C
5 15 12 A
6 11 13 A
7 23 41 B
8 45 15 C
9 95 11 E
10 23 23 B
11 75 34 D
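For numeric ranges like these, base R's cut() function is a common alternative to nested ifelse() calls (a swapped-in technique, not the method used above). It bins a numeric vector and returns a factor directly. The sketch below assumes the same thresholds; right = FALSE makes each interval closed on the left, [a, b), so it should reproduce the group column shown in the output above.

```r
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))

# Intervals: [-Inf, 20) [20, 30) [30, 50) [50, 90) [90, Inf)
df$group <- cut(df$x,
                breaks = c(-Inf, 20, 30, 50, 90, Inf),
                labels = c("A", "B", "C", "D", "E"),
                right = FALSE)
df
```

Keeping the thresholds in a single breaks vector is easier to edit than a deep chain of ifelse() calls. One difference to be aware of: cut() keeps every label as a level even if no row falls into that bin, whereas as.factor(ifelse(...)) only creates levels for values that actually occur.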