
R Factors

Factors in the R programming language are data structures used to represent categorical data, storing each value as one of a fixed set of levels.

Internally, a factor is stored as a vector of integers, with a label attached to every unique integer. Although factors may look like character vectors, they are integers underneath, so take care when treating them as strings. A factor accepts only a restricted set of distinct values. For example, a data field such as gender may contain values only from female, male, or transgender.



In the above example, all the possible values are known beforehand and are predefined. These distinct values are known as levels. When a factor is created without explicit levels, its levels are the distinct values of the input, sorted alphabetically by default.
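The integer storage described above can be inspected directly. The following is a minimal sketch; the variable names sizes and readings are made up for illustration and do not appear elsewhere in this article.

```r
# A factor stores integer codes plus a character vector of levels
sizes <- factor(c("small", "large", "medium", "small"))

levels(sizes)        # distinct values, sorted alphabetically by default
as.integer(sizes)    # integer codes that index into levels(sizes)
as.character(sizes)  # the labels recovered as plain strings

# Pitfall: a factor with numeric-looking labels converts to its codes,
# not to the numbers the labels spell out
readings <- factor(c("10", "20", "10"))
as.integer(readings)                # codes, not the original values
as.numeric(as.character(readings))  # original values, via the labels
```

Converting through as.character() first is the usual way to get the labels back as numbers.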

Attributes of Factors in R Language



Creating a Factor in R Programming Language

The function used to create or modify a factor in R is factor(), which takes a vector as input.
Creating an R factor takes two steps: first create a vector, then convert that vector into a factor with factor().

Example: Let us create a factor gender with levels female and male.




# Creating a vector
x <- c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)

Output 

[1] "female" "male"   "male"   "female"
[1] female male   male   female
Levels: female male

Levels can also be predefined by the programmer. 




# Creating a factor with levels defined by the programmer
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))
gender

Output 

[1] female male   male   female
Levels: female transgender male
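One practical effect of predefining levels is that a level with no observations still shows up in summaries. A short sketch using base R's table(); the variable name counts is chosen here only for illustration.

```r
# Levels are predefined, but "transgender" does not occur in the data
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))

# table() counts every level, including those with zero occurrences
counts <- table(gender)
print(counts)
```

The table lists all three levels in the order they were defined, with a count of zero for the unused one.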

The levels of a factor can be checked with the function levels().
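For instance, the following sketch lists the levels of the gender factor and, using the related base R function nlevels(), counts them:

```r
gender <- factor(c("female", "male", "male", "female"))

# Character vector of the distinct levels
levels(gender)

# Number of levels
nlevels(gender)
```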

Checking for a Factor in R

The function is.factor() checks whether a variable is a factor and returns TRUE if it is.




gender <- factor(c("female", "male", "male", "female"))
print(is.factor(gender))

Output 

[1] TRUE

The function class() can also be used for this check; it returns "factor" when the variable is a factor.




gender <- factor(c("female", "male", "male", "female"))
class(gender)

Output 

[1] "factor" 

Accessing elements of a Factor in R

Elements of a factor are accessed the same way as elements of a vector. If gender is a factor, then gender[i] accesses the ith element of the factor.

Example  




gender <- factor(c("female", "male", "male", "female"))
gender[3]

Output 

[1] male
Levels: female male

More than one element can be accessed at a time by passing a vector of indices.

Example 




gender <- factor(c("female", "male", "male", "female"))
gender[c(2, 4)]

Output 

[1] male   female
Levels: female male

A negative index excludes the element at that position and returns the rest.

Example 




gender <- factor(c("female", "male", "male", "female"))
gender[-3]

Output 

[1] female male   female
Levels: female male
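Note that subsetting keeps the full set of levels, even those that no longer occur in the result. If the unused levels are not wanted, they can be dropped, as in this sketch; the variable name only_female is illustrative.

```r
gender <- factor(c("female", "male", "male", "female"))

# Keep only elements 1 and 4; both are "female"
only_female <- gender[c(1, 4)]
levels(only_female)              # "male" is still listed as a level

# droplevels() removes levels that have no remaining elements
levels(droplevels(only_female))

# Equivalent: ask the subsetting operator to drop unused levels
levels(gender[c(1, 4), drop = TRUE])
```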

Modification of a Factor in R

After a factor is created, its elements can be modified, but each newly assigned value must be one of the predefined levels.

Example  




gender <- factor(c("female", "male", "male", "female"))
gender[2]<-"female"
gender

Output 

[1] female female male   female
Levels: female male
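By contrast, assigning a value that is not among the levels does not stop the script: R issues a warning about an invalid factor level and stores NA instead. A minimal sketch of that behavior:

```r
gender <- factor(c("female", "male", "male", "female"))

# "other" is not a level of gender, so R warns and inserts NA
gender[2] <- "other"
gender

# The second element is now missing; the levels are unchanged
is.na(gender[2])
levels(gender)
```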

To select all elements of the factor gender except the ith element, use gender[-i]. To assign a value that is not among the predefined levels, first add it to the levels.

Example  




gender <- factor(c("female", "male", "male", "female"))
 
# add new level
levels(gender) <- c(levels(gender), "other")   
gender[3] <- "other"
gender

Output

[1] female male   other  female
Levels: female male other 

Factors in Data Frame 

A data frame is similar to a 2D array: each column contains all the values of one variable, and each row contains one value from every column. There are four things to remember about data frames:  

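In versions of R before 4.0.0, data.frame() automatically converts character columns, which it treats as categorical data, into factors. Since R 4.0.0 the default is stringsAsFactors = FALSE, so character columns remain character vectors unless stringsAsFactors = TRUE is passed or the column is converted with factor().
We can create a data frame and check whether its column is a factor.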

Example  




age <- c(40, 49, 48, 40, 67, 52, 53) 
salary <- c(103200, 106200, 150200,
            10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
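# stringsAsFactors = TRUE converts character columns to factors
# (this was the default before R 4.0.0)
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)
print(is.factor(employee$gender))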

Output

  age salary      gender
1  40 103200        male
2  49 106200        male
3  48 150200 transgender
4  40  10606      female
5  67  10390        male
6  52  14070      female
7  53  10220 transgender
[1] TRUE
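Whether data.frame() converts character columns on its own depends on the R version and on the stringsAsFactors argument, so converting the column explicitly with factor() may be the more portable approach. A small sketch, reusing the first three rows of the example above:

```r
# Build the data frame without automatic conversion
employee <- data.frame(
  age = c(40, 49, 48),
  gender = c("male", "male", "transgender"),
  stringsAsFactors = FALSE
)
is.factor(employee$gender)  # the column is still a character vector

# Convert the column explicitly
employee$gender <- factor(employee$gender)
is.factor(employee$gender)
levels(employee$gender)
```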
