The Factor Issue in a DataFrame in R Programming

  • Last Updated : 10 May, 2020

Data frames are generic data objects in R that store tabular data. They are among the most widely used data objects in R programming because data is often easiest to analyze in tabular form. A data frame can also be thought of as a matrix in which each column may hold a different data type.
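
As a small illustration of columns holding different data types, the sketch below builds a data frame and reports the class of each column. The name `people` and its values are made up for this example.

```r
# Illustrative data frame (hypothetical values) with three column types
people <- data.frame(
  Name = c("Amiya", "Raj"),
  Age = c(22, 25),
  Employed = c(TRUE, FALSE),
  stringsAsFactors = FALSE  # keep strings as plain character values
)

# Report the class of every column
col_classes <- sapply(people, class)
print(col_classes)
```

Each column keeps its own type, which is what distinguishes a data frame from a matrix.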

Factor issue in a data frame in R

R automatically assigns a data type to the data you enter. Numeric input is stored as numeric values, and any of them can later be replaced with another number. Character input, however, may be converted to a factor: each distinct string becomes a category, or factor level, and R assumes that these are the only levels available. In other words, a factor variable is one whose values are restricted to a fixed set of levels. In R versions before 4.0.0, data.frame() performs this conversion by default; since R 4.0.0 it happens only when stringsAsFactors = TRUE is passed. The R code below creates a data frame and then modifies it to show the problem this causes.

Example:




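# R program to illustrate
# the factor issue in a data frame

# Creating a data frame
# stringsAsFactors = TRUE was the default before R 4.0.0;
# it is set explicitly so the issue reproduces on newer versions
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45),
  stringsAsFactors = TRUE
)
print(df)

# Manipulating the data frame
df[1, 3] = 37   # numeric column: succeeds
df[3, 2] = "C"  # factor column: "C" is not an existing level

print(df)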

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

   Name Language Age
1 Amiya        R  37
2   Raj   Python  25
3 Asish     <NA>  45
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "C") :
  invalid factor level, NA generated

First, changing the element in the first row, third column succeeds because Age is a numeric variable. Changing the element in the third row, second column does not: R issues a warning that "C" is an invalid factor level and stores NA in its place. The position where we wanted "C" now holds NA, and the word factor appears in the warning message. The question now is how to get rid of this factor issue.
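
To see why the assignment fails, it can help to inspect the column itself. The sketch below assumes the data frame was created with stringsAsFactors = TRUE (the default before R 4.0.0) and checks which levels the Language column accepts; the helper variable names are illustrative.

```r
# Same data frame as above, with strings converted to factors
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = TRUE
)

# Confirm that the column is a factor and list its allowed levels
is_factor <- is.factor(df$Language)
lang_levels <- levels(df$Language)
print(is_factor)
print(lang_levels)

# "C" is not among the levels, so assigning it produces NA
c_allowed <- "C" %in% lang_levels
print(c_allowed)
```

Because "C" is not one of the existing levels, R has no category to store it under and falls back to NA.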



Resolving the factor issue

When a column is a factor, new entries must match one of the factor levels already defined; otherwise the warning shown above is printed and NA is stored. To avoid this, pass the additional argument stringsAsFactors = FALSE when defining the data frame. In R versions before 4.0.0 this argument defaults to TRUE, which is why the warning appears when a string is replaced with a new string; since R 4.0.0 the default is FALSE. Now try the same manipulation again.

Example:




# R program to illustrate
# resolving the factor issue in a data frame

# Creating a data frame
df = data.frame(
  "Name" = c("Amiya", "Raj", "Asish"),
  "Language" = c("R", "Python", "Java"),
  "Age" = c(22, 25, 45),
  # Passing an additional argument
  # to resolve the factor issue
  stringsAsFactors = FALSE
)
print(df)

# Manipulating the data frame
df[1, 3] = 37
df[3, 2] = "C"

print(df)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

   Name Language Age
1 Amiya        R  37
2   Raj   Python  25
3 Asish        C  45

From the above output, you can see that there is no NA anymore and the element was changed to "C" as intended.
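
If a data frame already contains factor columns and cannot easily be recreated, two other approaches may help. The sketch below is not part of the original example: it either adds the new level before assigning, or converts the column to character. The names `df_levels` and `df_chr` are illustrative.

```r
# Data frame whose string columns are already factors
df <- data.frame(
  Name = c("Amiya", "Raj", "Asish"),
  Language = c("R", "Python", "Java"),
  Age = c(22, 25, 45),
  stringsAsFactors = TRUE
)

# Option 1: extend the factor levels, then assign the new value
df_levels <- df
levels(df_levels$Language) <- c(levels(df_levels$Language), "C")
df_levels[3, 2] <- "C"

# Option 2: convert the factor column to character, then assign
df_chr <- df
df_chr$Language <- as.character(df_chr$Language)
df_chr[3, 2] <- "C"

print(df_levels)
print(df_chr)
```

Option 1 keeps the column as a factor, which suits genuinely categorical data; Option 2 is simpler when the column is free-form text.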
