Handling Missing Values in R Programming
Last Updated: 01 Aug, 2023
As the name indicates, missing values are elements whose value is not known. In the R programming language, NA is the reserved word for a missing value, while NaN ("not a number") marks the result of an arithmetic operation that is undefined.
R – Handling Missing Values
Missing values are common in real data. For example, some cells in a spreadsheet may be empty. An undefined or impossible arithmetic operation, such as 0 / 0, produces NaN.
Dealing with Missing Values in R
Missing values in R are handled with a few predefined functions:
is.na() Function for Finding Missing values:
This function returns a logical vector of the same length as its input, with TRUE at each position that holds NA (or NaN) and FALSE everywhere else.
R
x <- c(NA, 3, 4, NA, NA, NA)
is.na(x)
Output:
[1] TRUE FALSE FALSE TRUE TRUE TRUE
Properties of Missing Values:
- For testing objects that are NA use is.na()
- For testing objects that are NaN use is.nan()
- NA has typed variants. Plain NA is logical, and NA_integer_, NA_real_, and NA_character_ are the integer, double, and character versions.
- A NaN value is also treated as NA (is.na() returns TRUE for it), but the reverse does not hold: is.nan() returns FALSE for NA.
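The properties above can be checked directly. The following is a small sketch; the vector v is made up for illustration and does not appear elsewhere in this article.

```r
# Illustrative vector: one NA and one NaN (0 / 0 is undefined)
v <- c(1, NA, 0 / 0)

is.na(v)   # TRUE for both the NA and the NaN
is.nan(v)  # TRUE only for the NaN

# NA has a typed variant for several vector types
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"
```

R normally coerces plain NA to the type of the surrounding vector, so the typed variants are mostly needed when the type must be fixed explicitly.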
A vector can also be created with one or more NAs in it.
R
x <- c(NA, 3, 4, NA, NA, NA)
x
Output:
[1] NA 3 4 NA NA NA
Removing NA or NaN values
There are two ways to remove missing values:
Extracting values except for NA or NaN values:
Example 1:
R
x <- c(1, 2, NA, 3, NA, 4)
# Logical vector marking the missing positions
d <- is.na(x)
# Keep only the positions that are not missing
x[!d]
Output:
[1] 1 2 3 4
Example 2:
R
# 0 / 0 evaluates to NaN
x <- c(1, 2, 0 / 0, 3, NA, 4, 0 / 0)
x
# is.na() is TRUE for both NA and NaN, so both are dropped
x[!is.na(x)]
Output:
[1] 1 2 NaN 3 NA 4 NaN
[1] 1 2 3 4
The second way is the complete.cases() function, which returns a logical vector that is TRUE for every element or row with no missing values. It also works on data frames.
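As a minimal sketch of complete.cases(), the example below filters a small data frame and then a plain vector. The data frame scores and its column names are hypothetical and used only for this illustration.

```r
# Hypothetical data frame used only for this illustration
scores <- data.frame(
  id   = 1:4,
  math = c(90, NA, 75, 60),
  art  = c(85, 70, NA, 65)
)

# TRUE for rows that have no missing value in any column
keep <- complete.cases(scores)
keep

# Keep only the complete rows (rows 1 and 4 here)
complete_scores <- scores[keep, ]
complete_scores

# On a plain vector the result matches !is.na()
x <- c(1, 2, NA, 3, NA, 4)
x[complete.cases(x)]
```

Row 2 is dropped because math is missing and row 3 because art is missing, even though each of those rows still holds other valid values.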
Missing Value Filter Functions
Modeling functions in R accept an na.action argument that tells the function what to do when it encounters NA.
The function then calls one of the missing value filter functions, which returns a new data set with the NAs handled accordingly. The default is taken from getOption("na.action"), which is na.omit unless it has been changed. The missing value filter functions are:
- na.omit – omits every row containing even one NA
- na.fail – halts with an error if any NA is encountered
- na.exclude – omits every row containing even one NA, but keeps a record of their original positions so that residuals and predictions can be padded with NA
- na.pass – returns the object unchanged, leaving the NAs in place
R
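# Data frame with an integer column and a factor column
df <- data.frame(c1 = 1:8,
                 c2 = factor(c("B", "A", "B", "C",
                               "A", "C", "B", "A")))
# Introduce one NA in each column
df[4, 1] <- df[6, 2] <- NA
# Setting a factor entry to NA does not change its levels
levels(df$c2)
# na.fail() stops with an error because df contains NAs
na.fail(df)
# Not reached when run as a script; returns df without rows 4 and 6
na.exclude(df)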
Output:
[1] "A" "B" "C"
Error in na.fail.default(df) : missing values in object
Calls: na.fail -> na.fail.default
Execution halted
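To show how the na.action argument changes a modeling call, here is a hedged sketch using lm(). The data frame obs and its values are invented for this illustration; only lm(), residuals(), try(), and the filter functions themselves come from base R.

```r
# Hypothetical data: two of the six responses are missing
obs <- data.frame(
  x = 1:6,
  y = c(2.1, NA, 6.2, 7.9, NA, 12.1)
)

# na.omit drops the incomplete rows before fitting
fit_omit <- lm(y ~ x, data = obs, na.action = na.omit)

# na.exclude also drops them, but remembers where they were
fit_excl <- lm(y ~ x, data = obs, na.action = na.exclude)

length(residuals(fit_omit))  # 4: one residual per complete row
length(residuals(fit_excl))  # 6: padded with NA at rows 2 and 5

# na.fail refuses to fit when any NA is present;
# try() captures the error so the script keeps running
result <- try(lm(y ~ x, data = obs, na.action = na.fail), silent = TRUE)
inherits(result, "try-error")  # TRUE
```

Both fits use the same four complete rows, so the coefficients are identical. The difference only shows up in functions such as residuals(), where na.exclude keeps the output aligned with the original rows.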
Find and Remove NA or NaN values from a dataset
R can also find and remove missing values across an entire data set using a few base functions.
First, we create a data frame; then we find and remove all the missing values it contains.
R
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, NA, NA)
)
data
Output:
   A  B  C
1  1 NA  1
2  2  2  2
3 NA  3  3
4  4 NA NA
5  5  5 NA
Find all the missing values in the data
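The code for this step is not shown. One common way to get the total, assuming the same data frame as in the first example of this section, is to sum the logical matrix that is.na() returns:

```r
# Same data frame as in the first example of this section
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, NA, NA)
)

# is.na() returns a logical matrix; each TRUE counts as 1 when summed
sum(is.na(data))
```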
Output:
[1] 5
Find all the missing values in the columns
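The code for this step is not shown either. A likely approach, again assuming the data frame from the first example of this section, is colSums() applied to the result of is.na():

```r
# Same data frame as in the first example of this section
data <- data.frame(
  A = c(1, 2, NA, 4, 5),
  B = c(NA, 2, 3, NA, 5),
  C = c(1, 2, 3, NA, NA)
)

# Count the TRUE values in each column of the logical matrix
colSums(is.na(data))
```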
Output:
A B C
1 2 2
Visualization of missing values of a dataset
R
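# Install visdat once if it is not already available
install.packages("visdat")
library(visdat)
# Data frame with missing values scattered across all three columns
data <- data.frame(
  A = c(1, NA, 3, NA, 5),
  B = c(NA, 2, NA, 4, NA),
  C = c(1, 2, 3, NA, NA)
)
# Plot a map of which cells are missing and which are present
vis_miss(data)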
Output:
[Figure: Handling missing values in R]
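Remove missing values from a data frame
R
# Assumes `data` is the first data frame of this section (A = 1, 2, NA, 4, 5);
# the `data` from the visualization example has no complete rows
data <- na.omit(data)
data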
Output:
A B C
2 2 2 2
Special Cases
There are two special cases where NA is denoted or presented differently:
- Factor vectors – <NA> is the symbol displayed in factor vectors for missing values.
- NaN – This is a special case of NA. It is displayed when an arithmetic operation yields a result that is not a number. For example, dividing zero by zero produces NaN.
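Both special cases can be seen in a short session. This is a small sketch; the names f and z are chosen only for illustration.

```r
# A missing entry in a factor prints as <NA>
f <- factor(c("A", NA, "B"))
f
is.na(f)   # FALSE TRUE FALSE
levels(f)  # NA is not a level: "A" "B"

# Undefined arithmetic yields NaN, which is.na() also flags
z <- 0 / 0
z
is.nan(z)  # TRUE
is.na(z)   # TRUE
```

The angle brackets distinguish a missing factor entry from a level that is literally named "NA".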