How to Find and Count Missing Values in R DataFrame
In this article, we will be discussing how to find and count missing values in the R programming language.
Generally, missing values in a data set are represented by NA. In R, missing values can be detected with the is.na() function, which takes a data object as its argument and returns TRUE for every element that is missing and FALSE otherwise. To find the location of the missing values, pass the result of is.na() to which(). To count the total number of missing values, pass the result of is.na() to sum(); each TRUE counts as 1.
Let’s look at the syntax for finding the location and the total count of missing values:
# finds the location of missing values
which(is.na(data))
# finds the count of missing values
sum(is.na(data))
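To see what each call returns, here is a minimal sketch on a small vector. The vector x is a made-up example and is not part of the data used later in this article.

```r
# Hypothetical vector with two missing entries (illustration only)
x <- c(3, NA, 7, NA)

# is.na() marks each element as missing (TRUE) or not (FALSE)
is.na(x)

# which() returns the positions of the TRUE values, here 2 and 4
which(is.na(x))

# sum() treats TRUE as 1 and FALSE as 0, so the result is 2
sum(is.na(x))
```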
Find and count the Missing values From the entire Data Frame:
To find the location and count of missing values in an entire data frame, pass the data frame itself to is.na(). The following program finds and counts the missing values in an entire data frame.
Example:
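In the code below, we create a data frame “stats” that holds cricketers' data with a few missing values. To determine the location and count of missing values, we use which(is.na(stats)) and sum(is.na(stats)). Because is.na() returns a logical matrix for a data frame, which() reports positions counted column by column: with 4 rows, position 8 is row 4 of the second column (runs) and position 11 is row 3 of the third column (wickets).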
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# find location of missing values
print("Position of missing values -")
which(is.na(stats))

# count total missing values
print("Count of total missing values - ")
sum(is.na(stats))
Output
[1] "Position of missing values -"
[1]  8 11
[1] "Count of total missing values - "
[1] 2
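Linear positions such as 8 and 11 can be hard to read on a larger data frame. As a possible alternative, which() accepts an arr.ind = TRUE argument that returns row and column indices instead. The sketch below reuses the same stats data frame; the variable name pos is only illustrative.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# arr.ind = TRUE returns a matrix with one "row" and one "col" entry
# per missing value instead of a single linear position
pos <- which(is.na(stats), arr.ind = TRUE)
pos

# translate the column indices into column names for readability
data.frame(row = pos[, "row"], column = names(stats)[pos[, "col"]])
```

Here the missing values are reported at row 4 of column 2 (runs) and row 3 of column 3 (wickets), which matches linear positions 8 and 11.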
Find and count the Missing values in one column of a Data Frame:
To find the location and count of missing values in one particular column of a data frame, pass dataframeName$columnName to is.na(). The following program finds and counts the missing values in a specified column of a data frame.
Example:
In this code, we find the location of the missing values in the runs column and the count of missing values in the wickets column. To restrict the search to one column, append $columnName to the data frame name inside is.na().
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# find location of missing values in the runs column
print("Location of missing values in runs column")
which(is.na(stats$runs))

# count missing values in the wickets column
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))
Output
[1] "Location of missing values in runs column"
[1] 1 4
[1] "Count of missing values in wickets column"
[1] 1
Find and count missing values in all columns in Data Frame:
We can also find the missing values in the data frame column by column, which shows directly which column each missing value belongs to. The following program finds and counts the missing values column-wise.
Example:
In this code, we find the position and count of missing values in every column of the data frame. To do this, use sapply() to apply which(is.na(x)) and sum(is.na(x)) to each column.
R
# create a data frame
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# find location of missing values column wise
print("Position of missing values by column wise")
sapply(stats, function(x) which(is.na(x)))

# count the missing values by column wise
print("Count of missing values by column wise")
sapply(stats, function(x) sum(is.na(x)))
Output
[1] "Position of missing values by column wise"
$player
integer(0)

$runs
[1] 4

$wickets
[1] 3

[1] "Count of missing values by column wise"
 player    runs wickets
      0       1       1
From the output, we can see that:
- The player column has no missing values.
- The runs column has 1 missing value, at the 4th position.
- The wickets column has 1 missing value, at the 3rd position.
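For the column-wise counts, base R's colSums() applied to the logical matrix from is.na() is a common shorthand. The sketch below is an alternative to the sapply() call above, not a replacement for it; the variable name na_counts is only illustrative.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# is.na(stats) is a logical matrix, so colSums() adds up the TRUE
# values in each column and returns one count per column
na_counts <- colSums(is.na(stats))
na_counts

# names of the columns that contain at least one missing value
names(na_counts)[na_counts > 0]
```

The counts agree with the sapply() result: 0 for player, 1 for runs and 1 for wickets.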