How to Find and Count Missing Values in R DataFrame

Last Updated : 21 Dec, 2023

This article shows how to find and count missing values in a data frame in the R programming language.

Find and Count Missing Values in the R DataFrame

Missing values in R are generally represented by NA, and the is.na() function detects them.

is.na() takes a data object (a vector, a single column, or a whole data frame) and returns TRUE for every element that is missing and FALSE otherwise. To find the locations of the missing values, pass the result of is.na() to which(), which returns the indices of the TRUE elements.

To count the missing values, pass the result of is.na() to sum(). Each TRUE counts as 1, so the sum is the number of missing values.

The syntax for finding the locations and the total count of missing values is:

# finds the location of missing values

which(is.na(data))

# finds the count of missing values

sum(is.na(data))
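
As a minimal sketch of how the three functions fit together, the example below applies them to a small numeric vector. The vector x and the variable names are made up for illustration only.

```r
# hypothetical numeric vector with two missing values
x <- c(3, NA, 7, NA)

# logical vector: TRUE wherever a value is missing
na_flags <- is.na(x)
print(na_flags)      # FALSE  TRUE FALSE  TRUE

# positions of the missing values
na_positions <- which(na_flags)
print(na_positions)  # 2 4

# number of missing values (each TRUE counts as 1)
na_count <- sum(na_flags)
print(na_count)      # 2
```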

Find and count the Missing values From the entire Data Frame

To find the locations and the count of missing values in an entire data frame, pass the data frame itself to is.na(). The following program finds and counts the missing values in a whole data frame.

R

# create a data frame 
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                 runs=c(100, 200, 408, NA),
                 wickets=c(17, 20, NA, 5))
 
# find location of missing values
print("Position of missing values ")
which(is.na(stats))
 
# count total missing values 
print("Count of total missing values  ")
sum(is.na(stats))

Output

[1] "Position of missing values "
[1]  8 11

[1] "Count of total missing values  "
[1] 2

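In this code we created a data frame, stats, that holds cricketers' data with a few missing values. which(is.na(stats)) returns the positions of the missing values and sum(is.na(stats)) returns their count. Because is.na(stats) is a 4 x 3 logical matrix that which() reads column by column, position 8 is row 4 of the second column (runs) and position 11 is row 3 of the third column (wickets).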
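
If row and column coordinates are easier to read than a single position number, which() also accepts arr.ind = TRUE. The sketch below reuses the same stats data frame; the names na_pos and na_cells are illustrative only.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# arr.ind = TRUE returns one row per missing cell,
# with columns named "row" and "col"
na_pos <- which(is.na(stats), arr.ind = TRUE)
print(na_pos)

# translate column numbers into column names for readability
na_cells <- data.frame(row = unname(na_pos[, "row"]),
                       column = names(stats)[na_pos[, "col"]])
print(na_cells)
```

Here the missing cells come out as row 4 of runs and row 3 of wickets, which matches the linear positions 8 and 11.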

Count the number of Missing Values with summary

R

# create a data frame 
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(NA, 200, 408, NA),
                    wickets=c(17, 20, NA, 8))
 
 
summary(stats)

Output:

    player               runs        wickets    
 Length:4           Min.   :200   Min.   : 8.0  
 Class :character   1st Qu.:252   1st Qu.:12.5  
 Mode  :character   Median :304   Median :17.0  
                    Mean   :304   Mean   :15.0  
                    3rd Qu.:356   3rd Qu.:18.5  
                    Max.   :408   Max.   :20.0  
                    NA's   :2     NA's   :1

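For each numeric column, the last line of the summary (NA's) shows the number of missing values present in that column. For a character column such as player, summary() reports only the length, class, and mode, so it does not show a missing-value count.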

Count the number of Missing Values with colSums

R

# create a data frame 
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(NA, 200, 408, NA),
                    wickets=c(17, 20, NA, 8))
 
colSums(is.na(stats))

Output:

 player    runs wickets 
      0       2       1
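
The same idea extends to rows and to proportions. As a hedged sketch using the same data frame, rowSums() counts missing values per row and colMeans() gives the share of missing values per column; the variable names are illustrative only.

```r
# same data frame as in the colSums example
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# number of missing values in each row
na_per_row <- rowSums(is.na(stats))
print(na_per_row)

# fraction of missing values in each column
# (the mean of a logical vector is the share of TRUE values)
na_share <- colMeans(is.na(stats))
print(na_share)
```

With this data, rows 1, 3, and 4 each contain one missing value, and the missing shares are 0 for player, 0.5 for runs, and 0.25 for wickets.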

Find and count the Missing values in one column of a Data Frame

To find the locations and the count of missing values in one particular column of a data frame, pass dataframeName$columnName to is.na(). The following program finds and counts the missing values in specified columns of a data frame.

R

# create a data frame 
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                 runs=c(NA, 200, 408, NA),
                 wickets=c(17, 20, NA, 8))
 
print("Location of missing values in runs column")
which(is.na(stats$runs))
 
 
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))

Output

[1] "Location of missing values in runs column"
[1] 1 4

[1] "Count of missing values in wickets column"
[1] 1

This code finds the locations of missing values in one column and counts the missing values in another. The first result shows that the “runs” column has missing values at positions 1 and 4, which for a single column are also the row numbers. The second result shows that the “wickets” column has 1 missing value.
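
Once the positions are known, a common next step is to look at the affected rows. The sketch below is one possible way to do that with the same data frame: anyNA() gives a quick yes/no check, and logical indexing with is.na() pulls out the rows. The name rows_missing_runs is illustrative only.

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# quick check: does the runs column contain any missing value?
has_missing_runs <- anyNA(stats$runs)
print(has_missing_runs)   # TRUE

# keep only the rows where runs is missing
rows_missing_runs <- stats[is.na(stats$runs), ]
print(rows_missing_runs)
```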

Find and count missing values in all columns in Data Frame

We can also find the missing values in a data frame column by column. This reports row positions within each column rather than positions in the whole data frame, which makes the results easier to read. The following program finds and counts the missing values column-wise.

R

# create a data frame 
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                 runs=c(100, 200, 408, NA),
                 wickets=c(17, 20, NA, 5))
 
# find location of missing values column wise
print("Position of missing values by column wise")
sapply(stats, function(x) which(is.na(x)))
 
# count the missing values by column wise
print("Count of missing values by column wise")
sapply(stats, function(x) sum(is.na(x)))

Output

[1] "Position of missing values by column wise"
$player
integer(0)

$runs
[1] 4

$wickets
[1] 3

[1] "Count of missing values by column wise"
 player    runs wickets 
      0       1       1

This code finds the positions and the count of missing values in every column of the data frame. sapply() applies a function to each column in turn: the first call applies which(is.na(x)) to get the positions, and the second applies sum(is.na(x)) to get the counts.

From the output, we can see that:

player column has no missing values.
runs column has 1 missing value at the 4th position.
wickets column has 1 missing value at the 3rd position.
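
Note that sapply() returns a list for the positions here because the columns have different numbers of missing values, while it returns a named vector for the counts. If a predictable return type matters, one alternative (a sketch, not the article's original approach) is to use lapply() for the positions and vapply() for the counts:

```r
# same data frame as in the example above
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# lapply() always returns a list, one element per column
na_positions <- lapply(stats, function(x) which(is.na(x)))
print(na_positions)

# vapply() checks that each result is a single integer
na_counts <- vapply(stats, function(x) sum(is.na(x)), integer(1))
print(na_counts)
```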
