
How to Find and Count Missing Values in R DataFrame

Last Updated : 21 Dec, 2023

In this article, we discuss how to find and count missing values in a data frame in the R programming language.

Find and Count Missing Values in the R DataFrame

In R, missing values are represented by NA. They can be detected with the is.na() function.

is.na() takes an object such as a vector or a data frame and returns a logical object of the same shape, with TRUE wherever a value is missing. To find the positions of the missing values, pass the result of is.na() to which(), which returns the indices of the TRUE elements.

To count the missing values, pass the result of is.na() to sum(). Because sum() treats TRUE as 1 and FALSE as 0, the result is the number of missing values.

Here is the basic syntax for finding the positions and the total count of missing values:

# find the positions of missing values
which(is.na(data))

# count the missing values
sum(is.na(data))
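To see what each step returns, here is a minimal sketch on a small numeric vector. The values in x and the variable names are made up for illustration; only base R functions are used.

```r
# Illustrative vector with two missing values
x <- c(5, NA, 12, NA, 7)

# is.na() returns a logical vector of the same length as x
na_flags <- is.na(x)
print(na_flags)     # FALSE TRUE FALSE TRUE FALSE

# which() returns the indices of the TRUE elements
positions <- which(na_flags)
print(positions)    # 2 4

# sum() counts TRUE as 1 and FALSE as 0
count <- sum(na_flags)
print(count)        # 2
```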

Find and Count Missing Values in the Entire Data Frame

To find the positions and the total count of missing values across the entire data frame, pass the data frame itself to is.na(). Let’s look at a program that finds and counts the missing values in the entire data frame.

R




# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, NA),
                    wickets=c(17, 20, NA, 5))
 
# find location of missing values
print("Position of missing values ")
which(is.na(stats))
 
# count total missing values
print("Count of total missing values ")
sum(is.na(stats))


Output

[1] "Position of missing values "
[1]  8 11

[1] "Count of total missing values "
[1] 2

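In this code, the data frame stats holds data for four cricketers, with two missing values. is.na(stats) returns a 4 × 3 logical matrix, and which() numbers its cells column by column: positions 1 to 4 are the player column, 5 to 8 the runs column, and 9 to 12 the wickets column. Position 8 is therefore row 4 of runs, and position 11 is row 3 of wickets. sum(is.na(stats)) counts the TRUE cells, giving 2.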
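The column-by-column numbering can be checked directly. The sketch below converts the linear positions returned by which() into row and column numbers by hand, then compares the result with which(..., arr.ind = TRUE), a base R option that returns row/column pairs instead of linear positions. The variable names (linear, pos, missing_cells) are illustrative.

```r
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# Linear positions, counted down each column in turn
linear <- which(is.na(stats))

# Convert linear positions to row and column numbers by hand
n <- nrow(stats)
rows <- (linear - 1) %% n + 1
cols <- (linear - 1) %/% n + 1

# arr.ind = TRUE makes which() return a matrix of (row, col) pairs
pos <- which(is.na(stats), arr.ind = TRUE)
print(pos)

# Replace column numbers with column names for readability
missing_cells <- data.frame(row = pos[, "row"],
                            column = names(stats)[pos[, "col"]])
print(missing_cells)
```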

Count Missing Values with summary()

R




# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(NA, 200, 408, NA),
                    wickets=c(17, 20, NA, 8))
 
 
summary(stats)


Output:

    player               runs        wickets    
 Length:4           Min.   :200   Min.   : 8.0  
 Class :character   1st Qu.:252   1st Qu.:12.5  
 Mode  :character   Median :304   Median :17.0  
                    Mean   :304   Mean   :15.0  
                    3rd Qu.:356   3rd Qu.:18.5  
                    Max.   :408   Max.   :20.0  
                    NA's   :2     NA's   :1     

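For each numeric column that has missing values, the last row of the summary (NA's) shows how many values are missing: 2 in runs and 1 in wickets. The character column player has no NA's row, because summary() reports only length, class, and mode for character data.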

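Count Missing Values with colSums()

colSums() adds up the logical matrix returned by is.na() one column at a time, so colSums(is.na(stats)) gives the number of missing values in each column.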

R




# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(NA, 200, 408, NA),
                    wickets=c(17, 20, NA, 8))
 
colSums(is.na(stats))


Output:

 player    runs wickets 
      0       2       1 
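The same logical matrix can also be summed along rows or averaged. A minimal sketch using the base R functions rowSums() and colMeans() on the same stats data: rowSums() gives the number of missing values in each row, and colMeans() gives the fraction of each column that is missing, since the mean of a logical vector is the proportion of TRUE values. The variable names are illustrative.

```r
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

na_matrix <- is.na(stats)            # 4 x 3 logical matrix

per_column <- colSums(na_matrix)     # missing values in each column
per_row    <- rowSums(na_matrix)     # missing values in each row
share      <- colMeans(na_matrix)    # fraction of each column that is missing

print(per_column)
print(per_row)
print(round(100 * share, 1))         # as percentages
```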

Find and Count Missing Values in One Column of a Data Frame

To find the positions and count of missing values in one particular column, pass that column (dataframeName$columnName) to is.na(). Because a single column is a vector, which() returns row numbers directly. Let’s look at a program that finds and counts the missing values in specific columns of a data frame.

R




# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(NA, 200, 408, NA),
                    wickets=c(17, 20, NA, 8))
 
print("Location of missing values in runs column")
which(is.na(stats$runs))
 
 
print("Count of missing values in wickets column")
sum(is.na(stats$wickets))


Output

[1] "Location of missing values in runs column"
[1] 1 4

[1] "Count of missing values in wickets column"
[1] 1

In this code, which(is.na(stats$runs)) shows that the runs column has missing values in rows 1 and 4, and sum(is.na(stats$wickets)) shows that the wickets column has 1 missing value.
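The logical vector from is.na() can also be used as a row index to show the full rows where a column is missing. The sketch below uses the same stats data; the variable names (missing_runs, col) are illustrative. It also uses stats[[col]], which selects a column whose name is stored in a variable, something the $ operator cannot do.

```r
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(NA, 200, 408, NA),
                    wickets = c(17, 20, NA, 8))

# Logical index: TRUE for rows where runs is missing
missing_runs <- is.na(stats$runs)

# Use it to pull out the affected rows
print(stats[missing_runs, ])

# [[ ]] selects a column by a name held in a variable
col <- "runs"
print(which(is.na(stats[[col]])))
```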

Find and Count Missing Values in All Columns of a Data Frame

We can also find missing values column by column, which shows at a glance which columns contain missing values and where. Let’s look at an example program that finds and counts the missing values in each column.

R




# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, NA),
                    wickets=c(17, 20, NA, 5))
 
# find location of missing values column wise
print("Position of missing values by column wise")
sapply(stats, function(x) which(is.na(x)))
 
# count the missing values by column wise
print("Count of missing values by column wise")
sapply(stats, function(x) sum(is.na(x)))


Output

[1] "Position of missing values by column wise"
$player
integer(0)

$runs
[1] 4

$wickets
[1] 3

[1] "Count of missing values by column wise"
 player    runs wickets 
      0       1       1 

In this code, sapply() applies an anonymous function to every column of stats: which(is.na(x)) returns the row positions of the missing values in a column, and sum(is.na(x)) counts them. Because the position results have different lengths (player has none), sapply() returns them as a list; the counts are single numbers, so sapply() simplifies them to a named vector.

From the output, we can see that:

  • The player column has no missing values.
  • The runs column has 1 missing value, in row 4.
  • The wickets column has 1 missing value, in row 3.
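sapply() returns either a vector or a list depending on the shape of the results, as the two calls above show. A sketch of a more predictable alternative, assuming the same stats data: vapply() checks each result against a template (FUN.VALUE) and always returns a vector of that type, while lapply() always returns a list. The last line lists the columns that contain at least one missing value. The variable names are illustrative.

```r
stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, NA),
                    wickets = c(17, 20, NA, 5))

# Each result must be a single integer, so the output is a named integer vector
counts <- vapply(stats, function(x) sum(is.na(x)), FUN.VALUE = integer(1))
print(counts)

# lapply() always returns a list, which suits results of varying length
positions <- lapply(stats, function(x) which(is.na(x)))
print(positions)

# Names of the columns that contain at least one missing value
print(names(counts)[counts > 0])
```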

