Remove rows with empty cells in R

  • Last Updated : 23 May, 2021

A dataframe may contain cells of different data types. It may also contain blank rows, that is, rows whose values are blank or missing in every column. Such rows act as dummy records and are termed empty rows. There are multiple ways to remove them.

Method 1: Removing rows using for loop

A vector is declared to hold the indexes of the rows in which every value is blank. A for loop iterates over the rows of the dataframe. For each row, a counter is set to 0 to count the blank values in that row, and an inner loop iterates over the columns. Each cell value is compared to the blank value "", and the counter is incremented when they match. After the inner loop finishes, the counter is compared to the number of columns in the dataframe. If the two are equal, every cell in the row is blank and the row index is appended to the vector. After the outer loop ends, the rows whose indexes are stored in the vector are dropped by placing '-' in front of the row index vector.

The time complexity of this approach is O(m * n), where m is the number of rows and n is the number of columns.

Example:

# declaring a dataframe
data_frame = data.frame(col1 = c("","b","","","e") , 
                        col2 = c("",2,"",4,5), 
                        col3= c("",FALSE,"","", TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# declaring an empty vector to store 
# the rows with all the blank values
vec <- c()
  
# looping the rows
for (i in 1:nrow(data_frame)){
    
    # counter for blank values in 
    # each row
    count = 0
      
    # looping through columns
    for(j in 1:ncol(data_frame)){
      
        # checking if the value is blank
        if(isTRUE(data_frame[i,j] == "")){
            count = count + 1
        }
          
    }
    
    # if count is equivalent to number 
    # of columns
    if(count == ncol(data_frame)){
      
        # append row number
        vec <- append(vec,i)
    }
}
  
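# deleting rows using index in vector; when no row is
# fully blank, vec is empty and the dataframe is kept as is
data_frame_mod <- if (length(vec) > 0) data_frame[-vec, ] else data_frame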
print ("Modified dataframe")
print (data_frame_mod)

Output

[1] "Original dataframe"
 col1 col2  col3
1                
2    b    2 FALSE
3                
4         4      
5    e    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3
2    b    2 FALSE
4         4      
5    e    5  TRUE
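
As a variant of the loop above, the following sketch records a logical keep flag for each row rather than collecting indexes to delete, which also covers the case where no row is fully blank. The function name drop_blank_rows_loop is hypothetical and is not part of the original example.

```r
# Hypothetical helper: returns df without the rows whose cells are all "".
drop_blank_rows_loop <- function(df) {
  # one keep/drop flag per row
  keep <- logical(nrow(df))

  for (i in seq_len(nrow(df))) {
    # count the blank cells in row i
    blanks <- 0
    for (j in seq_len(ncol(df))) {
      if (isTRUE(df[i, j] == "")) {
        blanks <- blanks + 1
      }
    }
    # keep the row unless every column was blank
    keep[i] <- blanks != ncol(df)
  }

  df[keep, , drop = FALSE]
}

data_frame <- data.frame(col1 = c("", "b", "", "", "e"),
                         col2 = c("", 2, "", 4, 5),
                         col3 = c("", FALSE, "", "", TRUE))
result <- drop_blank_rows_loop(data_frame)
print(result)
```

Because rows are selected with a logical vector instead of negative indexes, a dataframe with no fully blank rows is returned unchanged.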

Method 2: Removing rows with all blank cells in R using apply method

The apply() method in R applies a specified function over the margins (rows or columns) of an array or matrix; a dataframe passed to it is first coerced to a matrix. It returns a vector, array, or list of the values obtained by applying the function to the corresponding margins.

Syntax: apply(X, MARGIN, FUN, …)

Parameter :

X – A dataframe or matrix

MARGIN – The margin over which to apply the function. 1 indicates rows, 2 indicates columns and c(1, 2) indicates rows and columns.

FUN – The function to be applied.

The dataframe is first compared with "", which yields a logical matrix that is TRUE wherever a cell is blank. In this approach, FUN is 'all', so apply() returns TRUE for a row only when every column in that row is blank. Negating this result with ! selects the rows that have at least one non-blank cell.
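
To make the intermediate values visible, here is a minimal sketch that splits the expression into separate steps. The toy dataframe df and the variable names blank and all_blank are assumptions made for illustration only.

```r
# Hypothetical toy data: row 1 is fully blank, row 3 is only partly blank.
df <- data.frame(a = c("", "x", ""),
                 b = c("", "y", "z"))

# Step 1: logical matrix, TRUE wherever a cell equals ""
blank <- df == ""
print(blank)

# Step 2: one logical value per row, TRUE only if every cell in the row is blank
all_blank <- apply(blank, 1, all)
print(all_blank)

# Step 3: keep the rows that are not entirely blank
df_mod <- df[!all_blank, ]
print(df_mod)
```

Only row 1 is dropped here; row 3 is kept because its second column is not blank.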



Example:

# declaring a dataframe
data_frame = data.frame(col1 = c("","b","","","e") , 
                        col2 = c("",2,"",4,5), 
                        col3= c("",FALSE,"","", TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking where the cells are not all empty
data_frame_mod <- data_frame[!apply(data_frame == "", 1, all), ]  
print ("Modified dataframe")
print (data_frame_mod )

Output

[1] "Original dataframe"
 col1 col2  col3
1                
2    b    2 FALSE
3                
4         4      
5    e    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3
2    b    2 FALSE
4         4      
5    e    5  TRUE

Method 3: Removing rows with all NA

A dataframe can contain missing values, represented as NA, in place of cell values. This approach combines several built-in R functions to remove the rows in which every value is NA.

  • The number of columns of the dataframe can be checked using the ncol() method.

Syntax:

ncol( df)

  • Each cell is checked for NA using the is.na() method. The dataframe is passed as an argument to this method. It returns a logical matrix with the same dimensions as the original dataframe, holding TRUE where the value is NA and FALSE otherwise.

Syntax:

na_df <- is.na(df)

  • The rowSums() method is applied to the logical matrix obtained in the previous step. Since TRUE counts as 1 and FALSE as 0, it returns a vector holding the number of NA values in each row.

Syntax:

rowSums(na_df)

  • The rows whose NA count is not equal to the number of columns are kept and stored in a separate variable as the output. If the two are equal, every column in that row contains NA, so the row is dropped.

Example:

# declaring a dataframe
data_frame = data.frame(col1 = c(NA,"b",NA,NA,"e") , 
                        col2 = c(NA,2,NA,4,5), 
                        col3= c(NA,FALSE,NA,NA, TRUE))
  
print ("Original dataframe")
print (data_frame)
  
# checking number of columns
cols <- ncol(data_frame)
  
# checking for which elements have 
# missing values
is_na <- is.na(data_frame)
  
# computes total number of nas 
# encountered in each row
row_na <- rowSums(is_na)
  
# checking where the cells are not 
# all NA
data_frame_mod <- data_frame[row_na != cols, ]  
print ("Modified dataframe")
print (data_frame_mod )

Output

[1] "Original dataframe"
 col1 col2  col3
1 <NA>   NA    NA
2    b    2 FALSE
3 <NA>   NA    NA
4 <NA>    4    NA
5    e    5  TRUE
[1] "Modified dataframe"
 col1 col2  col3
2    b    2 FALSE
4 <NA>    4    NA
5    e    5  TRUE
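
When a dataframe mixes blank strings and NA values, the blank check and the NA check shown above can be combined. The following is a minimal sketch under that assumption; the sample data and the variable names empty and cleaned are hypothetical.

```r
# Hypothetical data: rows 1 and 3 hold only NA or "" values.
df <- data.frame(a = c(NA, "x", "", "y"),
                 b = c("", "1", NA, ""),
                 stringsAsFactors = FALSE)

# TRUE where a cell is either NA or a blank string
empty <- is.na(df) | df == ""

# a row is empty when its count of empty cells equals the number of columns
cleaned <- df[rowSums(empty) != ncol(df), ]
print(cleaned)
```

Rows 2 and 4 are kept because each has at least one cell that is neither NA nor blank.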


