Select First Row of Each Group in DataFrame in R

Last Updated : 23 Sep, 2021

In this article, we will discuss how to select the first row of each group in a dataframe using the R programming language.

The duplicated() function determines which elements of a vector, or which rows of a dataframe, are duplicates of elements that appear earlier. It returns a logical vector in which TRUE marks a duplicate and FALSE marks a first occurrence.

Syntax:

duplicated(data-frame$col-name)

The first row of each group is selected by negating the result of duplicated() with the ! operator and using it as the row index of the dataframe. The column index is left empty so that all columns are kept. The row numbers of the original dataframe are retained in the final output.
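As a minimal sketch of the idea above, the snippet below applies duplicated() to a plain character vector and shows how negation picks out first occurrences. The vector name groups is hypothetical and used only for illustration.

```r
# Hypothetical grouping vector, used only for illustration
groups <- c("Grp1", "Grp1", "Grp2", "Grp2", "Grp3")

# TRUE marks an element that already appeared earlier in the vector
is_dup <- duplicated(groups)
print(is_dup)

# Negating the vector marks the first occurrence of each value;
# which() converts those TRUE entries to positions
first_positions <- which(!is_dup)
print(first_positions)
```

The same logical vector, used as a row index on a dataframe, keeps exactly the rows at these positions.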

Example: Selecting first row from each group

R

# create first dataframe 
data_frame1<-data.frame(col1=c(rep('Grp1',2), 
                               rep('Grp2',2), 
                               rep('Grp3',2)),  
                        col2=rep(c(1:3),2), 
                        col3=rep(1:2,3)  
                        ) 
  
print("Original DataFrame") 
print(data_frame1) 
  
print("Modified DataFrame") 
  
# keep only the first row of each col1 group 
data_frame1[!duplicated(data_frame1$col1), ]

Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    2    2 
3 Grp2    3    1 
4 Grp2    1    2 
5 Grp3    2    1 
6 Grp3    3    2 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
3 Grp2    3    1 
5 Grp3    2    1
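Because the original row numbers are retained, the result above is labelled 1, 3, 5. If consecutive numbering is preferred, one option is to reset the row names afterwards. The sketch below rebuilds the same data under the hypothetical names df and first_rows.

```r
# Same data as the example above, rebuilt under a hypothetical name
df <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                 col2 = rep(1:3, 2),
                 col3 = rep(1:2, 3))

# First row of each col1 group; original row names are kept
first_rows <- df[!duplicated(df$col1), ]
print(rownames(first_rows))

# Assigning NULL resets the row names to consecutive numbers
rownames(first_rows) <- NULL
print(first_rows)
```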

The following code snippet illustrates the usage of the duplicated() function applied over multiple columns. The columns are selected together as a sub-dataframe, for example data_frame1[, c("col3", "col2")], so that duplicated() compares whole rows. Concatenating the columns with c() would instead produce a single vector longer than the number of rows, which cannot be relied on as a row index.

Example: Selecting the first row of each group over multiple columns

R

# create first dataframe 
data_frame1<-data.frame(col1=c(rep('Grp1',2), 
                               rep('Grp2',2), 
                               rep('Grp3',2)),  
                        col2=rep(1,6), 
                        col3=rep(1,6)  
                        ) 
  
print("Original DataFrame") 
print(data_frame1) 
  
print("Modified DataFrame") 
  
# keep the first row of each (col3, col2) combination 
data_frame1[!duplicated(data_frame1[, c("col3", "col2")]), ]

Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    1    1 
3 Grp2    1    1 
4 Grp2    1    1 
5 Grp3    1    1 
6 Grp3    1    1 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1
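Since every row in the example above has the same values in col2 and col3, only one group exists. The following sketch uses hypothetical data in which the column combinations differ, and compares rows on col2 and col3 by passing those columns to duplicated() as a dataframe. The names df and first_rows are assumptions made for illustration.

```r
# Hypothetical data with three distinct (col2, col3) combinations
df <- data.frame(col1 = c("Grp1", "Grp1", "Grp2", "Grp2", "Grp3", "Grp3"),
                 col2 = c(1, 1, 1, 2, 2, 2),
                 col3 = c(1, 1, 2, 2, 2, 2))

# duplicated() on a dataframe compares whole rows, so a row counts
# as a duplicate only when both col2 and col3 match an earlier row
first_rows <- df[!duplicated(df[, c("col2", "col3")]), ]
print(first_rows)
```

Here the combinations (1, 1), (1, 2) and (2, 2) first appear in rows 1, 3 and 4, so those are the rows kept.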
