Select First Row of Each Group in DataFrame in R

In this article, we will discuss how to select the first row of each group in a DataFrame using the R programming language.

The duplicated() function determines which elements of a vector, or rows of a dataframe, are duplicates of elements that appear earlier. It returns a logical vector in which TRUE marks a duplicate, so the first occurrence of each value is always FALSE.

Syntax:

duplicated(data_frame$col_name)
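To make the returned logical vector concrete, here is a minimal sketch on a plain character vector. The names groups, is_dup and first_seen are hypothetical and used only for illustration.

```r
# hypothetical vector of group labels, for illustration only
groups <- c("Grp1", "Grp1", "Grp2", "Grp2", "Grp3", "Grp3")

# TRUE marks an element that has already appeared earlier in the vector
is_dup <- duplicated(groups)
print(is_dup)
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# negating the vector marks the first occurrence of each value
first_seen <- !is_dup
print(which(first_seen))
# [1] 1 3 5
```

The positions reported by which() are the rows that would be kept if the same vector were a dataframe column.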

The first row of each group is obtained by negating the result of duplicated() with the ! operator and using it as the row index of the dataframe, leaving the column index empty so that all columns are kept. The row numbers of the original dataframe are retained in the final output.

Example: Selecting first row from each group

# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(c(1:3),2),
                        col3=rep(1:2,3) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
print("Modified DataFrame")
  
# keeping the first row of each group in col1
data_frame1[!duplicated(data_frame1$col1), ]


Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    2    2 
3 Grp2    3    1 
4 Grp2    1    2 
5 Grp3    2    1 
6 Grp3    3    2 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
3 Grp2    3    1 
5 Grp3    2    1
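The result above keeps the original row numbers 1, 3 and 5. If consecutive numbering is preferred, one option is to clear the row names, as in this minimal sketch. The name first_rows is introduced here only for illustration.

```r
# same data as in the example above
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

# first row of each group in col1
first_rows <- data_frame1[!duplicated(data_frame1$col1), ]

# discard the original row names so the rows are numbered 1, 2, 3
rownames(first_rows) <- NULL
print(first_rows)
#   col1 col2 col3
# 1 Grp1    1    1
# 2 Grp2    3    1
# 3 Grp3    2    1
```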

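The following code snippet illustrates the usage of the duplicated() function applied over multiple columns. The column names are combined with c() to select the grouping columns, and duplicated() then flags a row only when its combination of values has already appeared in an earlier row.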

Example: Selecting first row of each group 

# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1,6),
                        col3=rep(1,6) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
print("Modified DataFrame")
  
# keeping the first row of each (col3, col2) combination;
# the columns are selected as a sub-dataframe so that values are compared row by row
data_frame1[!duplicated(data_frame1[, c("col3", "col2")]), ]


Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    1    1 
3 Grp2    1    1 
4 Grp2    1    1 
5 Grp3    1    1 
6 Grp3    1    1 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1
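Because every row in the data above has the same values in col2 and col3, only one row survives. The following minimal sketch uses hypothetical data (the dataframe df and its values are assumptions made for illustration) in which the combinations differ, and passes a sub-dataframe of the grouping columns to duplicated().

```r
# hypothetical data in which the (col1, col2) combinations differ
df <- data.frame(col1 = c("Grp1", "Grp1", "Grp1", "Grp2", "Grp2", "Grp2"),
                 col2 = c(1, 1, 2, 1, 2, 2),
                 col3 = c(10, 20, 30, 40, 50, 60))

# duplicated() on a dataframe compares whole rows of the selected columns
first_by_pair <- df[!duplicated(df[, c("col1", "col2")]), ]
print(first_by_pair)
#   col1 col2 col3
# 1 Grp1    1   10
# 3 Grp1    2   30
# 4 Grp2    1   40
# 5 Grp2    2   50
```

Rows 2 and 6 are dropped because their (col1, col2) pairs already appear in rows 1 and 5.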


Last Updated : 23 Sep, 2021