
Select First Row of Each Group in DataFrame in R

Last Updated: 23 Sep, 2021

In this article, we will discuss how to select the first row of each group in a dataframe using the R programming language.

The duplicated() function determines which elements of a vector, or which rows of a dataframe, are duplicates of elements that appear earlier. It returns a logical vector in which FALSE marks the first occurrence of a value and TRUE marks each later repeat.

Syntax:

duplicated(data_frame$col_name)
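
As a minimal sketch of what duplicated() returns, consider a small character vector (the name `grp` is made up for illustration). The first occurrence of each value is marked FALSE and every later repeat is marked TRUE:

```r
# Hypothetical vector used only for illustration
grp <- c("a", "a", "b", "c", "b")

# TRUE marks a value that has already appeared earlier in the vector
dup <- duplicated(grp)
print(dup)
# [1] FALSE  TRUE FALSE FALSE  TRUE

# Negating the result flags the first occurrence of each value
print(grp[!dup])
# [1] "a" "b" "c"
```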

The rows to keep are selected by negating the result of duplicated() with the ! operator and using it as the row index of the dataframe, with the column index left empty so that all columns are returned. Only the first row of each group is flagged as non-duplicated, so only that row is kept. The row numbers of the original dataframe are retained in the final output.
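
To illustrate the point about row numbers, the sketch below uses a small made-up dataframe (`df` and its columns are assumptions for this example). The filtered result keeps the original row names, and assigning NULL to rownames() renumbers them from 1 if that is preferred:

```r
# Hypothetical dataframe used only for illustration
df <- data.frame(grp = c("x", "x", "y", "y"),
                 val = c(10, 20, 30, 40))

# Keep the first row of each group in grp
first_rows <- df[!duplicated(df$grp), ]
print(rownames(first_rows))
# [1] "1" "3"

# Optional: reset the row names to 1, 2, ...
rownames(first_rows) <- NULL
print(rownames(first_rows))
# [1] "1" "2"
```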

Example: Selecting first row from each group

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(c(1:3),2),
                        col3=rep(1:2,3) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
print("Modified DataFrame")
  
# keep only the first row of each group in col1
data_frame1[!duplicated(data_frame1$col1), ]

Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    2    2 
3 Grp2    3    1 
4 Grp2    1    2 
5 Grp3    2    1 
6 Grp3    3    2 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1
3 Grp2    3    1
5 Grp3    2    1
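
Which row counts as "first" depends on the current row order of the dataframe. As a sketch, assuming the row with the largest col2 value is wanted from each group, the dataframe can be sorted with order() before filtering (the names `sorted_df` and `top_rows` are made up for this example):

```r
# Same data as the example above, rebuilt so the sketch is self-contained
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

# Sort by group, then by col2 in decreasing order within each group
sorted_df <- data_frame1[order(data_frame1$col1, -data_frame1$col2), ]

# The first row of each group is now the one with the largest col2
top_rows <- sorted_df[!duplicated(sorted_df$col1), ]
print(top_rows)
#   col1 col2 col3
# 2 Grp1    2    2
# 3 Grp2    3    1
# 6 Grp3    3    2
```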

The following code snippet applies duplicated() to multiple columns. The column names are combined with c() to select those columns from the dataframe, and duplicated() then marks every row whose combination of values has already appeared.

Example: Selecting first row of each group defined by multiple columns

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1,6),
                        col3=rep(1,6) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
print("Modified DataFrame")
  
# keep the first row of each (col3, col2) combination;
# c() selects both columns so duplicated() compares them row by row
data_frame1[!duplicated(data_frame1[c("col3", "col2")]), ]

Output:

[1] "Original DataFrame" 
  col1 col2 col3
1 Grp1    1    1 
2 Grp1    1    1 
3 Grp2    1    1 
4 Grp2    1    1 
5 Grp3    1    1 
6 Grp3    1    1 
[1] "Modified DataFrame" 
  col1 col2 col3
1 Grp1    1    1
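
Because every row in the example above is identical in col2 and col3, only one row survives. As a sketch with more varied data (the dataframe `df2` and its values are made up for illustration), passing a two-column subset to duplicated() keeps the first row of each distinct (grp, sub) combination:

```r
# Hypothetical dataframe used only for illustration
df2 <- data.frame(grp = c("A", "A", "A", "B", "B"),
                  sub = c(1, 1, 2, 1, 1),
                  val = c(5, 6, 7, 8, 9))

# duplicated() on a dataframe compares whole rows of the selected columns
first_rows2 <- df2[!duplicated(df2[c("grp", "sub")]), ]
print(first_rows2)
#   grp sub val
# 1   A   1   5
# 3   A   2   7
# 4   B   1   8
```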
