Select First Row of Each Group in DataFrame in R

In this article, we will discuss how to select the first row of each group in a data frame using the R programming language.

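The duplicated() function determines which elements of a vector are duplicates of elements that appear earlier in it. It returns a logical vector of the same length: FALSE for the first occurrence of each value and TRUE for every later repeat. Applied to a column of a data frame, it therefore tells which rows repeat a group value that was already seen in an earlier row.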

Syntax:

duplicated(data_frame$col_name)
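As a quick illustration, here is a minimal sketch using a made-up vector named groups (not part of the original examples). duplicated() marks each value that repeats an earlier one, and negating it flags the first occurrence of each value instead:

```r
# Hypothetical grouping vector, in the same order as col1 in the examples
groups <- c("Grp1", "Grp1", "Grp2", "Grp2", "Grp3", "Grp3")

# TRUE wherever a value has already appeared earlier in the vector
dup <- duplicated(groups)
print(dup)
# [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE

# Negating the result flags the first occurrence of each group instead
print(!dup)
```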

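To keep only the first row of each group, negate the result with the ! operator, so that the first occurrence of each group value becomes TRUE and every repeat becomes FALSE. This logical vector is then used as the row index of the data frame, with the column index left empty so that all columns are kept: data_frame[!duplicated(data_frame$col_name), ]. The row names of the original data frame are retained in the output. Note that "first" means first in the data frame's current row order; the rows of a group do not need to be adjacent.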
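A minimal sketch with made-up data (the names df, grp and val are illustrative only) shows that the rows of a group do not have to be adjacent and that the original row names survive the subsetting:

```r
# Made-up data where the rows of each group are interleaved
df <- data.frame(grp = c("b", "a", "b", "a", "c"),
                 val = 1:5)

# Keep the first row of each grp value, in the current row order
first_rows <- df[!duplicated(df$grp), ]
print(first_rows)
```

Here rows 1, 2 and 5 are kept, and they keep their original row names "1", "2" and "5".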

Example: Selecting first row from each group

R




# create the data frame: col1 holds the group labels
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

print("Original DataFrame")
print(data_frame1)

print("Modified DataFrame")

# keep the first row of each col1 group
data_frame1[!duplicated(data_frame1$col1), ]


Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1 Grp1    1    1 
2 Grp1    2    2 
3 Grp2    3    1 
4 Grp2    1    2 
5 Grp3    2    1 
6 Grp3    3    2 
[1] "Modified DataFrame" 
  col1 col2 col3 
1 Grp1    1    1
3 Grp2    3    1
5 Grp3    2    1
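Because duplicated() keeps whichever row of a group comes first in the current row order, sorting the data frame beforehand controls which row is selected. The following is a minimal sketch, assuming the goal is the row with the smallest col2 in each col1 group; it rebuilds the same data as the example above and sorts it with order() before removing duplicates:

```r
# Same values as data_frame1 in the example above
data_frame1 <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                          col2 = rep(1:3, 2),
                          col3 = rep(1:2, 3))

# Sort by group, then by col2 ascending, so the smallest col2 comes first
sorted <- data_frame1[order(data_frame1$col1, data_frame1$col2), ]

# The first row of each group is now the one with the smallest col2
min_col2 <- sorted[!duplicated(sorted$col1), ]
print(min_col2)
```

With this ordering, Grp2 yields row 4 (col2 = 1) instead of row 3, while Grp1 and Grp3 still yield rows 1 and 5.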

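The following example groups rows by a combination of columns. When duplicated() is applied to a data frame rather than to a single vector, it compares whole rows, so a row is flagged as a duplicate only if the same combination of values has appeared in an earlier row. The grouping columns are therefore selected as a data frame, for example data_frame1[, c("col3", "col2")]. Concatenating the columns with c() does not work for this purpose, because it joins them end to end into a single vector twice as long as the number of rows.

Example: Selecting the first row of each group defined by multiple columns

R


# create the data frame (col2 and col3 are recycled, so every value is 1)
data_frame1 <- data.frame(col1 = c(rep('Grp1', 2),
                                   rep('Grp2', 2),
                                   rep('Grp3', 2)),
                          col2 = rep(1, 2),
                          col3 = rep(1, 3))

print("Original DataFrame")
print(data_frame1)

print("Modified DataFrame")

# group by the combination of col3 and col2
data_frame1[!duplicated(data_frame1[, c("col3", "col2")]), ]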


Output:

[1] "Original DataFrame"
  col1 col2 col3
1 Grp1    1    1
2 Grp1    1    1
3 Grp2    1    1
4 Grp2    1    1
5 Grp3    1    1
6 Grp3    1    1
[1] "Modified DataFrame"
  col1 col2 col3
1 Grp1    1    1
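With the all-ones data above, every row shares the same col2/col3 combination, so only row 1 is kept. The sketch below uses made-up values and a hypothetical helper, first_row_per_group() (not a base R function), to show that passing a data frame of the grouping columns to duplicated() treats each distinct combination as its own group:

```r
# Hypothetical helper: first row of each group defined by one or more columns
first_row_per_group <- function(df, cols) {
  # duplicated() on a data frame compares whole rows of the selected columns;
  # drop = FALSE keeps a one-column selection as a data frame
  df[!duplicated(df[, cols, drop = FALSE]), , drop = FALSE]
}

# Made-up data with several distinct (col2, col3) combinations
df <- data.frame(col1 = c("Grp1", "Grp1", "Grp2", "Grp2"),
                 col2 = c(1, 1, 2, 1),
                 col3 = c(1, 2, 1, 1))

# Row 4 repeats the (1, 1) pair of row 1, so rows 1, 2 and 3 are kept
by_pair <- first_row_per_group(df, c("col2", "col3"))
print(by_pair)

# Grouping by col1 alone keeps rows 1 and 3
by_col1 <- first_row_per_group(df, "col1")
print(by_col1)
```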

Last Updated : 23 Sep, 2021