dplyr Package in R Programming

The dplyr package in R is a structure of data manipulation that provides a uniform set of verbs, helping to resolve the most frequent data manipulation hurdles.

The dplyr package performs the steps given below quicker and in an easier fashion:

  • By limiting the choices the focus can now be more on data manipulation difficulties.
  • There are uncomplicated “verbs”, functions present for tackling every common data manipulation and the thoughts can be translated into code faster.
  • There are valuable backends and hence waiting time for the computer reduces.

Important Verb Functions

dplyr package provides various important function that can be used for Data Manipulation. These are:

  • filter() Function: For choosing cases and using their values as a base for doing so.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
    d
      
    # Finding rows with NA value
    d %>% filter(is.na(ht))
      
    # Finding rows with no NA value
    d %>% filter(! is.na(ht))

    chevron_right

    
    

    Output:

    # A tibble: 4 x 4
      name      age    ht school
            
    1 Abhi        7    46 yes   
    2 Bhavesh     5    NA yes   
    3 Chaman      9    NA no    
    4 Dimri      16    69 no
    
    # A tibble: 2 x 4
      name      age    ht school
            
    1 Bhavesh     5    NA yes   
    2 Chaman      9    NA no
    
    # A tibble: 2 x 4
      name    age    ht school
          
    1 Abhi      7    46 yes   
    2 Dimri    16    69 no
    
    
  • arrange(): For reordering of the cases.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
      
    # Arranging name according to the age
    d.name<- arrange(d, age)
    print(d.name)

    chevron_right

    
    

    Output:



    
    # A tibble: 4 x 4
      name      age    ht school
            
    1 Bhavesh     5    NA yes   
    2 Abhi        7    46 yes   
    3 Chaman      9    NA no    
    4 Dimri      16    69 no   
    
  • select() and rename(): For choosing variables and using their names as a base for doing so.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
      
    # startswith() function to print only ht data
    select(d, starts_with("ht"))
      
    # -startswith() function to print everything except ht data
    select(d, -starts_with("ht"))
      
    # Printing column 1 to 2
    select(d, 1:2)
      
    # Printing data of column heading containing 'a'
    select(d, contains("a"))
      
    # Printing data of column heading which matches 'na'
    select(d, matches("na"))

    chevron_right

    
    

    Output:

    
    # A tibble: 4 x 1
         ht
      
    1    46
    2    NA
    3    NA
    4    69
    
    # A tibble: 4 x 3
      name      age school
           
    1 Abhi        7 yes   
    2 Bhavesh     5 yes   
    3 Chaman      9 no    
    4 Dimri      16 no
    
    # A tibble: 4 x 2
      name      age
         
    1 Abhi        7
    2 Bhavesh     5
    3 Chaman      9
    4 Dimri      16
    
    # A tibble: 4 x 2
      name      age
         
    1 Abhi        7
    2 Bhavesh     5
    3 Chaman      9
    4 Dimri      16
    
    # A tibble: 4 x 1
      name   
        
    1 Abhi   
    2 Bhavesh
    3 Chaman 
    4 Dimri
    
  • mutate() and transmute(): Addition of new variables which are the functions of prevailing variables.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
      
    # Calculating a variable x3 which is sum of height
    # and age printing with ht and age
    mutate(d, x3 = ht + age) 
      
    # Calculating a variable x3 which is sum of height 
    # and age printing without ht and age
    transmute(d, x3 = ht + age) 

    chevron_right

    
    

    Output:

     
    # A tibble: 4 x 5
      name      age    ht school    x3
             
    1 Abhi        7    46 yes       53
    2 Bhavesh     5    NA yes       NA
    3 Chaman      9    NA no        NA
    4 Dimri      16    69 no        85
    
    # A tibble: 4 x 1
         x3
      
    1    53
    2    NA
    3    NA
    4    85
    > 
    
  • summarise(): Condensing various values to one value.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
      
    # Calculating mean of age
    summarise(d, mean = mean(age))
      
    # Calculating min of age
    summarise(d, med = min(age))
      
    # Calculating max of age
    summarise(d, med = max(age))
      
    # Calculating median of age
    summarise(d, med = median(age))

    chevron_right

    
    

    Output:

    
    # A tibble: 1 x 1
          mean
         
    1     9.25
    
    # A tibble: 1 x 1
        med
      
    1     5
    
    # A tibble: 1 x 1
        med
      
    1    16
    
    # A tibble: 1 x 1
        med
      
    1     8
    
  • sample_n() and sample_frac(): For taking random specimens.
    filter_none

    edit
    close

    play_arrow

    link
    brightness_4
    code

    # Create a data frame with missing data 
    d <- data.frame( name = c("Abhi", "Bhavesh", "Chaman", "Dimri"), 
                     age = c(7, 5, 9, 16), 
                     ht = c(46, NA, NA, 69),
                     school = c("yes", "yes", "no", "no") )
      
    # Printing three rows
    sample_n(d, 3)
      
    # Printing 50 % of the rows
    sample_frac(d, 0.50)

    chevron_right

    
    

    Output:

    
    # A tibble: 3 x 4
      name      age    ht school
            
    1 Abhi        7    46 yes   
    2 Bhavesh     5    NA yes   
    3 Chaman      9    NA no 
    
    # A tibble: 2 x 4
      name      age    ht school
            
    1 Dimri      16    69 no    
    2 Bhavesh     5    NA yes  
    



My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

2


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.