dplyr Package in R Programming
In this article, we will discuss aggregating and analyzing data with the dplyr package in the R programming language.
dplyr Package in R
The dplyr package in the R Programming Language provides a grammar of data manipulation: a consistent set of verbs that help solve the most common data manipulation challenges.
- By constraining your options, it lets you focus on the data manipulation problem itself.
- It offers simple “verbs”, functions that correspond to the most common data manipulation tasks, so you can translate your thoughts into code faster.
- It uses efficient backends, so you spend less time waiting for the computer.
Here are some key functions and concepts within the dplyr package in R.
Data Frame and Tibble
A data frame is an organized table in which each column stores one type of information, such as names, ages, or scores. Data frames come from base R, and every dplyr verb accepts one as input. To create a data frame, specify the column names and their values with data.frame().
R
# Build a data frame column by column
df <- data.frame(
  Name = c("vipul", "jayesh", "anurag"),
  Age = c(25, 23, 22),
  Score = c(95, 89, 78)
)
df
Output:
Name Age Score
1 vipul 25 95
2 jayesh 23 89
3 anurag 22 78
Tibbles, provided by the tibble package, work much like data frames but add more user-friendly features, such as compact printing that shows the type of each column. The syntax for creating a tibble is the same as for a data frame.
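A minimal sketch of the same table built as a tibble, assuming the dplyr package is installed (dplyr re-exports tibble() and as_tibble() from the tibble package). The variable names tb and tb2 are illustrative only. When printed, a tibble reports its dimensions and the type of each column above the rows.

```r
library(dplyr)

# The same three columns as the data frame above, built with tibble()
tb <- tibble(
  Name  = c("vipul", "jayesh", "anurag"),
  Age   = c(25, 23, 22),
  Score = c(95, 89, 78)
)
print(tb)  # prints the dimensions and each column's type above the rows

# An existing data frame can be converted with as_tibble()
df <- data.frame(
  Name  = c("vipul", "jayesh", "anurag"),
  Age   = c(25, 23, 22),
  Score = c(95, 89, 78)
)
tb2 <- as_tibble(df)
```

Both routes give the same columns; as_tibble() is handy when data already arrives as a data frame, for example from read.csv().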
Pipes (%>%)
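The pipe operator (%>%), which dplyr re-exports from the magrittr package, passes the result of one step as the first argument of the next. This lets us chain multiple operations together in top-to-bottom order, improving code readability.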
R
library(dplyr)

# Keep cars with mpg above 20, then average horsepower per cylinder count
result <- mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, cyl, hp) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))
print(result)
Output:
cyl mean_hp
<dbl> <dbl>
1 4 82.6
2 6 110
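The pipe is syntactic sugar: x %>% f(y) is evaluated as f(x, y). The sketch below, assuming dplyr is installed and R is version 4.1.0 or later, writes a shortened form of the pipeline above three ways: as nested calls, with the magrittr pipe, and with the native pipe |> that base R provides from 4.1.0 onward. The select() step is dropped because summarise() only needs hp.

```r
library(dplyr)

# 1. Nested calls: read from the inside out
nested <- summarise(group_by(filter(mtcars, mpg > 20), cyl),
                    mean_hp = mean(hp))

# 2. magrittr pipe (re-exported by dplyr): read top to bottom
piped <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))

# 3. Native pipe, available in base R 4.1.0 and later
native <- mtcars |>
  filter(mpg > 20) |>
  group_by(cyl) |>
  summarise(mean_hp = mean(hp))
```

All three produce the same two-row result; the piped forms simply avoid reading the steps inside out.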
Verb Functions
dplyr provides several core functions, or verbs, for data manipulation. These are:
filter() Function
Picks rows (cases) based on their values.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
print(d)

# Rows where height is missing
rows_with_na <- d %>% filter(is.na(ht))
print(rows_with_na)

# Rows where height is present
rows_without_na <- d %>% filter(!is.na(ht))
print(rows_without_na)
Output:
name age ht school
1 Abhi 7 46 yes
2 Bhavesh 5 NA yes
3 Chaman 9 NA no
4 Dimri 16 69 no
Finding rows with NA value
name age ht school
1 Bhavesh 5 NA yes
2 Chaman 9 NA no
Finding rows with no NA value
name age ht school
1 Abhi 7 46 yes
2 Dimri 16 69 no
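filter() also accepts several conditions at once. A sketch reusing the d data frame from above, with illustrative result names: conditions separated by commas must all hold (logical AND), while | keeps rows that satisfy either side. A row whose condition evaluates to NA is dropped, which is why the is.na() checks above are needed to find or keep missing heights.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Comma-separated conditions are combined with AND
young_in_school <- d %>% filter(age < 9, school == "yes")

# `|` keeps rows matching either condition
young_or_tall <- d %>% filter(age < 6 | ht > 60)

# A comparison with NA yields NA, and filter() drops those rows
tall <- d %>% filter(ht > 50)
```

Here young_in_school holds Abhi and Bhavesh, young_or_tall holds Bhavesh and Dimri (Chaman's condition is NA and is dropped), and tall holds only Dimri.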
arrange():
Reorders rows (cases) based on the values of one or more columns.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
d

# Sort rows by age, smallest first
d.name <- arrange(d, age)
print(d.name)
Output:
name age ht school
1 Abhi 7 46 yes
2 Bhavesh 5 NA yes
3 Chaman 9 NA no
4 Dimri 16 69 no
Rows sorted by age in ascending order
name age ht school
1 Bhavesh 5 NA yes
2 Abhi 7 46 yes
3 Chaman 9 NA no
4 Dimri 16 69 no
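arrange() sorts in ascending order by default. A sketch, assuming the same d data frame and illustrative result names: wrapping a column in desc() reverses the order, listing several columns breaks ties left to right, and missing values are placed at the end.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# desc() sorts in decreasing order: oldest first
by_age_desc <- arrange(d, desc(age))

# Several keys: sort by school first, then by age within each school
by_school_age <- arrange(d, school, age)

# Rows with a missing height are placed after all known heights
by_ht <- arrange(d, ht)
```

by_age_desc lists Dimri, Chaman, Abhi, Bhavesh; by_school_age lists the "no" group (Chaman, Dimri) before the "yes" group (Bhavesh, Abhi).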
select() and rename():
select() picks columns (variables) based on their names or positions; rename() changes column names while keeping every column.
R
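library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
select(d, starts_with("ht"))   # columns whose name starts with "ht"
select(d, -starts_with("ht"))  # every column except those
select(d, 1:2)                 # columns 1 to 2 by position
select(d, contains("a"))       # names containing the letter "a"
select(d, matches("na"))       # names matching the regular expression "na"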
Output:
ht
1 46
2 NA
3 NA
4 69
Every column except ht
name age school
1 Abhi 7 yes
2 Bhavesh 5 yes
3 Chaman 9 no
4 Dimri 16 no
Columns 1 to 2
name age
1 Abhi 7
2 Bhavesh 5
3 Chaman 9
4 Dimri 16
Columns whose name contains 'a'
name age
1 Abhi 7
2 Bhavesh 5
3 Chaman 9
4 Dimri 16
Columns whose name matches 'na'
name
1 Abhi
2 Bhavesh
3 Chaman
4 Dimri
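The examples above exercise only select(). A minimal sketch of rename(), assuming the same d data frame; the new names height, in_school and student are illustrative. rename() takes new_name = old_name pairs and keeps every column, while select() can rename columns as it picks them but drops the rest.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# rename(): new_name = old_name, all columns kept
d_renamed <- rename(d, height = ht, in_school = school)
names(d_renamed)

# select() can rename too, but keeps only the listed columns
d_small <- select(d, student = name, age)
names(d_small)
```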
mutate() and transmute():
mutate() adds new variables that are functions of existing variables; transmute() does the same but keeps only the new variables.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
d

# mutate() adds x3 and keeps the existing columns
mutate(d, x3 = ht + age)

# transmute() keeps only the new column
transmute(d, x3 = ht + age)
Output:
name age ht school
1 Abhi 7 46 yes
2 Bhavesh 5 NA yes
3 Chaman 9 NA no
4 Dimri 16 69 no
Adding x3, the sum of height and age, alongside the existing columns
name age ht school x3
1 Abhi 7 46 yes 53
2 Bhavesh 5 NA yes NA
3 Chaman 9 NA no NA
4 Dimri 16 69 no 85
Keeping only x3, the sum of height and age
x3
1 53
2 NA
3 NA
4 85
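In dplyr 1.1.0 and later, transmute() is marked as superseded in favour of mutate(.keep = "none"). A sketch of that form, assuming a dplyr version that supports the .keep argument (1.0.0 or later) and the same d data frame. It also shows that a later expression in one mutate() call can use a column created earlier in the same call; x3_double is an illustrative name.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Equivalent to transmute(d, x3 = ht + age)
only_x3 <- mutate(d, x3 = ht + age, .keep = "none")

# Later expressions can use columns created earlier in the same call
d2 <- mutate(d,
             x3 = ht + age,
             x3_double = x3 * 2)
```

As in the output above, any row with a missing height gets NA in x3, and that NA carries through to x3_double.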
summarise():
Reduces multiple values to a single summary value.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
summarise(d, mean_age = mean(age))
summarise(d, min_age = min(age))
summarise(d, max_age = max(age))
summarise(d, median_age = median(age))
Output:
Calculating mean of age
mean_age
1 9.25
Calculating minimum age
min_age
1 5
Calculating max of age
max_age
1 16
Calculating median of age
median_age
1 8
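summarise() becomes an aggregation tool when combined with group_by(): it then returns one row per group instead of one row overall. A sketch, assuming the same d data frame and illustrative column names. n() counts the rows in each group, and na.rm = TRUE makes mean() skip the missing heights; without it, both group means of ht would be NA.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# One summary row per value of school
by_school <- d %>%
  group_by(school) %>%
  summarise(
    n_students = n(),                    # rows in each group
    mean_age   = mean(age),
    mean_ht    = mean(ht, na.rm = TRUE)  # ignore missing heights
  )
print(by_school)
```

The result has two rows, "no" (mean age 12.5, mean height 69) and "yes" (mean age 6, mean height 46).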
sample_n() and sample_frac():
Draws random samples of rows.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
sample_n(d, 3)        # 3 random rows
sample_frac(d, 0.50)  # a random 50% of the rows
Output (the rows are drawn at random, so they differ from run to run):
name age ht school
1 Chaman 9 NA no
2 Dimri 16 69 no
3 Abhi 7 46 yes
A random 50% of the rows
name age ht school
1 Abhi 7 46 yes
2 Dimri 16 69 no
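Since dplyr 1.0.0, sample_n() and sample_frac() are superseded by slice_sample(), which takes either n or prop. A sketch, assuming a recent dplyr and the same d data frame; the seed value 42 and the result names are arbitrary. set.seed() makes the draw repeatable between runs, but the exact rows chosen depend on R's random number generator, so no specific output is claimed here.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

set.seed(42)  # fix the random number generator so reruns pick the same rows

three_rows <- slice_sample(d, n = 3)       # like sample_n(d, 3)
half_rows  <- slice_sample(d, prop = 0.5)  # like sample_frac(d, 0.5)

# Sampling with replacement can draw the same row more than once
boot <- slice_sample(d, n = 4, replace = TRUE)
```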
Last Updated: 20 Dec, 2023