The dplyr package in the R programming language is a grammar of data manipulation that provides a consistent set of verbs for solving the most common data manipulation problems.
The dplyr package makes data manipulation quicker and easier in three ways:
- By constraining your options, it lets you focus on the data manipulation problem itself.
- It provides simple "verbs", functions that each handle one common data manipulation task, so that thoughts can be translated into code faster.
- It uses efficient backends, so less time is spent waiting for the computer.
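As a rough illustration of how a thought maps onto verbs, the sketch below chains three of them with the pipe operator `%>%`. The data frame `kids` and the result name `older` are hypothetical and used only for this example.

```r
library(dplyr)

# Hypothetical data, used only for illustration
kids <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                   age = c(7, 5, 9, 16))

# Thought: "keep children older than 6, oldest first, and show only their names"
older <- kids %>%
  filter(age > 6) %>%       # keep rows where age exceeds 6
  arrange(desc(age)) %>%    # sort by age, largest first
  select(name)              # keep only the name column
older
```

Each step reads as one clause of the sentence in the comment, which is the sense in which the verbs shorten the path from idea to code.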
Important Verb Functions
The dplyr package provides several important functions for data manipulation. These are:
- filter(): Selects cases (rows) based on their values.
R
library(dplyr)

# Create a data frame in which some heights are missing
d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))
d

# Rows where ht is missing
d %>% filter(is.na(ht))

# Rows where ht is present
d %>% filter(!is.na(ht))
Output:
# A tibble: 4 x 4
name age ht school
1 Abhi 7 46 yes
2 Bhavesh 5 NA yes
3 Chaman 9 NA no
4 Dimri 16 69 no
# A tibble: 2 x 4
name age ht school
1 Bhavesh 5 NA yes
2 Chaman 9 NA no
# A tibble: 2 x 4
name age ht school
1 Abhi 7 46 yes
2 Dimri 16 69 no
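filter() also accepts several conditions at once. The following is a minimal sketch on the same data frame `d`; the result names `young_in_school` and `either` are chosen here for illustration only.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Conditions separated by commas are combined with AND
young_in_school <- filter(d, age < 10, school == "yes")

# Use | to combine conditions with OR
either <- filter(d, age > 10 | is.na(ht))

young_in_school
either
```

Rows for which a condition evaluates to NA are dropped, which is why the examples above test for missing values explicitly with is.na().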
- arrange(): Reorders the cases (rows).
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Sort the rows by age in ascending order
d.name <- arrange(d, age)
print(d.name)
Output:
# A tibble: 4 x 4
name age ht school
1 Bhavesh 5 NA yes
2 Abhi 7 46 yes
3 Chaman 9 NA no
4 Dimri 16 69 no
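Sorting in descending order, or by more than one column, follows the same pattern. This is a small sketch using the same data frame `d`; the result names are illustrative.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Wrap a column in desc() to sort from largest to smallest
oldest_first <- arrange(d, desc(age))

# Several columns: sort by school first, then by age within each school value
by_school_age <- arrange(d, school, age)

# Missing values are placed at the end of the sort
by_ht <- arrange(d, ht)

oldest_first
by_school_age
by_ht
```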
- select() and rename(): select() chooses variables (columns) based on their names; rename() changes the names of variables.
R
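library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Columns whose names start with "ht"
select(d, starts_with("ht"))
# All columns except those whose names start with "ht"
select(d, -starts_with("ht"))
# Columns 1 and 2, chosen by position
select(d, 1:2)
# Columns whose names contain "a"
select(d, contains("a"))
# Columns whose names match the regular expression "na"
select(d, matches("na"))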
Output:
# A tibble: 4 x 1
ht
1 46
2 NA
3 NA
4 69
# A tibble: 4 x 3
name age school
1 Abhi 7 yes
2 Bhavesh 5 yes
3 Chaman 9 no
4 Dimri 16 no
# A tibble: 4 x 2
name age
1 Abhi 7
2 Bhavesh 5
3 Chaman 9
4 Dimri 16
# A tibble: 4 x 2
name age
1 Abhi 7
2 Bhavesh 5
3 Chaman 9
4 Dimri 16
# A tibble: 4 x 1
name
1 Abhi
2 Bhavesh
3 Chaman
4 Dimri
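The examples above use only select(). A minimal sketch of rename() on the same data frame `d` might look as follows; the new column names `height` and `child` are made up for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# rename() keeps every column and changes names: new_name = old_name
renamed <- rename(d, height = ht)

# select() can also rename, but keeps only the columns that are listed
picked <- select(d, child = name, age)

names(renamed)
names(picked)
```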
- mutate() and transmute(): Add new variables that are functions of existing variables. mutate() keeps the existing variables, while transmute() keeps only the new ones.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Add x3 and keep all existing columns
mutate(d, x3 = ht + age)
# Compute x3 and keep only that column
transmute(d, x3 = ht + age)
Output:
# A tibble: 4 x 5
name age ht school x3
1 Abhi 7 46 yes 53
2 Bhavesh 5 NA yes NA
3 Chaman 9 NA no NA
4 Dimri 16 69 no 85
# A tibble: 4 x 1
x3
1 53
2 NA
3 NA
4 85
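Within a single mutate() call, a later expression can refer to a column created earlier in the same call, and missing inputs carry through to the result as NA. The sketch below assumes, purely for illustration, that `ht` is measured in inches; the column names `ht_cm` and `tall` and the 150 cm threshold are hypothetical.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Assumption for this sketch: ht is in inches
d2 <- mutate(d,
             ht_cm = ht * 2.54,    # convert to centimetres
             tall = ht_cm > 150)   # uses ht_cm, created on the line above
d2
```

Rows with a missing `ht` get NA in both new columns, which matches the NA values of `x3` in the output above.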
- summarise(): Condenses multiple values into a single value.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Each call reduces the age column to one value;
# the column name in the result matches the statistic computed
summarise(d, mean = mean(age))
summarise(d, min = min(age))
summarise(d, max = max(age))
summarise(d, med = median(age))
Output:
# A tibble: 1 x 1
mean
1 9.25
# A tibble: 1 x 1
min
1 5
# A tibble: 1 x 1
max
1 16
# A tibble: 1 x 1
med
1 8
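Several summaries can be computed in one summarise() call, and columns with missing values, such as `ht` here, need `na.rm = TRUE` to give a non-missing result. This is a minimal sketch on the same data frame `d`; the output column names are chosen for illustration.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

stats <- summarise(d,
                   count = n(),                       # number of rows
                   mean_age = mean(age),
                   mean_ht = mean(ht, na.rm = TRUE),  # ignore missing heights
                   missing_ht = sum(is.na(ht)))       # how many heights are missing
stats
```

Without `na.rm = TRUE`, mean(ht) would return NA because two of the four heights are missing.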
- sample_n() and sample_frac(): Take random samples of rows, either a fixed number of rows or a fraction of them.
R
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Draw 3 rows at random
sample_n(d, 3)
# Draw a random 50% of the rows
sample_frac(d, 0.50)
Output (rows are drawn at random, so results vary between runs):
# A tibble: 3 x 4
name age ht school
1 Abhi 7 46 yes
2 Bhavesh 5 NA yes
3 Chaman 9 NA no
# A tibble: 2 x 4
name age ht school
1 Dimri 16 69 no
2 Bhavesh 5 NA yes
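Because the rows are drawn at random, a seed is needed to reproduce a particular sample. The sketch below fixes one with set.seed() (the value 42 is arbitrary) and also shows slice_sample(), which dplyr 1.0.0 and later provide as the replacement for the superseded sample_n() and sample_frac(). The result names are illustrative.

```r
library(dplyr)

d <- data.frame(name = c("Abhi", "Bhavesh", "Chaman", "Dimri"),
                age = c(7, 5, 9, 16),
                ht = c(46, NA, NA, 69),
                school = c("yes", "yes", "no", "no"))

# Fix the random seed so the same rows are drawn on every run
set.seed(42)
first_draw <- sample_n(d, 3)

set.seed(42)
second_draw <- sample_n(d, 3)   # same seed, same rows as first_draw

# slice_sample() covers both cases: n for a row count, prop for a fraction
by_count <- slice_sample(d, n = 3)
by_fraction <- slice_sample(d, prop = 0.5)

first_draw
by_fraction
```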