What Are the Tidyverse Packages in R Language?

When dealing with Data Science in R, the Tidyverse packages are your best friends! These Tidyverse packages were specially designed for Data Science with a common design philosophy. They include all the packages required in the data science workflow, ranging from data exploration to data visualization. For example, readr is for data importing, tibble and tidyr help in tidying the data, dplyr and stringr contribute to data transformation and ggplot2 is vital for data visualization.


This article covers eight core Tidyverse packages, namely ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats (tidyverse 2.0.0 and later also load lubridate as a ninth core package). All of these packages are installed at once with the install.packages("tidyverse") command and loaded together with library(tidyverse). In addition to these packages, Tidyverse also has some specialized packages that are not loaded automatically but need their own library() call. These include DBI for relational databases, httr for web APIs, rvest for web scraping, etc. Now, let's see the core Tidyverse packages and learn more about them!
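The difference between installing and loading can be sketched as follows. This is a minimal example that assumes the tidyverse is already installed, so the install call is left commented out; the variable name all_pkgs is only illustrative.

```r
# Install once per machine (commented out so the example does not hit the network).
# install.packages("tidyverse")

# Load the core packages at the start of each R session.
library(tidyverse)

# tidyverse_packages() lists every package in the collection, core and non-core.
all_pkgs <- tidyverse_packages()

# Non-core packages such as rvest still need their own call, for example:
# library(rvest)
```

Running install.packages() again is only needed to pick up a newer release, while library() has to be called in every new session.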

1. ggplot2

ggplot2 is an R data visualization library that is based on The Grammar of Graphics. ggplot2 can create data visualizations such as bar charts, pie charts, histograms, scatterplots, error bars, etc. using a high-level API. It also allows you to add different types of data visualization components or layers in a single visualization. Once ggplot2 has been told which variables to map to which aesthetics in the plot, it does the rest of the work so that the user can focus on interpreting the visualizations and spend less time creating them. The defaults can still be adjusted through scales, themes, and coordinate systems, although graphics that do not fit the layered grammar may take more effort. There are also a lot of resources in the RStudio community and on Stack Overflow which can provide help with ggplot2 when needed. If you want to install ggplot2, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install ggplot2 using install.packages("ggplot2"). You can also install the development version from GitHub using devtools::install_github("tidyverse/ggplot2").
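As a small sketch of mapping variables to aesthetics and stacking layers, the example below uses the mtcars dataset that ships with R. The choice of columns and labels is only an illustration.

```r
library(ggplot2)

# mtcars ships with R: wt is weight, mpg is fuel economy, cyl is cylinder count.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +                            # layer 1: one point per car
  geom_smooth(method = "lm", se = FALSE) +  # layer 2: a linear fit per colour group
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object draws the plot.
print(p)
```

The aesthetic mapping is declared once in ggplot() and is inherited by both layers, which is why the points and the fitted lines share the same colours.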

2. dplyr

dplyr is a very popular data manipulation library in R. It has five important functions, and each of them combines naturally with the group_by() function, which makes them operate group by group. These functions are the mutate() function which adds new variables that are functions of existing variables, the select() function that picks variables (columns) based on their names, the filter() function that picks cases (rows) based on their values, the summarise() function that reduces multiple values into a single summary, and the arrange() function that changes the ordering of the rows. If you want to install dplyr, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install dplyr using install.packages("dplyr"). You can also install the development version from GitHub using devtools::install_github("tidyverse/dplyr").
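The five verbs and group_by() can be chained in a single pipeline. The sketch below again uses the built-in mtcars data; the threshold of 100 horsepower and the name hp_per_wt are arbitrary choices for illustration.

```r
library(dplyr)

result <- mtcars %>%
  filter(hp > 100) %>%                           # keep rows with more than 100 horsepower
  select(mpg, cyl, hp, wt) %>%                   # keep four columns by name
  mutate(hp_per_wt = hp / wt) %>%                # add a column computed from existing ones
  group_by(cyl) %>%                              # later verbs now work per cylinder count
  summarise(mean_mpg = mean(mpg), n = n()) %>%   # one summary row per group
  arrange(desc(mean_mpg))                        # highest mean mpg first

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes this kind of chaining possible.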



3. tidyr

tidyr is a data cleaning library in R which helps to create tidy data. Tidy data means that every cell holds a single value, every column is a variable, and every row is an observation. Tidy data is a staple in the tidyverse, and it ensures that more time is spent on analysing the data and obtaining value from it rather than on continuously cleaning the data and modifying the tools to handle untidy data. The functions in tidyr broadly fall into five categories, namely Pivoting which changes the data between long and wide forms, Nesting which changes grouped data so that a group is a single row with a nested data frame, Splitting and combining character columns, Rectangling which converts nested lists into tidy tibbles, and converting implicit missing values into explicit ones. If you want to install tidyr, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install tidyr using install.packages("tidyr"). You can also install the development version from GitHub using devtools::install_github("tidyverse/tidyr").
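Pivoting is probably the easiest of these categories to show in a few lines. The sketch below builds a tiny made-up table (the countries and values are invented for illustration) and moves it between wide and long form.

```r
library(tidyr)

# Untidy input: one column per year, so the variable "year" is hidden in the header.
wide <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22),
  check.names = FALSE
)

# Pivot to long form: one row per country-year observation.
long <- pivot_longer(wide, cols = c(`2019`, `2020`),
                     names_to = "year", values_to = "value")

# Pivot back to wide form to confirm the round trip.
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

In the long form, year and value are ordinary columns, so they can be filtered, grouped, and plotted like any other variable.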

4. readr

readr is a library that provides a simple and speedy way to read rectangular data from file formats such as csv, tsv, and fwf. readr parses data at two levels: one function parses the file as a whole, and another handles each specific column. The column specification defines how the data in a column is converted from a character vector to the data type that suits it best. This is done automatically by readr in most cases. readr reads different kinds of file formats using different functions, namely read_csv() for comma-separated files, read_tsv() for tab-separated files, read_table() for whitespace-separated files, read_fwf() for fixed-width files, read_delim() for files with any other delimiter, and read_log() for web log files. If you want to install readr, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install readr using install.packages("readr"). You can also install the development version from GitHub using devtools::install_github("tidyverse/readr").
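The two ways of getting column types, guessed or stated explicitly, can be sketched as below. The file contents are made up, and the show_col_types argument assumes readr 2.0 or later.

```r
library(readr)

# Write a small CSV to a temporary file so the example is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,9.5", "2,bob,7"), path)

# Let readr guess the column types from the data.
guessed <- read_csv(path, show_col_types = FALSE)

# Or state the types explicitly with a column specification.
explicit <- read_csv(
  path,
  col_types = cols(
    id    = col_integer(),
    name  = col_character(),
    score = col_double()
  )
)
```

Guessing reads id as a double, while the explicit specification stores it as an integer. Stating the types is a common choice in scripts that must behave the same way on every input file.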

5. purrr

purrr is a complete set of tools for working with functions and vectors, and it is mainly used for functional programming in R. A good example of this is the family of map() functions, which replace many for loops that complicate and clutter the code with simpler code that is easy to read. In addition to that, all purrr functions are type-stable, which means they either return the advertised output type or, if that is not possible, they throw an error. If you want to install purrr, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install purrr using install.packages("purrr"). You can also install the development version from GitHub using devtools::install_github("tidyverse/purrr").
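A short sketch of map() and of type stability follows. The list values and the helper names are invented for illustration, and tryCatch() is used only to capture the error so the script keeps running.

```r
library(purrr)

values <- list(a = 1:3, b = 4:6, c = 7:9)

# map() always returns a list, one element per input.
as_list <- map(values, mean)

# map_dbl() always returns a double vector, or fails.
as_dbl <- map_dbl(values, mean)

# Type stability: asking for doubles from a function that returns text is an error.
failed <- tryCatch(
  map_dbl(values, function(x) "not a number"),
  error = function(e) "error"
)
```

Here as_dbl holds the three means 2, 5, and 8 under the names a, b, and c, and failed ends up as the string "error" because map_dbl() refuses to return anything other than doubles.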

6. tibble

A tibble is a modern form of the data.frame which keeps the useful parts of it and discards the parts that are not so important. Unlike data.frames, tibbles don't change variable names or types and don't do partial matching, and they bring problems to the forefront much sooner, such as when a variable does not exist. So code written with tibbles tends to be cleaner and more robust. Tibbles are also easier to use with larger datasets that contain more complex objects, in part because of an enhanced print() method. You can create new tibbles from column vectors using the tibble() function, and you can also create a tibble row-by-row using the tribble() function. If you want to install tibble, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install tibble using install.packages("tibble"). You can also install the development version from GitHub using devtools::install_github("tidyverse/tibble").
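The two constructors and the difference in partial matching can be sketched as follows. The column names x, y, and xyz are arbitrary.

```r
library(tibble)

# Column-by-column construction.
by_col <- tibble(x = 1:3, y = c("a", "b", "c"))

# Row-by-row construction; column names are written as formulas.
by_row <- tribble(
  ~x, ~y,
  1L, "a",
  2L, "b",
  3L, "c"
)

# A data.frame partially matches df$xy to the column xyz; a tibble does not.
df  <- data.frame(xyz = 1:3)
tbl <- tibble(xyz = 1:3)

from_df  <- df$xy                     # partial match: returns the xyz column
from_tbl <- suppressWarnings(tbl$xy)  # NULL; tibble warns about the unknown column
```

tribble() is mostly useful for small tables typed by hand, since each row of the code lines up with a row of the data.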

7. stringr

stringr is a library designed for working with strings, and it has many functions that make common data cleaning and data preparation tasks easy. stringr is built on top of stringi, which uses the International Components for Unicode (ICU) C library. So if there are any functions that you want to use but cannot find in stringr, then the best place to look for them is stringi. This also means that once you master stringr, stringi is not that difficult to use as both of these packages have similar conventions. All of the functions in stringr start with str_ and they take a string vector as their first argument. Some of these functions include str_detect(), str_extract(), str_match(), str_count(), str_replace(), str_subset(), etc. If you want to install stringr, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install stringr from CRAN using install.packages("stringr"). You can also install the development version from GitHub using devtools::install_github("tidyverse/stringr").
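A few of these functions are sketched below on a small made-up character vector. Each call takes the vector as its first argument and a regular expression as its second.

```r
library(stringr)

fruit_log <- c("apple 12", "banana 7", "cherry", "date 30")

has_number <- str_detect(fruit_log, "\\d+")           # TRUE TRUE FALSE TRUE
numbers    <- str_extract(fruit_log, "\\d+")          # "12" "7" NA "30"
n_vowels   <- str_count(fruit_log, "[aeiou]")         # 2 3 1 2
cleaned    <- str_replace(fruit_log, "\\s+\\d+$", "") # drop the trailing count
with_count <- str_subset(fruit_log, "\\d")            # only elements containing a digit
```

Because the results line up with the input element by element, they can be stored directly as new columns in a data frame.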

8. forcats

forcats is an R library that is concerned with handling problems associated with factors. Factors are variables that have a fixed set of possible values, called levels, which is known in advance. So forcats deals with issues like changing the order of the levels, changing the values of the levels, etc. Some of the functions in forcats are fct_relevel() that reorders a factor by hand, fct_reorder() that reorders a factor using another variable, fct_infreq() that reorders a factor by the frequency of its values, etc. If you want to install forcats, the best method is to install the tidyverse using install.packages("tidyverse"). Or you can just install forcats using install.packages("forcats"). You can also install the development version from GitHub using devtools::install_github("tidyverse/forcats").
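The three reordering functions can be compared on one small factor. The size labels and prices below are invented for illustration.

```r
library(forcats)

size  <- factor(c("small", "large", "medium", "small", "small", "large"))
price <- c(1, 9, 5, 2, 1, 8)

# By default the levels are alphabetical: large, medium, small.
default_levels <- levels(size)

by_hand  <- fct_relevel(size, "small", "medium", "large")  # order given explicitly
by_price <- fct_reorder(size, price)                       # ordered by median price
by_freq  <- fct_infreq(size)                               # most common level first
```

Only the order of the levels changes in each case. The values stored in the factor stay the same, which matters mainly for plotting and for modelling functions that treat the first level as the reference.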



