Top 10 R Libraries for Data Science in 2020

When talking about Data Science, it is hard not to talk about R. R is one of the most widely used languages for Data Science, in large part because it was created by statisticians for statistical computing. It is also very popular (despite stiff competition from Python!), with an active community and many cutting-edge libraries available.


There are many R libraries that contain a host of functions, tools, and methods to manage and analyze data. Each of these libraries has a particular focus, such as image and text data, data manipulation, data visualization, web crawling, or machine learning. Here are the top 10 R libraries for Data Science, so let’s check them out!

1. dplyr

dplyr is a very popular data manipulation library in R. It provides five core functions, each of which can be combined with the group_by() function to perform the operation group by group. These functions are mutate(), which adds new variables that are functions of existing variables; select(), which picks variables (columns) based on their names; filter(), which picks cases (rows) based on their values; summarise(), which reduces multiple values down to a single summary; and arrange(), which changes the ordering of the rows. The easiest way to get dplyr is to install the tidyverse, a collection of R packages designed for Data Science, using install.packages("tidyverse"). Alternatively, you can install dplyr on its own using install.packages("dplyr").
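The five functions and group_by() can be chained into a single pipeline. Below is a minimal sketch on the built-in mtcars dataset; the column choices and the kilogram conversion are only for illustration, and the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Summarise fuel efficiency of manual-transmission cars by cylinder count.
result <- mtcars %>%
  filter(am == 1) %>%               # keep rows (cars) with a manual gearbox
  mutate(wt_kg = wt * 453.6) %>%    # wt is in 1000 lbs; add a kilogram column
  select(cyl, mpg, wt_kg) %>%       # keep only the columns needed below
  group_by(cyl) %>%                 # later verbs now work per cylinder group
  summarise(
    mean_mpg   = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    n          = n(),
    .groups    = "drop"             # return an ungrouped tibble
  ) %>%
  arrange(desc(mean_mpg))           # most fuel-efficient group first

print(result)
```

Each function takes a data frame and returns a new one, which is why they chain so naturally with the pipe.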

2. ggplot2

ggplot2 is an R data visualization library based on The Grammar of Graphics. ggplot2 can create data visualizations such as bar charts, pie charts, histograms, scatterplots, error bars, etc. using a high-level API. It also allows you to combine different types of visualization components, or layers, in a single plot. Once you tell ggplot2 which variables to map to which aesthetics, it takes care of the rest, so you can spend less time building visualizations and more time interpreting them. The trade-off is that highly customized graphics can take more effort to build in ggplot2, but there are many resources in the RStudio Community and on Stack Overflow that can help when needed. Just like dplyr, you can get ggplot2 by installing the tidyverse, or you can install it on its own using install.packages("ggplot2").
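The mapping-plus-layers idea looks like this in practice: aes() maps variables to aesthetics once, and each + adds a layer or a setting. This is a minimal sketch on the built-in mtcars dataset; the particular layers, labels and theme are illustrative choices.

```r
library(ggplot2)

# Map weight to x, mileage to y, and cylinder count to colour.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +                         # layer 1: raw observations
  geom_smooth(method = "lm", formula = y ~ x,    # layer 2: a linear fit per colour group
              se = FALSE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()                                # swap in another theme_*() to restyle

print(p)
```

Further customization usually goes through theme(), the scale_*() functions and the coord_*() functions, and ggsave() writes a finished plot to a file.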



3. Esquisse

Esquisse is a data visualization tool in R that lets you build ggplot2 charts interactively. You can create scatter plots, histograms, line charts, bar charts, box plots, and more with Esquisse, and then export the graphs or retrieve the ggplot2 code that produces them. Esquisse is popular even among beginners because of its drag-and-drop interface. You can install Esquisse from CRAN using install.packages("esquisse") or install the development version from GitHub using remotes::install_github("dreamRs/esquisse").
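Because Esquisse is driven from its graphical interface, the only code you need is the call that launches it. This is a minimal sketch that uses the built-in mtcars dataset as an example; the interactive() guard just keeps the builder from starting inside a non-interactive script.

```r
library(esquisse)

# Opens the drag-and-drop ggplot2 builder on the given data frame.
if (interactive()) {
  esquisser(data = mtcars)
}
```

Once the chart looks right, you can export it from the interface or copy the generated ggplot2 code back into your script.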

4. Shiny

Shiny is an R package for building interactive web applications in R. Basically, Shiny connects R with the modern web, and you can create web applications with it without needing any special web development skills. Using Shiny, you can embed applications in R Markdown documents, host standalone applications on a webpage, or build dashboards. If you want to extend the functionality of your Shiny applications, you can add HTML widgets, CSS themes, JavaScript actions, etc. If you are new to Shiny, video tutorials are available on the Shiny website from RStudio. You can also deploy Shiny apps to the cloud or to your own servers under an open-source or commercial license.
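A Shiny app is made of two parts: a ui object that describes the page and a server function that reacts to user input. This is a minimal sketch; the slider-driven histogram of the built-in faithful dataset is an arbitrary example, and the input/output IDs ("bins", "hist") are names chosen for illustration.

```r
library(shiny)

# ui: page layout with one input (a slider) and one output (a plot).
ui <- fluidPage(
  titlePanel("Old Faithful eruptions"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# server: re-draws the histogram whenever input$bins changes.
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         main = "Eruption durations", xlab = "Minutes")
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session; runApp() blocks until the app stops.
if (interactive()) runApp(app)
```

Keeping the app in an object and launching it with runApp() makes it easy to run the same definition locally or hand it to a deployment tool.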

5. mlr3

mlr3 is an R package created specifically for Machine Learning. With mlr3 you can tackle supervised and unsupervised tasks such as classification, regression, and clustering, using learners such as Support Vector Machines, Random Forests, Nearest Neighbors, Naive Bayes, and Decision Trees. It also connects to the OpenML R package, which is dedicated to supporting collaborative machine learning online. You can easily create your own Machine Learning algorithms in mlr3 or work with the algorithms already available. mlr3 is the successor to mlr, which is now retired and no longer actively developed by the mlr-org team. You can install the latest release of mlr3 from CRAN using install.packages("mlr3") or the development version from GitHub using remotes::install_github("mlr-org/mlr3").
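The basic mlr3 workflow is a task (data plus target), a learner (the algorithm) and a measure (how to score it). This is a minimal sketch using the iris task and the rpart decision-tree learner that ship with mlr3; the 70/30 split, the seed and the 5-fold cross-validation are arbitrary choices.

```r
library(mlr3)

set.seed(42)                          # make the split and CV folds reproducible

task    <- tsk("iris")                # built-in classification task
learner <- lrn("classif.rpart")       # decision tree learner

# Hold-out evaluation: train on 70% of rows, predict on the remaining 30%.
split <- partition(task, ratio = 0.7)
learner$train(task, row_ids = split$train)
prediction <- learner$predict(task, row_ids = split$test)
acc <- prediction$score(msr("classif.acc"))
print(acc)

# 5-fold cross-validation of the same learner.
rr <- resample(task, learner, rsmp("cv", folds = 5))
print(rr$aggregate(msr("classif.acc")))
```

Swapping the algorithm is a matter of changing the lrn() key, while the task, resampling and measure stay the same.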

6. Lubridate

Lubridate is an R library focused on making dates and times easy to handle. Working with date-time data in base R can be frustrating because the commands are unintuitive and change depending on the type of date-time object. In this situation, Lubridate is a lifesaver, as it provides simple functions for working with the components of a date-time, such as second(), minute(), hour(), day(), month(), and year(). Lubridate also introduces several time span classes that make arithmetic with dates easier: Intervals, which record the span of time between two specific instants; Durations, which measure the exact amount of time between two points in seconds; and Periods, which track changes in clock time such as days and months. Lubridate is part of the tidyverse, so you can get it by installing the tidyverse using install.packages("tidyverse"). Or you can install it on its own using install.packages("lubridate").
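The difference between the component accessors and the three time span classes is easiest to see with concrete timestamps. This is a short sketch with made-up dates; the daylight-saving example assumes the standard America/New_York time zone rules, under which clocks jumped forward on 8 March 2020.

```r
library(lubridate)

t1 <- ymd_hms("2020-03-01 08:30:00", tz = "UTC")
t2 <- ymd_hms("2020-04-15 17:45:30", tz = "UTC")

# Component accessors
print(c(year = year(t1), month = month(t1), day = day(t1), hour = hour(t1)))

# Interval: anchored to two specific instants
span <- interval(t1, t2)
print(as.duration(span))   # exact elapsed time, in seconds
print(as.period(span))     # the same span in clock units (months, days, hours, ...)

# Periods vs durations across the daylight-saving change
noon <- ymd_hms("2020-03-07 12:00:00", tz = "America/New_York")
next_day_period   <- noon + days(1)    # same clock time next day: 12:00 EDT
next_day_duration <- noon + ddays(1)   # exactly 86400 s later: 13:00 EDT
print(next_day_period)
print(next_day_duration)
```

Use a Period when you mean "the same time tomorrow" and a Duration when you mean "exactly 24 hours later"; the two only disagree across events like daylight-saving changes.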

7. RCrawler

RCrawler is an R package for domain-based web crawling and web scraping, that is, obtaining structured data from websites for use in other applications. RCrawler can be used for web structure mining, text mining, web content mining, etc. It can automatically traverse all the pages of a website and extract the required data from them with a single command. Because crawling is performed by multiple nodes working in parallel, it's best to use the 64-bit version of R with Rcrawler. You can install the release version of Rcrawler from CRAN using install.packages("Rcrawler", dependencies = TRUE), or the development version, which may contain bugs, from GitHub using devtools::install_github("salimk/Rcrawler").

8. knitr

knitr is an R package for dynamic report generation that lets you embed R code in documents written in formats such as Markdown, LyX, LaTeX, AsciiDoc, HTML, etc. knitr is an important package to have if you create reports for research, and it helps automate the whole process from data analysis to the final report. knitr combines many features from other packages into a single package and also solves some problems with Sweave, a function that ships with R and integrates R code into LaTeX or LyX documents. You can install the stable version of knitr from CRAN using install.packages("knitr") or the development version from XRAN using install.packages("knitr", repos = c("https://xran.yihui.org", "https://cran.r-project.org")).
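The core idea of knitr is that a document containing R code goes in, and a document with the code's results woven in comes out. To keep this minimal sketch self-contained, it knits an R Markdown string through the text argument of knit() rather than a .Rmd file. The document content and the chunk label "mpg-summary" are made up for illustration.

```r
library(knitr)

fence <- strrep("`", 3)   # build the chunk fence without typing it literally

doc <- c(
  "# Quick report",
  "",
  "The mean mpg in mtcars is `r round(mean(mtcars$mpg), 2)`.",
  "",
  paste0(fence, "{r mpg-summary}"),
  "summary(mtcars$mpg)",
  fence
)

# With text = ..., knit() returns the rendered Markdown as a string
# instead of writing an output file.
md <- knit(text = doc, quiet = TRUE)
cat(md)
```

In a real project you would more often call knit() on a file (for example a hypothetical report.Rmd), or use rmarkdown::render() to go all the way to HTML or PDF.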

9. DT

DT is an R package that provides an interface to the JavaScript library DataTables, which can be used to display R matrices and data frames as tables. These are interactive HTML tables that support sorting, searching, filtering, etc. The main function in DT is datatable(), which creates an HTML table from an R object. You can also style your tables using CSS classes. You can install the stable version of DT from CRAN using install.packages("DT") or the development version from GitHub using remotes::install_github("rstudio/DT").
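This is a minimal sketch of datatable() on the built-in iris data. The specific settings (no row names, a filter row, the "cell-border stripe" CSS classes and a five-row page length) are illustrative choices. The options list is passed through to the DataTables JavaScript library.

```r
library(DT)

tbl <- datatable(
  head(iris, 20),
  rownames = FALSE,                 # hide the row names column
  filter   = "top",                 # per-column filter boxes above the table
  class    = "cell-border stripe",  # DataTables CSS classes for styling
  options  = list(pageLength = 5)   # DataTables option: rows per page
)

# Renders in the RStudio Viewer or a browser when run interactively.
if (interactive()) print(tbl)
```

Because the result is an HTML widget, the same object also works inside Shiny apps and R Markdown documents.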

10. Plotly

Plotly is a free, open-source graphing library for building data visualizations. The plotly R package is built on top of the Plotly JavaScript library (plotly.js) and creates web-based visualizations that can be displayed in Jupyter notebooks, in web applications built with Dash, or saved as individual HTML files. Plotly provides more than 40 chart types, such as scatter plots, histograms, line charts, bar charts, pie charts, error bars, box plots, multiple axes, sparklines, dendrograms, 3-D charts, etc. It also supports contour plots. In addition, Plotly can be used offline with no internet connection. You can install Plotly from CRAN using install.packages("plotly") or the latest development version from GitHub using devtools::install_github("ropensci/plotly").
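This is a minimal sketch with plot_ly(), using the built-in iris and volcano datasets. The ~ prefix tells plotly to look the variable up in the data. The output file name is a placeholder written to a temporary directory, and selfcontained = FALSE keeps the JavaScript dependencies in a folder next to the HTML file.

```r
library(plotly)

# Interactive scatter plot, one colour per species.
scatter <- plot_ly(iris, x = ~Sepal.Length, y = ~Petal.Length, color = ~Species,
                   type = "scatter", mode = "markers")

# Contour plot of the volcano elevation matrix.
contour <- plot_ly(z = ~volcano, type = "contour")

# Save the scatter plot as an HTML page (works offline, no server needed).
out_file <- file.path(tempdir(), "iris_scatter.html")
htmlwidgets::saveWidget(scatter, out_file, selfcontained = FALSE)

if (interactive()) print(contour)
```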



