Data Wrangling in R Programming – Working with Tibbles
R is a robust language used by Analysts, Data Scientists, and Business users to perform various tasks such as statistical analysis, visualizations, and developing statistical software in multiple fields.
Data wrangling is the process of reshaping raw data into a more structured format, which makes it easier to draw insights and make decisions from the data.
What are Tibbles?
Tibbles are the core data structure of the tidyverse and are used to display and analyze information in a tidy format. A tibble is a modern form of the data frame, which is the most common data structure used to store data sets in R.
Advantages of Tibbles over data frames
- All Tidyverse packages support Tibbles.
- Tibbles print in a much cleaner format than data frames.
- Before R 4.0.0, data.frame() converted character strings to factors by default, so analysts often had to override the stringsAsFactors setting. Tibbles never make this conversion automatically.
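The string-handling difference in the last point can be sketched as follows. The column name and values are made up for illustration, and stringsAsFactors = TRUE is passed explicitly so that the data frame shows the old default behaviour on any R version.

```r
library(tibble)

# Base data frame, explicitly asking for the pre-4.0.0 default behaviour.
df <- data.frame(name = c("Asha", "Ben"), stringsAsFactors = TRUE)

# Tibble built from the same (hypothetical) values.
tb <- tibble(name = c("Asha", "Ben"))

class(df$name)  # "factor"
class(tb$name)  # "character"
```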
Different ways to create Tibbles
The first way is the as_tibble() function, which creates a tibble from an existing object such as a data frame.
as_tibble(x, validate = NULL, ...)
x is either a data frame, matrix, or list.
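As a small sketch of the non-data-frame inputs, the following converts a named list and a matrix with column names. The object names and values here are hypothetical and chosen only for illustration.

```r
library(tibble)

# A named list whose elements all have the same length.
from_list <- as_tibble(list(id = 1:3, label = c("a", "b", "c")))

# A matrix with column names; each column becomes a tibble column.
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("x", "y", "z")))
from_matrix <- as_tibble(m)

dim(from_list)    # 3 2
dim(from_matrix)  # 2 3
```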
The second way is to use the tibble() function, which creates a tibble from scratch.
tibble(..., .rows = NULL)
... represents a set of name-value pairs.
Finally, you can use the tidyverse's data import packages to create tibbles from external data sources such as databases or CSV files. For example, readr's read_csv() reads a CSV file into a tibble.
Syntax: read_csv(file, ...)
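A minimal sketch of the import route, assuming the readr package and its read_csv() function. The file contents and column names are invented for this example, and a temporary file stands in for a real CSV on disk.

```r
library(readr)
library(tibble)

# Write a tiny CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,marks", "Asha,91", "Ben,78"), csv_path)

# read_csv() returns a tibble rather than a plain data frame.
imported <- read_csv(csv_path)

is_tibble(imported)  # TRUE
```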
The library() function is used to load and attach an installed package.
library(package, help, pos = 2, lib.loc = NULL)
Note: To find out more about a function in R, type ? followed by the function name, e.g. ?tibble.
Let us see some examples of how to use the above functions in the RStudio IDE. We will use the built-in CO2 dataset (Carbon Dioxide Uptake in Grass Plants) to create a tibble.
This dataset has 84 rows and five variables: Plant, Type, Treatment, conc, and uptake. Printed as a plain data frame, the whole dataset scrolls past in the console with no column types shown, so let us convert it into a tibble named sample_tibble using the as_tibble() function.
Example of as_tibble()
Here we convert a data frame (CO2) into a tibble using the as_tibble() function. This requires the tidyverse package to be installed and loaded in RStudio.
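A minimal sketch of this conversion is shown here. It loads only the tibble package, which library(tidyverse) would also attach. The name sample_tibble comes from the text above.

```r
library(tibble)  # also attached by library(tidyverse)

# CO2 ships with R (datasets package), so no import step is needed.
sample_tibble <- as_tibble(CO2)

# Printing a tibble shows only the first rows, plus each column's type.
sample_tibble

is_tibble(sample_tibble)  # TRUE
dim(sample_tibble)        # 84 5
```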
Example of tibble()
The second method is to create a tibble from scratch using the tibble() function. We create a few vectors, such as name, marks_in_Math, marks_in_Java, and Fav_color, and pass them to tibble(), which combines them into a tibble.
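The steps above might look like the following sketch. The vector names come from the text, but the values and the tibble name students are assumptions made for illustration.

```r
library(tibble)

# Hypothetical student data; one vector per column.
name          <- c("Asha", "Ben", "Chen", "Dina")
marks_in_Math <- c(91, 78, 85, 66)
marks_in_Java <- c(88, 92, 70, 81)
Fav_color     <- c("blue", "green", "red", "blue")

# Each name-value pair becomes one column of the tibble.
students <- tibble(
  name          = name,
  marks_in_Math = marks_in_Math,
  marks_in_Java = marks_in_Java,
  Fav_color     = Fav_color
)

students
```

All vectors passed to tibble() should have the same length (or length one, which is recycled), otherwise the call fails with an error.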
Data analysts often extract a single variable from a tibble for further use in their analysis, which is called subsetting. When we try to subset a tibble, we extract a single variable from the Tibble in vector form. We can do this by using a few special operators.
- $ Operator
- [[ ]] Operator
The first way to extract a variable from a tibble is the dollar sign ($) operator. To demonstrate it, we will create a tibble from scratch using the tibble() function.
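A short sketch of $ subsetting follows. The tibble name students and its values are hypothetical.

```r
library(tibble)

# Small tibble built from scratch (made-up values).
students <- tibble(
  name          = c("Asha", "Ben", "Chen", "Dina"),
  marks_in_Math = c(91, 78, 85, 66)
)

# $ returns the column as a plain vector, not as a tibble.
math_marks <- students$marks_in_Math

is.numeric(math_marks)  # TRUE
mean(math_marks)        # 80
```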
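The second way to access a single variable from a tibble is with square brackets. Note that single brackets ([ ]) return a one-column tibble, while double brackets ([[ ]]) return the variable as a vector. We will use the same tibble created previously.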
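The bracket forms can be compared in a short sketch. The tibble is recreated here so the example runs on its own, and its name and values are hypothetical.

```r
library(tibble)

students <- tibble(
  name          = c("Asha", "Ben", "Chen", "Dina"),
  marks_in_Math = c(91, 78, 85, 66)
)

# Double brackets extract the column as a vector, by name or by position.
by_name     <- students[["name"]]
by_position <- students[[1]]

# Single brackets keep the tibble structure (a one-column tibble).
still_tibble <- students["name"]

is.character(by_name)    # TRUE
is_tibble(still_tibble)  # TRUE
```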
Filtering provides a way to help reduce the number of rows in your tibble. When performing filtering, we can specify conditions or specific criteria that are used to reduce the number of rows in the dataset.
Syntax: filter(data, conditions)
Here, data is the name of the tibble, and conditions are one or more expressions that each return a logical value. The filter() function comes from the dplyr package, which is part of the tidyverse. We will use the students tibble created in the example above.
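A minimal sketch of filtering, assuming dplyr's filter() and a hypothetical students tibble with made-up values. The thresholds are arbitrary and only show how conditions are written.

```r
library(tibble)
library(dplyr)

students <- tibble(
  name          = c("Asha", "Ben", "Chen", "Dina"),
  marks_in_Math = c(91, 78, 85, 66),
  Fav_color     = c("blue", "green", "red", "blue")
)

# Keep only the rows where the condition is TRUE.
high_math <- filter(students, marks_in_Math > 80)

# Several conditions separated by commas are combined with AND.
high_math_blue <- filter(students, marks_in_Math > 80, Fav_color == "blue")

nrow(high_math)       # 2
nrow(high_math_blue)  # 1
```

The result of filter() is itself a tibble, so it can be passed on to further tidyverse functions.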