Data Wrangling in R Programming – Data Transformation

The same dataset can be laid out in many different ways. Let us look at one of the most fundamental distinctions: whether a dataset is wide or long.

The difference between wide and long datasets comes down to whether we prefer more columns or more rows. A dataset that records additional data about a single subject in new columns is known as a wide dataset, because the dataset becomes wider as we add more columns. Similarly, a dataset that records additional data about a subject in new rows is called a long dataset.
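To make the distinction concrete, the sketch below builds the same counts twice, once in a wide layout and once in a long layout. The religion names, income labels, and numbers are made up for illustration and are not taken from the article's data.

```r
# Hypothetical toy data: respondent counts per income class.

# Wide layout: one row per religion, one column per income class
wide <- data.frame(
  religion = c("Agnostic", "Atheist"),
  low      = c(27, 12),
  high     = c(34, 27),
  stringsAsFactors = FALSE
)

# Long layout: one row per (religion, income class) pair
long <- data.frame(
  religion = c("Agnostic", "Atheist", "Agnostic", "Atheist"),
  income   = c("low", "low", "high", "high"),
  freq     = c(27, 12, 34, 27),
  stringsAsFactors = FALSE
)

dim(wide)  # 2 rows, 3 columns
dim(long)  # 4 rows, 3 columns
```

Both tables hold the same four counts. Only the arrangement differs.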

In data wrangling with R, we sometimes need to make long datasets wider and vice versa. In general, data scientists who embrace the concept of tidy data prefer long datasets over wide ones, because long datasets are easier to manipulate in R.

[Figure: wide vs long]

In the above figure, the same dataset is represented both as a wide dataset and as a long dataset. It records religions together with an income classification. Now that we know what long and wide datasets are, let us use tools in R to convert wide datasets to long and long datasets to wide.



Conversion of Wide Datasets to Long

The gather() function in the 'tidyr' package makes wide datasets long. It works on the concept of keys and values: a value is an observation of a single variable, while the key is the name that identifies the variable described by that value.

[Figure: key and value]

In the dataset above, income acts as the key: it identifies the income class that each count belongs to. The frequency column provides the values for the income key.

Syntax:
gather(data, key, value, columns)

Parameters:
data: The data frame or tibble to reshape.
key: The name that we would like to use for the key column in the long dataset.
value: The name that we would like to use for the value column in the long dataset.
columns: The columns from the wide dataset that we would like to include in or exclude from the gathering.

Alternatively, if you want to gather most of the columns, you can specify the columns that you don’t want to collect by listing them with a minus sign (-) in front of them.
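The call pattern described above can be tried without any external file. The following is a minimal sketch that assumes the 'tidyr' package is installed. The small data frame is hypothetical and only mimics the religion-by-income layout discussed here.

```r
library(tidyr)

# Hypothetical wide table: one column per income class
wide <- data.frame(
  religion = c("Agnostic", "Atheist"),
  low      = c(27, 12),
  high     = c(34, 27),
  stringsAsFactors = FALSE
)

# Gather every column except religion into key/value pairs
long <- gather(wide, income, freq, -religion)

# Equivalent call that names the columns to gather explicitly
long_explicit <- gather(wide, income, freq, low, high)

long
nrow(long)  # 4: two religions times two income columns
```

Excluding religion with a minus sign and listing low and high explicitly select the same columns here, so both calls should give the same long table.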


# Making Wide Datasets Long with gather()

# Load the tidyverse
library(tidyverse)

# Read in the dataset
sample_data <- read.csv("C:/Users/Admin/Desktop/pew.csv")

# Print the wide dataset
sample_data

# Gather every column except religion into income/freq pairs
sample_data_long <- gather(sample_data, income, freq, -religion)

# Print the long dataset
sample_data_long


Output:

[Output figures: gather1, gather2]

Conversion of Long Datasets to Wide

Sometimes we need to reverse the gather() operation. The spread() function converts long datasets to wide datasets.

Syntax:
spread(data, key, value)

Parameters:
data: The data frame or tibble to reshape.
key: The column in the long dataset whose values become the names of the new columns in the wide dataset.
value: The column in the long dataset whose values fill the new columns.
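As a self-contained illustration of this syntax, the sketch below spreads a tiny long table. It assumes the 'tidyr' package is installed. The station names, date, and temperature readings are invented for the example and are not taken from the article's weather file.

```r
library(tidyr)

# Hypothetical long table: one row per (station, date, element) reading
long <- data.frame(
  station = c("A", "A", "B", "B"),
  date    = c("2010-01-01", "2010-01-01", "2010-01-01", "2010-01-01"),
  element = c("Tmax", "Tmin", "Tmax", "Tmin"),
  value   = c(27.8, 14.5, 30.1, 16.2),
  stringsAsFactors = FALSE
)

# The distinct entries of `element` become column names;
# the matching entries of `value` fill those columns
wide <- spread(long, element, value)

wide
names(wide)  # "station" "date" "Tmax" "Tmin"
```

Note that key and value here refer to columns that already exist in the long table, whereas in gather() they are names for columns to be created.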


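# Making Long Datasets Wide with spread()

# Load the tidyverse
library(tidyverse)

# Read in the dataset and print the long version
sample_data <- read.csv("C:/Users/Admin/Desktop/mexicanweather.csv")
sample_data

# Turn each distinct element into its own column, filled from value
sample_data_wide <- spread(sample_data, element, value)

# Print the wide dataset
sample_data_wide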


Output:

[Output figures: spread1, spread2]

The resulting wide dataset has half as many rows as the original Mexican weather dataset. We now have Tmax and Tmin columns and no longer have the element or value columns.
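The halving of the row count can be checked on a small example. The sketch below assumes the 'tidyr' package is installed and uses made-up readings in which every day has exactly one Tmax and one Tmin row. With missing readings the ratio would not be exactly two.

```r
library(tidyr)

# Hypothetical long table: two element rows per day
long <- data.frame(
  day     = c(1, 1, 2, 2, 3, 3),
  element = rep(c("Tmax", "Tmin"), times = 3),
  value   = c(30, 15, 31, 16, 29, 14),
  stringsAsFactors = FALSE
)

# Each pair of element rows collapses into a single row
wide <- spread(long, element, value)

nrow(long)  # 6
nrow(wide)  # 3

# gather() reverses the operation and restores the long shape
back <- gather(wide, element, value, -day)
nrow(back)  # 6
```

The round trip returns the same readings, although the row order of the gathered table may differ from the original.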



