Pivot Longer in R
Last Updated: 24 Apr, 2024
Pivoting data to a longer format is a common task in data wrangling. The goal is to transform wide-format data, where the values of one variable are spread across several columns, into long-format data, where each row represents a single observation: one column holds the former column names and another holds their values.
The tidyr package, part of the tidyverse, provides a convenient function, pivot_longer, to accomplish this. This guide explains how to use pivot_longer in several scenarios in the R Programming Language.
Why Pivot to a Longer Format?
Pivoting to a longer format is useful when:
- You want to create a “tidy” data frame, where each row is a unique observation and each variable has its own column.
- You want to perform analysis or visualization with tools that expect long-format data.
- You need to prepare data for packages like ggplot2 or dplyr.
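As a rough illustration of the last two points, a grouped summary in dplyr becomes a single group_by/summarise call once the data is long. This is only a sketch: the data frame `wide` and its columns (`store`, `jan`, `feb`) are hypothetical and not part of the examples in this guide.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide data: one row per store, one column per month
wide <- data.frame(
  store = c("north", "south"),
  jan = c(100, 80),
  feb = c(120, 90)
)

# Long format: one row per store and month
long <- pivot_longer(
  wide,
  cols = c(jan, feb),
  names_to = "month",
  values_to = "sales"
)

# One grouped summary covers every month; no per-column code is needed
monthly <- long %>%
  group_by(month) %>%
  summarise(total = sum(sales))

print(monthly)
```

In the wide layout, the same totals would need one expression per month column.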
Using pivot_longer in R
To use pivot_longer, you need to load tidyr, either directly or as part of the tidyverse. The basic syntax is:
library(tidyr)
data %>%
pivot_longer(
cols = <columns_to_pivot>,
names_to = "variable_name",
values_to = "variable_value"
)
- cols: Specifies which columns to pivot into the longer format.
- names_to: The name of the new column that will hold the variable names.
- values_to: The name of the new column that will hold the values.
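The cols argument accepts tidyselect syntax, so the same set of columns can usually be selected in more than one way. The sketch below uses a small hypothetical data frame (the names `wide`, `id`, `q1`, and `q2` are made up for illustration) and compares three selections that should give the same result.

```r
library(tidyr)

# Hypothetical wide data: one row per id, one column per quarter
wide <- data.frame(
  id = c("a", "b"),
  q1 = c(10, 20),
  q2 = c(11, 21)
)

# Select the columns by name
by_name <- pivot_longer(wide, cols = c(q1, q2),
                        names_to = "quarter", values_to = "amount")

# Select a contiguous range of columns
by_range <- pivot_longer(wide, cols = q1:q2,
                         names_to = "quarter", values_to = "amount")

# Select everything except the identifier column
by_exclusion <- pivot_longer(wide, cols = -id,
                             names_to = "quarter", values_to = "amount")

# Compare the three results
isTRUE(all.equal(by_name, by_range))
isTRUE(all.equal(by_name, by_exclusion))
```

Excluding the identifier columns is often the most convenient form when there are many columns to pivot.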
Let’s look at different examples of using pivot_longer.
Pivoting a Simple Data Frame
Suppose you have a data frame with sales data for different months. You can pivot the data to create a “tidy” format.
R
library(tidyr)
# Create a wide-format data frame
df <- data.frame(
ID = c(1, 2, 3),
Jan = c(200, 250, 300),
Feb = c(220, 270, 320),
Mar = c(230, 280, 330)
)
df
# Pivot the data to longer format
df_long <- df %>%
pivot_longer(
cols = c(Jan, Feb, Mar),
names_to = "Month",
values_to = "Sales"
)
print(df_long)
Output:
ID Jan Feb Mar
1 1 200 220 230
2 2 250 270 280
3 3 300 320 330
# A tibble: 9 × 3
ID Month Sales
<dbl> <chr> <dbl>
1 1 Jan 200
2 1 Feb 220
3 1 Mar 230
4 2 Jan 250
5 2 Feb 270
6 2 Mar 280
7 3 Jan 300
8 3 Feb 320
9 3 Mar 330
This example creates a long-format data frame with one row per ID and month, and three columns: ID, Month, and Sales.
Pivoting with a Pattern
If you want to pivot only the columns whose names start with a given prefix, you can select them with starts_with, a tidyselect helper that tidyr re-exports:
R
# Create a data frame with multiple measurement columns
df <- data.frame(
ID = c(1, 2, 3),
Height = c(170, 175, 180),
Weight = c(65, 70, 75),
BMI = c(22.5, 23.0, 23.5)
)
df
# Pivot only columns whose names start with "Height"
df_long <- df %>%
pivot_longer(
cols = starts_with("Height"),
names_to = "Measurement",
values_to = "Value"
)
print(df_long)
Output:
ID Height Weight BMI
1 1 170 65 22.5
2 2 175 70 23.0
3 3 180 75 23.5
# A tibble: 3 × 5
ID Weight BMI Measurement Value
<dbl> <dbl> <dbl> <chr> <dbl>
1 1 65 22.5 Height 170
2 2 70 23 Height 175
3 3 75 23.5 Height 180
Only the Height column matches the pattern, so it alone is pivoted; Weight and BMI remain as regular columns.
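starts_with is most useful when several columns actually share a prefix. The following is a minimal sketch, assuming a hypothetical data frame `scores` whose score columns are all named with a `score_` prefix; the names_prefix argument of pivot_longer strips that shared prefix from the resulting names.

```r
library(tidyr)

# Hypothetical data: two score columns share the "score_" prefix
scores <- data.frame(
  ID = c(1, 2),
  age = c(30, 41),
  score_math = c(90, 75),
  score_art = c(85, 60)
)

scores_long <- pivot_longer(
  scores,
  cols = starts_with("score_"),  # selects score_math and score_art only
  names_to = "Subject",
  names_prefix = "score_",       # drop the prefix from the new Subject values
  values_to = "Score"
)

print(scores_long)
```

The age column is left untouched because it does not match the prefix.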
Handling Multiple Variables
When a column name encodes more than one variable, pivot_longer can split it into several columns with the names_sep (or names_pattern) argument. Here’s an example with a dataset where each column name combines a measurement and a time point:
R
# Create a data frame with combined variables in column names
df <- data.frame(
ID = c(1, 2, 3),
Temp_1 = c(36.5, 36.8, 37.1),
Temp_2 = c(36.6, 36.9, 37.2),
HeartRate_1 = c(72, 75, 78),
HeartRate_2 = c(73, 76, 79)
)
df
# Pivot to longer format
df_long <- df %>%
pivot_longer(
cols = -ID,  # pivot every column except the identifier
names_to = c("Measurement", "Time"),
names_sep = "_",
values_to = "Value"
)
print(df_long)
Output:
ID Temp_1 Temp_2 HeartRate_1 HeartRate_2
1 1 36.5 36.6 72 73
2 2 36.8 36.9 75 76
3 3 37.1 37.2 78 79
# A tibble: 12 × 4
ID Measurement Time Value
<dbl> <chr> <chr> <dbl>
1 1 Temp 1 36.5
2 1 Temp 2 36.6
3 1 HeartRate 1 72
4 1 HeartRate 2 73
5 2 Temp 1 36.8
6 2 Temp 2 36.9
7 2 HeartRate 1 75
8 2 HeartRate 2 76
9 3 Temp 1 37.1
10 3 Temp 2 37.2
11 3 HeartRate 1 78
12 3 HeartRate 2 79
This example pivots the data and splits each original column name at the underscore into Measurement and Time in a single step. ID is excluded from cols: it has no underscore to split on and it identifies the observation, so pivoting it with everything() would produce rows with a missing Time and would drop the identifier from the result.
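If each measurement should end up in its own column rather than in a shared Value column, tidyr's special ".value" name in names_to can be used. The sketch below reuses the data frame from this section and excludes ID from cols so that it stays as an identifier; the result name `df_by_time` is chosen here only for illustration.

```r
library(tidyr)

# Same data as in the example above
df <- data.frame(
  ID = c(1, 2, 3),
  Temp_1 = c(36.5, 36.8, 37.1),
  Temp_2 = c(36.6, 36.9, 37.2),
  HeartRate_1 = c(72, 75, 78),
  HeartRate_2 = c(73, 76, 79)
)

# ".value" means the first piece of each name (Temp, HeartRate)
# becomes an output column; the second piece goes into Time
df_by_time <- pivot_longer(
  df,
  cols = -ID,
  names_to = c(".value", "Time"),
  names_sep = "_"
)

print(df_by_time)
```

This layout has one row per ID and time point, which can be easier to work with when Temp and HeartRate are analysed side by side.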
Conclusion
Pivoting data to a longer format is a core data wrangling technique in R, especially when working with “tidy” data. Using pivot_longer, you can transform wide-format data into a long-format structure that is easier to analyze and visualize. Its arguments let you specify which columns to pivot, name the resulting name and value columns, and split combined column names into multiple variables.