
Pivot Longer in R

Last Updated : 24 Apr, 2024

Pivoting data to a longer format is a common task in data wrangling. The goal is to transform wide-format data, where the values of one variable are spread across several columns, into long-format data, where each row represents a single observation: one column holds the former column names and another holds their values.

The tidyr package, part of the tidyverse, provides a convenient function, pivot_longer, to accomplish this. This guide explains how to use pivot_longer in various scenarios in the R Programming Language.

Why Pivot to a Longer Format?

Pivoting to a longer format is useful when:

  1. You want to create a “tidy” data frame, where each row is a unique observation and each variable has its own column.
  2. You want to perform analysis or visualization with tools that expect long-format data.
  3. You need to prepare data for packages like ggplot2 or dplyr, whose functions work most naturally on long-format data.

Using pivot_longer in R

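To use pivot_longer, load tidyr directly or load the whole tidyverse, which includes it. The examples in this guide also use the %>% pipe, which tidyr re-exports. The basic syntax is: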

library(tidyr)
data %>%
  pivot_longer(
    cols = <columns_to_pivot>,
    names_to = "variable_name",
    values_to = "variable_value"
  )
  • cols: Specifies which columns to pivot into the longer format.
  • names_to: The name of the new column that will hold the variable names.
  • values_to: The name of the new column that will hold the values.
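The cols argument accepts tidyselect syntax, so the same set of columns can usually be described in more than one way. The sketch below uses a made-up data frame, scores, whose name and columns are assumptions for illustration only. It selects the columns by name, by range, and by exclusion, then checks that all three calls give the same result.

```r
library(tidyr)

# Hypothetical wide data frame used only for illustration
scores <- data.frame(
  ID = c(1, 2),
  Q1 = c(10, 20),
  Q2 = c(11, 21),
  Q3 = c(12, 22)
)

# Three ways to say "pivot Q1, Q2 and Q3"
by_name <- pivot_longer(scores, cols = c(Q1, Q2, Q3),
                        names_to = "Quarter", values_to = "Score")
by_range <- pivot_longer(scores, cols = Q1:Q3,
                         names_to = "Quarter", values_to = "Score")
by_exclusion <- pivot_longer(scores, cols = !ID,
                             names_to = "Quarter", values_to = "Score")

# Compare the three results
isTRUE(all.equal(by_name, by_range))
isTRUE(all.equal(by_range, by_exclusion))
```

Selecting by exclusion is often the most convenient form when a data frame has one or two identifier columns and many value columns.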

Let’s look at different examples of using pivot_longer.

Pivoting a Simple Data Frame

Suppose you have a data frame with sales data for different months. You can pivot the data to create a “tidy” format.

R
library(tidyr)
# Create a wide-format data frame
df <- data.frame(
  ID = c(1, 2, 3),
  Jan = c(200, 250, 300),
  Feb = c(220, 270, 320),
  Mar = c(230, 280, 330)
)
df
# Pivot the data to longer format
df_long <- df %>%
  pivot_longer(
    cols = c(Jan, Feb, Mar),
    names_to = "Month",
    values_to = "Sales"
  )
print(df_long)

Output:

  ID Jan Feb Mar
1  1 200 220 230
2  2 250 270 280
3  3 300 320 330

# A tibble: 9 × 3
     ID Month Sales
  <dbl> <chr> <dbl>
1     1 Jan     200
2     1 Feb     220
3     1 Mar     230
4     2 Jan     250
5     2 Feb     270
6     2 Mar     280
7     3 Jan     300
8     3 Feb     320
9     3 Mar     330

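This example creates a long-format data frame with the columns ID, Month, and Sales. The three month columns of the 3 original rows become 9 rows, one for each combination of ID and month.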
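One thing to watch for: Month is a character column, so sorting or plotting it orders the months alphabetically (Feb, Jan, Mar) rather than by calendar. A possible fix, sketched below, is to convert the names to a factor during the pivot with the names_transform argument. The helper variable month_levels is introduced here for illustration, and the level order is an assumption about what you want.

```r
library(tidyr)

# Same sales data as in the example above
df <- data.frame(
  ID = c(1, 2, 3),
  Jan = c(200, 250, 300),
  Feb = c(220, 270, 320),
  Mar = c(230, 280, 330)
)

# Assumed calendar order for the factor levels
month_levels <- c("Jan", "Feb", "Mar")

df_long <- pivot_longer(
  df,
  cols = Jan:Mar,
  names_to = "Month",
  values_to = "Sales",
  # Convert the new Month column to a factor with explicit levels
  names_transform = list(Month = function(x) factor(x, levels = month_levels))
)

levels(df_long$Month)
```

With Month stored as a factor, functions that respect factor levels, such as ggplot2 axes, should keep the months in calendar order.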

Pivoting with a Pattern

If the columns you want to pivot share a prefix, you can select them with the starts_with helper (available as tidyr::starts_with) instead of listing each name:

R
library(tidyr)
# Create a data frame with multiple measurement columns
df <- data.frame(
  ID = c(1, 2, 3),
  Height = c(170, 175, 180),
  Weight = c(65, 70, 75),
  BMI = c(22.5, 23.0, 23.5)
)
df
# Pivot only the columns whose names start with "Height"
df_long <- df %>%
  pivot_longer(
    cols = starts_with("Height"),
    names_to = "Measurement",
    values_to = "Value"
  )
print(df_long)

Output:

  ID Height Weight  BMI
1  1    170     65 22.5
2  2    175     70 23.0
3  3    180     75 23.5

# A tibble: 3 × 5
     ID Weight   BMI Measurement Value
  <dbl>  <dbl> <dbl> <chr>       <dbl>
1     1     65  22.5 Height        170
2     2     70  23   Height        175
3     3     75  23.5 Height        180

Only the column matched by starts_with("Height") is pivoted. Weight and BMI are not selected, so they stay in place as regular columns alongside ID.
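Pattern-based selection is most useful when several columns share a prefix. The sketch below assumes a hypothetical data frame, exams, with two columns that start with "score_". It also uses the names_prefix argument to strip that prefix from the values of the new names column.

```r
library(tidyr)

# Hypothetical data: two columns share the "score_" prefix
exams <- data.frame(
  ID = c(1, 2),
  score_math = c(90, 80),
  score_art = c(70, 60),
  age = c(15, 16)
)

exams_long <- pivot_longer(
  exams,
  cols = starts_with("score_"),  # selects score_math and score_art
  names_to = "Subject",
  names_prefix = "score_",       # remove the shared prefix from the names
  values_to = "Score"
)
print(exams_long)
```

Here age is not matched by the pattern, so it is kept as an identifier column next to ID, and Subject holds "math" and "art" rather than the full column names.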

Handling Multiple Variables

When a column name encodes more than one variable, you can split it during the pivot by passing several names to names_to and a separator to names_sep. Here’s an example with a dataset where each column name combines a measurement and a time point:

R
library(tidyr)
# Create a data frame with combined variables in column names
df <- data.frame(
  ID = c(1, 2, 3),
  Temp_1 = c(36.5, 36.8, 37.1),
  Temp_2 = c(36.6, 36.9, 37.2),
  HeartRate_1 = c(72, 75, 78),
  HeartRate_2 = c(73, 76, 79)
)
df
# Pivot every column except ID, splitting each name at "_"
df_long <- df %>%
  pivot_longer(
    cols = -ID,
    names_to = c("Measurement", "Time"),
    names_sep = "_",
    values_to = "Value"
  )
print(df_long)

Output:

  ID Temp_1 Temp_2 HeartRate_1 HeartRate_2
1  1   36.5   36.6          72          73
2  2   36.8   36.9          75          76
3  3   37.1   37.2          78          79

# A tibble: 12 × 4
      ID Measurement Time  Value
   <dbl> <chr>       <chr> <dbl>
 1     1 Temp        1      36.5
 2     1 Temp        2      36.6
 3     1 HeartRate   1      72  
 4     1 HeartRate   2      73  
 5     2 Temp        1      36.8
 6     2 Temp        2      36.9
 7     2 HeartRate   1      75  
 8     2 HeartRate   2      76  
 9     3 Temp        1      37.1
10     3 Temp        2      37.2
11     3 HeartRate   1      78  
12     3 HeartRate   2      79  

This example pivots the data and splits each original column name at the underscore into Measurement and Time. ID is excluded from cols so that it stays as an identifier column. Selecting it with everything() would pivot it too, producing rows with "ID" as the measurement and NA as the time.
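Temperature and heart rate are different quantities, so stacking them in a single Value column is not always what you want. As a hedged alternative, the special name ".value" in names_to tells pivot_longer to use that part of each column name as the name of an output column. The sketch below uses the same data under the illustrative name vitals.

```r
library(tidyr)

# Same measurements as above, stored under a different name
vitals <- data.frame(
  ID = c(1, 2, 3),
  Temp_1 = c(36.5, 36.8, 37.1),
  Temp_2 = c(36.6, 36.9, 37.2),
  HeartRate_1 = c(72, 75, 78),
  HeartRate_2 = c(73, 76, 79)
)

vitals_long <- pivot_longer(
  vitals,
  cols = -ID,
  # ".value": the part before "_" becomes a column name (Temp, HeartRate)
  # "Time":   the part after "_" becomes a value in the Time column
  names_to = c(".value", "Time"),
  names_sep = "_"
)
print(vitals_long)
```

The result has one row per ID and time point, with Temp and HeartRate as separate columns. No values_to argument is needed because the value column names come from the original column names.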

Conclusion

Pivoting data to a longer format is a core data wrangling technique in R, especially when working with “tidy” data. With pivot_longer, you can transform wide-format data into a long-format structure that is easier to analyze and visualize. Its arguments let you choose which columns to pivot, name the resulting name and value columns, and split combined column names into several variables.
