
Reshaping data.frame from wide to long format in R

Reshaping a data frame from wide to long format in R is a common operation in data analysis and visualization. The process converts data that is spread across multiple columns (wide format) into a layout where each row represents a single observation (long format). This article covers three ways to accomplish this: the melt function from the reshape2 package, the pivot_longer function from the tidyr package, and the reshape function from base R.

Before diving into reshaping methods, let's create a sample data frame in a wide format.

# Create a sample data frame in wide format
wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)
print(wide_df)

Output:

  ID    Name Test1 Test2 Test3
1  1     Ali    85    88    82
2  2    Boby    90    89    87
3  3 Charles    92    95    91

This data frame contains student IDs, names, and scores for three tests (Test1, Test2, and Test3).

Reshaping with reshape2 Package

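The melt function from the reshape2 package reshapes data from wide to long format. The id.vars argument names the columns to keep as identifiers; every other column is stacked. The variable.name and value.name arguments set the names of the two new columns, which hold the former column names and their values.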

Using melt from reshape2

# Load reshape2 package
library(reshape2)
# Reshape data from wide to long format
long_df <- melt(wide_df, id.vars = c("ID", "Name"), variable.name = "Test", 
                value.name = "Score")
print(long_df)

Output:

  ID    Name  Test Score
1  1     Ali Test1    85
2  2    Boby Test1    90
3  3 Charles Test1    92
4  1     Ali Test2    88
5  2    Boby Test2    89
6  3 Charles Test2    95
7  1     Ali Test3    82
8  2    Boby Test3    87
9  3 Charles Test3    91
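
One detail that is easy to miss in the output above: melt stores the former column names (the Test column here) as a factor, not as character strings. The sketch below is a minimal illustration, assuming reshape2 is installed; it also uses the measure.vars argument to list the stacked columns explicitly, which is an optional variation on the call shown earlier.

```r
library(reshape2)

wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)

# measure.vars names the columns to stack explicitly, instead of
# relying on "every column that is not an id variable"
long_df <- melt(
  wide_df,
  id.vars = c("ID", "Name"),
  measure.vars = c("Test1", "Test2", "Test3"),
  variable.name = "Test",
  value.name = "Score"
)

# The Test column is a factor; convert it if plain strings are needed
class(long_df$Test)
long_df$Test <- as.character(long_df$Test)
```

Converting to character is only worthwhile if later steps expect strings; for plotting or grouping, the factor is often fine as it is.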

Reshaping with tidyr Package

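The pivot_longer function from the tidyr package provides a modern way to reshape data from wide to long format. The cols argument selects the columns to stack, names_to names the new column that holds the former column names, and values_to names the new column that holds their values.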

Using pivot_longer from tidyr

# Create a sample data frame in wide format
wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)
print(wide_df)

# Load tidyr package
library(tidyr)
# Reshape data from wide to long format
long_df <- pivot_longer(wide_df, cols = starts_with("Test"), names_to = "Test", 
                        values_to = "Score")
print(long_df)

Output:

  ID    Name Test1 Test2 Test3
1  1     Ali    85    88    82
2  2    Boby    90    89    87
3  3 Charles    92    95    91

# A tibble: 9 × 4
     ID Name    Test  Score
  <dbl> <chr>   <chr> <dbl>
1     1 Ali     Test1    85
2     1 Ali     Test2    88
3     1 Ali     Test3    82
4     2 Boby    Test1    90
5     2 Boby    Test2    89
6     2 Boby    Test3    87
7     3 Charles Test1    92
8     3 Charles Test2    95
9     3 Charles Test3    91
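
If the test number is more useful than the label "Test1", pivot_longer can strip the common prefix and convert the remainder while reshaping. The following is a small sketch of that variation, assuming tidyr 1.1.0 or later (for names_transform); it is an optional extension of the call above, not part of the original example.

```r
library(tidyr)

wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)

long_df <- pivot_longer(
  wide_df,
  cols = starts_with("Test"),
  names_to = "Test",
  names_prefix = "Test",                      # drop the leading "Test" from each name
  names_transform = list(Test = as.integer),  # store the remainder as an integer
  values_to = "Score"
)
print(long_df)
```

With this variation the Test column holds the integers 1, 2, and 3 instead of the strings "Test1", "Test2", and "Test3", which makes numeric sorting and filtering by test number straightforward.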

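Reshaping with Base R

The reshape function in base R performs the same conversion without any additional packages. The varying argument lists the columns to stack, v.names names the new value column, timevar names the new column that identifies the source column, times supplies the labels for that column, and idvar names the identifier columns.

Using reshape from Base R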

# Create a sample data frame in wide format
wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)
print(wide_df)

# Reshape data from wide to long format using base R
long_df <- reshape(wide_df, direction = "long", idvar = c("ID", "Name"), 
                   varying = list(names(wide_df)[3:5]), v.names = "Score", 
                   timevar = "Test", times = c("Test1", "Test2", "Test3"))
print(long_df)

Output:

  ID    Name Test1 Test2 Test3
1  1     Ali    85    88    82
2  2    Boby    90    89    87
3  3 Charles    92    95    91

                ID    Name  Test Score
1.Ali.Test1      1     Ali Test1    85
2.Boby.Test1     2    Boby Test1    90
3.Charles.Test1  3 Charles Test1    92
1.Ali.Test2      1     Ali Test2    88
2.Boby.Test2     2    Boby Test2    89
3.Charles.Test2  3 Charles Test2    95
1.Ali.Test3      1     Ali Test3    82
2.Boby.Test3     2    Boby Test3    87
3.Charles.Test3  3 Charles Test3    91
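
The composite row names in the output above (such as 1.Ali.Test1) are built by reshape from the id variables and the time label. If they are not wanted, a short clean-up step, sketched below using only base R, sorts the rows by student and replaces the row names with the default sequence.

```r
wide_df <- data.frame(
  ID = c(1, 2, 3),
  Name = c("Ali", "Boby", "Charles"),
  Test1 = c(85, 90, 92),
  Test2 = c(88, 89, 95),
  Test3 = c(82, 87, 91)
)

long_df <- reshape(wide_df, direction = "long", idvar = c("ID", "Name"),
                   varying = list(names(wide_df)[3:5]), v.names = "Score",
                   timevar = "Test", times = c("Test1", "Test2", "Test3"))

# Order rows by student, then by test, so each student's scores sit together
long_df <- long_df[order(long_df$ID, long_df$Test), ]

# Replace the composite row names with the default 1, 2, 3, ...
rownames(long_df) <- NULL
print(long_df)
```

After this step the rows appear in the same student-by-student order that pivot_longer produces, which makes the results of the two approaches easier to compare.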

Conclusion

Reshaping a data frame from wide to long format in R is a crucial step in data preprocessing and analysis. This article demonstrated three methods to accomplish this task: using the melt function from the reshape2 package, the pivot_longer function from the tidyr package, and the reshape function from base R. Depending on your preference and specific requirements, you can choose the most suitable method for reshaping your data.
