How to Split Vector and DataFrame in R

Last Updated : 26 Feb, 2024

R is a programming language and environment designed for data analysis, statistical computing, and graphics. Data often needs to be split into batches for manipulation and analysis tasks. In this article, we will discuss some techniques to split vectors into chunks, and to split data frames, using the R Programming Language.

Concepts related to the topic

In R, a vector is a fundamental data structure that stores a sequence of elements. It is similar to a one-dimensional array in other languages, and all of its elements share the same data type.

A chunk is a portion or segment of data that is processed as a unit, often used to improve efficiency, manage memory usage, or handle data streams.

How to split Vector into chunks

Below are the methods that we will cover in this article:

  • Using split()
  • Using cut()
  • Using a Loop

Using split()

split() is a built-in R function that divides a vector, data frame, or list into subsets according to the factor provided.

Syntax:
split(x, f)
Parameters:
x: object to be split.
f: factor or grouping variable indicating how to split x.

Split vector of numeric data

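Here we create a vector of the numbers 1 to 16 and split it into chunks of size 4. The expression ceiling(seq_along(my_vect) / chunk_size) assigns each position a chunk number: positions 1 to 4 get 1, positions 5 to 8 get 2, and so on.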

R




# Create a sample vector
my_vect <- 1:16
  
# Print the vector before the split
print('Vector before split:')
print(my_vect)
  
# Define the number of elements in each chunk
chunk_size <- 4
  
# Split the vector into chunks and store in chunks variable
chunks <- split(my_vect, ceiling(seq_along(my_vect) / chunk_size))
  
# Wrap the chunks in a named list
chunks_list <- list(chunks=chunks)
  
# Print the chunks list
print(chunks_list)


Output:

[1] "Vector before split:"

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16

$chunks
$chunks$`1`
[1] 1 2 3 4

$chunks$`2`
[1] 5 6 7 8

$chunks$`3`
[1]  9 10 11 12

$chunks$`4`
[1] 13 14 15 16
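The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the name split_into_chunks is hypothetical and not part of base R. It also shows what happens when the vector length is not a multiple of the chunk size.

```r
# Hypothetical helper (not a base R function): split a vector into
# consecutive chunks of at most chunk_size elements
split_into_chunks <- function(x, chunk_size) {
  stopifnot(chunk_size >= 1)
  # Position i is assigned to chunk number ceiling(i / chunk_size)
  split(x, ceiling(seq_along(x) / chunk_size))
}

# 10 elements with chunk size 4: the last chunk holds the 2 leftover elements
chunks <- split_into_chunks(1:10, 4)
print(chunks)

# Number of elements in each chunk
print(lengths(chunks))
```

Because the chunk number is computed from the position rather than the value, the helper should work for character or logical vectors as well.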

Split vector based on groups

R




# Create a vector
my_vect <- c(1, 2, 3, 4, 5, 6)
  
# Create a factor to define groups
groups <- factor(c("A", "A", "B", "B", "C", "C"))
  
# Split the vector based on groups
result <- split(my_vect, groups)
  
# Print the result
print(result)


Output:

$A
[1] 1 2

$B
[1] 3 4

$C
[1] 5 6
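Group members do not have to be adjacent in the vector. As a minimal sketch, the example below uses an interleaved grouping (chosen here only for illustration), processes each group separately, and then uses base R's unsplit() to put the results back in the original order.

```r
# Create a vector and an interleaved grouping factor (illustrative values)
my_vect <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "B", "A", "C", "B", "C"))

# Split by group: A gets elements 1 and 3, B gets 2 and 5, C gets 4 and 6
parts <- split(my_vect, groups)
print(parts)

# Process each group separately, here by doubling every value
doubled <- lapply(parts, function(v) v * 2)

# Reassemble the processed groups in the original element order
restored <- unsplit(doubled, groups)
print(restored)
```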

Split a Vector into Chunks Using cut()

The cut() function in R is often used to divide numeric data into intervals based on specified breakpoints. In the example below, we generate breakpoints with seq() to determine where to cut the vector into chunks. We then use cut() to assign each index of the vector to the interval defined by those breakpoints. Finally, we use split() to divide the vector into chunks based on the interval numbers.

R




# Create sample vector
my_vect <- 1:10
  
# Define the number of elements you want in each chunk
chunk_size <- 3
  
# Generate breakpoints starting from 0; the upper limit is padded by
# chunk_size - 1 so that a shorter final chunk is still covered
breakpts <- seq(0, length(my_vect) + chunk_size - 1, by = chunk_size)
  
# Cut the vector into chunks based on the breakpoints
chunks <- cut(seq_along(my_vect), breaks = breakpts, labels = FALSE)
  
# Split the vector into chunks based on the cuts
chunks <- split(my_vect, chunks)
  
# Print the chunks
print(chunks)


Output:

$`1`
[1] 1 2 3

$`2`
[1] 4 5 6

$`3`
[1] 7 8 9

$`4`
[1] 10
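cut() can also be given a single number for breaks, in which case it divides the range of the input into that many equal-width intervals. The sketch below uses this to split a vector into a chosen number of groups rather than a chosen chunk size. The variable n_groups is an assumption introduced for this example, and the group sizes are only roughly equal when the length is not a multiple of n_groups.

```r
# Create sample vector
my_vect <- 1:10

# Number of groups wanted (assumed value for this illustration)
n_groups <- 3

# Assign each index to one of n_groups equal-width intervals
group_id <- cut(seq_along(my_vect), breaks = n_groups, labels = FALSE)

# Split the vector according to the interval number of each index
parts <- split(my_vect, group_id)
print(parts)

# Show how many elements ended up in each group
print(lengths(parts))
```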

Using a Loop

In this approach, we use a for loop to split the vector into chunks. The loop variable starts at 1 and increases by the chunk size on each iteration, so it always holds the start index of the current chunk. In each iteration, we compute the end index of the chunk, using min() to ensure it does not exceed the length of the vector, and then extract the corresponding elements by indexing.

R




# Create sample vector
my_vector <- 1:10
  
# Define the number of elements you want in each chunk
chunk_size <- 3
  
# Initialize an empty list to store chunks
chunks <- list()
  
# Iterate over the vector and extract subsets for each chunk
for (i in seq(1, length(my_vector), by = chunk_size)) {
  # Determine the end index for the current chunk
  end_index <- min(i + chunk_size - 1, length(my_vector))
    
  # Extract subset for the current chunk
  chunk <- my_vector[i:end_index]
    
  # Add the chunk to the list
  chunks[[length(chunks) + 1]] <- chunk
}
  
# Print the chunks
print(chunks)


Output:

[[1]]
[1] 1 2 3

[[2]]
[1] 4 5 6

[[3]]
[1] 7 8 9

[[4]]
[1] 10
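A possible variation on the loop, shown here only as a sketch, computes the number of chunks first and preallocates the result list instead of growing it one element at a time. Looping over seq_len(n_chunks) also means the loop body is simply skipped for an empty vector.

```r
# Create sample vector
my_vector <- 1:10

# Define the number of elements you want in each chunk
chunk_size <- 3

# Work out how many chunks are needed and preallocate the list
n_chunks <- ceiling(length(my_vector) / chunk_size)
chunks <- vector("list", n_chunks)

# Fill each slot with the elements belonging to chunk k
for (k in seq_len(n_chunks)) {
  start_index <- (k - 1) * chunk_size + 1
  end_index <- min(k * chunk_size, length(my_vector))
  chunks[[k]] <- my_vector[start_index:end_index]
}

# Print the chunks
print(chunks)
```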

Split data frame in R Using split() Function

R




# Create a sample data frame
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)
my_data 
# Split the data frame based on a factor (e.g., Age)
split_data <- split(my_data, my_data$Age)
split_data 


Output:

  ID    Name Age Score
1  1  Jayesh  25    85
2  2  Anurag  30    92
3  3   Vipul  22    78
4  4 Pratham  35    95
5  5 Shivang  28    88

$`22`
  ID  Name Age Score
3  3 Vipul  22    78

$`25`
  ID   Name Age Score
1  1 Jayesh  25    85

$`28`
  ID    Name Age Score
5  5 Shivang  28    88

$`30`
  ID   Name Age Score
2  2 Anurag  30    92

$`35`
  ID    Name Age Score
4  4 Pratham  35    95
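The chunking idea used for vectors carries over to data frames by grouping on row positions instead of a column. The following is a minimal sketch that splits the same sample data into chunks of two rows each; the chunk size of 2 is an arbitrary choice for illustration.

```r
# Create a sample data frame
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)

# Number of rows wanted in each chunk (arbitrary choice)
chunk_size <- 2

# Assign each row position a chunk number, then split on it
row_chunks <- split(my_data, ceiling(seq_len(nrow(my_data)) / chunk_size))
print(row_chunks)
```

With five rows, this should give three data frames, the last one holding the single leftover row.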

Split data frame in R Using subset() Function

R




# Create a sample data frame
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)
my_data 
# Keep only the rows that satisfy a logical condition (e.g., Age greater than 25)
subset_data <- subset(my_data, Age > 25)
subset_data 


Output:

  ID    Name Age Score
1  1  Jayesh  25    85
2  2  Anurag  30    92
3  3   Vipul  22    78
4  4 Pratham  35    95
5  5 Shivang  28    88

  ID    Name Age Score
2  2  Anurag  30    92
4  4 Pratham  35    95
5  5 Shivang  28    88
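subset() returns only the rows that match the condition. If both parts are needed, one option is to pass the logical condition itself to split(), which should return a list with elements named FALSE and TRUE. A minimal sketch using the same sample data:

```r
# Create a sample data frame
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)

# Split on the logical condition; the list names are "FALSE" and "TRUE"
by_age <- split(my_data, my_data$Age > 25)

# Rows where Age is greater than 25
older <- by_age[["TRUE"]]
print(older)

# Remaining rows (Age of 25 or less)
others <- by_age[["FALSE"]]
print(others)
```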

Conclusion

In conclusion, splitting a vector into chunks in R can be achieved through various methods, each with its own advantages and flexibility.

  • Using split(): Convenient for splitting based on a factor or grouping variable.
  • Using cut(): Useful for cutting numeric data into intervals and splitting based on breakpoints.
  • Using a loop: Provides control over the splitting process, allowing for customization if needed.

