How to Split Vector and DataFrame in R
Last Updated: 26 Feb, 2024
R is a programming language and environment designed for data analysis, statistical computing, and graphics. Many data manipulation and analysis tasks require splitting data into batches (chunks) or groups. In this article, we discuss several techniques for splitting vectors into chunks, and data frames into groups, using the R programming language.
Concepts related to the topic
In R, a vector is a fundamental data structure that stores a sequence of elements. An atomic vector is similar to a one-dimensional array in other languages: all of its elements have the same data type.
A chunk is a portion or segment of data that is processed as a unit, often to improve efficiency, manage memory usage, or handle data streams.
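A quick way to see the single-type rule for atomic vectors is to combine values of different types with c(): R coerces them all to one common type instead of storing mixed types. A list, by contrast, keeps each element's own type. The values below are illustrative only.

```r
# c() coerces mixed inputs to a single common type
mixed <- c(1, "two", TRUE)
print(class(mixed))  # every element was converted to a character string
print(mixed)

nums <- c(1.5, 2L, TRUE)
print(class(nums))   # the integer and the logical were converted to doubles

# A list keeps each element's own type
items <- list(1, "two", TRUE)
print(sapply(items, class))
```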
How to split Vector into chunks
Below are the methods that we will cover in this article:
- Using split()
- Using cut()
- Using a Loop
Using split()
split() is a built-in R function that divides a vector, list, or data frame into subsets according to the groups defined by a factor. It returns a list with one element per group.
Syntax
split(x, f)
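Parameters:
x: the object to be split (a vector, list, or data frame).
f: a factor, or a vector that can be coerced to one, defining the groups; element i of x (or row i of a data frame) goes to the group given by element i of f.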
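The syntax above shows only the two main arguments. split() also accepts a drop argument (default FALSE) that controls whether factor levels with no matching elements still produce an empty list element. A minimal sketch with illustrative data, where the level "C" never occurs:

```r
values <- c(10, 20, 30)
# Factor with a level ("C") that has no matching elements
grp <- factor(c("A", "B", "A"), levels = c("A", "B", "C"))

kept <- split(values, grp)                  # includes an empty element for "C"
dropped <- split(values, grp, drop = TRUE)  # omits levels with no elements

print(names(kept))
print(names(dropped))
print(kept$A)  # elements 1 and 3 share group "A"
```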
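Split a vector of numeric data
Here we create a vector of the numbers 1 to 16 and use a chunk size of 4. The expression ceiling(seq_along(my_vect) / chunk_size) assigns group 1 to the first four elements, group 2 to the next four, and so on, and split() collects the elements of each group.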
R
my_vect <- 1:16
print("Vector before split:")
print(my_vect)
chunk_size <- 4
# Element i goes to chunk ceiling(i / chunk_size): elements 1-4 -> 1, 5-8 -> 2, ...
chunks <- split(my_vect, ceiling(seq_along(my_vect) / chunk_size))
print(chunks)
Output:
[1] "Vector before split:"
[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
$`1`
[1] 1 2 3 4
$`2`
[1] 5 6 7 8
$`3`
[1]  9 10 11 12
$`4`
[1] 13 14 15 16
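The same ceiling(seq_along(...)) idea can be wrapped in a reusable function. In the sketch below, chunk_vector() is a hypothetical helper name, not a base R function. It works for any vector type, and unlist() puts the chunks back together in their original order. Because the group IDs are numbers, the chunks are ordered numerically ("10" follows "9"), not alphabetically.

```r
# Hypothetical helper: split any vector into consecutive chunks of `size`
chunk_vector <- function(x, size) {
  stopifnot(size >= 1)
  # Element i goes to group ceiling(i / size)
  split(x, ceiling(seq_along(x) / size))
}

chunks <- chunk_vector(letters[1:7], 3)
str(chunks)

# Concatenating the chunks restores the original vector
restored <- unlist(chunks, use.names = FALSE)
print(restored)
```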
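Split a vector based on groups
Here each element of the vector is paired with the element at the same position in a factor, and split() returns one list element per factor level.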
R
my_vect <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "A", "B", "B", "C", "C"))
result <- split(my_vect, groups)
print(result)
Output:
$A
[1] 1 2
$B
[1] 3 4
$C
[1] 5 6
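Splitting by groups is often the first step of a split-apply-combine workflow. The sketch below reuses the vector and factor from the example above, applies sum() to each group with sapply(), and shows that base R's unsplit() reverses the split when given the same factor.

```r
my_vect <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "A", "B", "B", "C", "C"))

# Split, apply sum() to each piece, and combine into a named vector
group_sums <- sapply(split(my_vect, groups), sum)
print(group_sums)

# unsplit() reassembles the pieces in their original positions
original <- unsplit(split(my_vect, groups), groups)
print(original)
```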
Split a Vector into Chunks Using cut()
The cut() function is often used to divide numeric data into intervals defined by breakpoints. In the example below, seq() generates breakpoints at multiples of the chunk size, cut() assigns each index of the vector to the interval it falls in (labels = FALSE returns interval numbers instead of text labels), and split() groups the vector elements by those interval numbers.
R
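my_vect <- 1:10
chunk_size <- 3
# Breakpoints 0, 3, 6, ... up to the first multiple of chunk_size >= length(my_vect),
# so that every index falls inside some interval
breakpts <- seq(0, ceiling(length(my_vect) / chunk_size) * chunk_size, by = chunk_size)
# Intervals are (0, 3], (3, 6], ...; labels = FALSE returns the interval number
chunk_ids <- cut(seq_along(my_vect), breaks = breakpts, labels = FALSE)
chunks <- split(my_vect, chunk_ids)
print(chunks)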
Output:
$`1`
[1] 1 2 3
$`2`
[1] 4 5 6
$`3`
[1] 7 8 9
$`4`
[1] 10
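When you want a fixed number of chunks rather than a fixed chunk size, cut() can compute the breakpoints itself: passing a single number as breaks divides the range of the indices into that many equal-width intervals. A minimal sketch, assuming three chunks are wanted; note that chunk sizes can differ when the length is not evenly divisible.

```r
my_vect <- 1:10
n_chunks <- 3

# A single number for breaks means "this many equal-width intervals"
chunk_id <- cut(seq_along(my_vect), breaks = n_chunks, labels = FALSE)
chunks <- split(my_vect, chunk_id)

print(chunks)
print(lengths(chunks))  # number of elements in each chunk
```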
Using a Loop
This approach uses a for loop that steps through the vector in increments of the chunk size. In each iteration, we compute the end index of the current chunk, using min() so that it never exceeds the length of the vector, extract the elements from the current start index to that end index, and append them to a list of chunks.
R
my_vector <- 1:10
chunk_size <- 3
chunks <- list()
for (i in seq(1, length(my_vector), by = chunk_size)) {
  end_index <- min(i + chunk_size - 1, length(my_vector))
  chunk <- my_vector[i:end_index]
  chunks[[length(chunks) + 1]] <- chunk
}
print(chunks)
Output:
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5 6
[[3]]
[1] 7 8 9
[[4]]
[1] 10
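A variant of the loop, sketched below under the hypothetical name chunk_with_loop(), computes the number of chunks up front and preallocates the list with vector("list", n) instead of growing it one element at a time. Iterating with seq_len() also handles an empty vector, for which seq(1, 0, by = 3) in the loop above signals an error.

```r
# Hypothetical helper: loop-based chunking with a preallocated result list
chunk_with_loop <- function(x, size) {
  n_chunks <- ceiling(length(x) / size)
  chunks <- vector("list", n_chunks)  # one empty slot per chunk
  for (k in seq_len(n_chunks)) {
    start <- (k - 1) * size + 1
    end <- min(k * size, length(x))   # the last chunk may be shorter
    chunks[[k]] <- x[start:end]
  }
  chunks
}

print(chunk_with_loop(1:10, 3))
print(length(chunk_with_loop(integer(0), 3)))  # no chunks for an empty vector
```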
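Split a Data Frame in R Using split() Function
split() also works on data frames: the rows are grouped by the values of a column (here Age), and each group is returned as a separate data frame in a list.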
R
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)
my_data
split_data <- split(my_data, my_data$Age)
split_data
Output:
ID Name Age Score
1 1 Jayesh 25 85
2 2 Anurag 30 92
3 3 Vipul 22 78
4 4 Pratham 35 95
5 5 Shivang 28 88
$`22`
ID Name Age Score
3 3 Vipul 22 78
$`25`
ID Name Age Score
1 1 Jayesh 25 85
$`28`
ID Name Age Score
5 5 Shivang 28 88
$`30`
ID Name Age Score
2 2 Anurag 30 92
$`35`
ID Name Age Score
4 4 Pratham 35 95
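split() can also break a data frame into batches of consecutive rows, using the same ceiling() grouping as the vector example earlier in this article. In this sketch the batch size of 2 and the per-batch mean score are illustrative choices; each batch is an ordinary data frame that can be processed independently.

```r
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)

batch_size <- 2
# Row i goes to batch ceiling(i / batch_size)
batch_id <- ceiling(seq_len(nrow(my_data)) / batch_size)
batches <- split(my_data, batch_id)

# Process each batch; here we compute the mean score per batch
batch_means <- sapply(batches, function(df) mean(df$Score))
print(batch_means)
```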
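Split a Data Frame in R Using subset() Function
Unlike split(), subset() does not divide a data frame into groups; it returns only the rows that satisfy a condition (here Age > 25). Applying it with complementary conditions, such as Age > 25 and Age <= 25, produces the two parts of a split.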
R
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)
my_data
subset_data <- subset(my_data, Age > 25)
subset_data
Output:
ID Name Age Score
1 1 Jayesh 25 85
2 2 Anurag 30 92
3 3 Vipul 22 78
4 4 Pratham 35 95
5 5 Shivang 28 88
ID Name Age Score
2 2 Anurag 30 92
4 4 Pratham 35 95
5 5 Shivang 28 88
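To get both parts of a condition-based split in one call, you can pass a logical vector to split(): rows where the condition is FALSE and rows where it is TRUE become two separate data frames. subset() additionally accepts a select argument for keeping only some columns. A minimal sketch using the same data:

```r
my_data <- data.frame(
  ID = c(1, 2, 3, 4, 5),
  Name = c("Jayesh", "Anurag", "Vipul", "Pratham", "Shivang"),
  Age = c(25, 30, 22, 35, 28),
  Score = c(85, 92, 78, 95, 88)
)

# A logical grouping vector gives a two-way split, named "FALSE" and "TRUE"
parts <- split(my_data, my_data$Age > 25)
print(parts[["TRUE"]])   # the same rows as subset(my_data, Age > 25)
print(parts[["FALSE"]])  # the remaining rows

# select keeps only the listed columns of the matching rows
older_scores <- subset(my_data, Age > 25, select = c(Name, Score))
print(older_scores)
```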
Conclusion
In conclusion, R offers several ways to split vectors into chunks and data frames into groups, each with its own advantages and flexibility.
- Using split(): Convenient for splitting based on a factor or grouping variable.
- Using cut(): Useful for cutting numeric data into intervals and splitting based on breakpoints.
- Using a loop: Provides control over the splitting process, allowing for customization if needed.