How to split DataFrame in R

In this article, we will discuss how to split a dataframe in the R programming language.

A dataframe can be split by rows or by columns, either into contiguous pieces or into randomly chosen subsets. Rows and columns can be referenced by index as well as by name, and several of them can be selected at once with the c() function in base R.

Splitting dataframe by row

Splitting dataframe by row indexes

Rows are selected by placing a range of row indexes in the first position inside the square brackets. Leaving the column position empty keeps every column.

Syntax:

data-frame[start-row-num:end-row-num,]

The original row numbers (row names) are retained in the resulting dataframe.

Example: Splitting dataframe by row

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1:3,2),
                        col3=rep(1:2,3) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
# extracting first four rows
data_frame_mod <- data_frame1[1:4,]
  
print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3
1 Grp1    1    1
2 Grp1    2    2
3 Grp2    3    1
4 Grp2    1    2
5 Grp3    2    1
6 Grp3    3    2
[1] "Modified DataFrame"
  col1 col2 col3
1 Grp1    1    1
2 Grp1    2    2
3 Grp2    3    1
4 Grp2    1    2
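The same bracket syntax also accepts non-contiguous row indexes through c(), and a negative index range returns every row except the listed ones. The following is a minimal sketch, assuming a dataframe shaped like the one in the example; the names df, picked, first_part and second_part are illustrative only.

```r
# Small dataframe mirroring the article's example (name `df` is illustrative)
df <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                 col2 = rep(1:3, 2),
                 col3 = rep(1:2, 3))

# Non-contiguous rows selected with c()
picked <- df[c(1, 3, 6), ]

# Complementary split: the first four rows, and everything except them
first_part  <- df[1:4, ]
second_part <- df[-(1:4), ]

print(picked)
print(first_part)
print(second_part)
```

Together, first_part and second_part cover every row of df exactly once, which is usually what is wanted when a dataframe is divided into two pieces.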

Example: Extracting a single row by index

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1:3,2),
                        col3=rep(1:2,3) 
                        )
  
print("Original DataFrame")
print(data_frame1)
  
# extracting the sixth row
data_frame_mod <- data_frame1[6,]
print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3
1 Grp1    1    1
2 Grp1    2    2
3 Grp2    3    1
4 Grp2    1    2
5 Grp3    2    1
6 Grp3    3    2
[1] "Modified DataFrame"
  col1 col2 col3
6 Grp3    3    2
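The output above shows that the extracted row keeps its original label, 6. If the piece should be renumbered from 1, the row names can be reset. This is a small sketch under the same assumptions as the example; df, last_row and kept_label are illustrative names.

```r
# Dataframe mirroring the article's example (name `df` is illustrative)
df <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                 col2 = rep(1:3, 2),
                 col3 = rep(1:2, 3))

last_row <- df[6, ]

# The original row label is carried over to the extracted piece
kept_label <- rownames(last_row)
print(kept_label)

# Optional: renumber the rows of the piece starting from 1
rownames(last_row) <- NULL
print(last_row)
```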

Splitting dataframe rows randomly

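The dataframe rows can also be selected at random. The set.seed() function fixes the state of the random number generator so that the result is reproducible. A random value is then drawn for each row, here with rbinom(), and the rows whose value satisfies a condition are extracted. Because the selection is random, the number of rows returned depends on the values drawn.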

Example: Splitting dataframe by rows randomly

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1:3,2),
                        col3=rep(1:2,3),
                        col4 = letters[1:6]
                        )
  
print("Original DataFrame")
print(data_frame1)
  
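# fix the seed so the random draw is reproducible
set.seed(99999)                           
  
rows <- nrow(data_frame1)
# one binomial draw (size 2, prob 0.5) per row; values are 0, 1 or 2
rand <- rbinom(rows, 2, 0.5)
  
# keep only the rows whose draw equals 0
data_frame_mod <- data_frame1[rand == 0, ] 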
  
print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1 Grp1    1    1    a
2 Grp1    2    2    b
3 Grp2    3    1    c
4 Grp2    1    2    d
5 Grp3    2    1    e
6 Grp3    3    2    f
[1] "Modified DataFrame"
  col1 col2 col3 col4
5 Grp3    2    1    e
6 Grp3    3    2    f
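The rbinom() approach returns a subset whose size varies from run to run. When a fixed number of rows is needed in each piece, one common alternative (not the method used in the example above) is to draw row indexes with sample(). The sketch below assumes a 4/2 split and an arbitrary seed; df, picked_idx, part_a and part_b are illustrative names.

```r
# Dataframe mirroring the article's example (name `df` is illustrative)
df <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                 col2 = rep(1:3, 2),
                 col3 = rep(1:2, 3),
                 col4 = letters[1:6])

# Arbitrary seed, used only so the split is reproducible
set.seed(42)

# Draw 4 of the 6 row indexes without replacement
picked_idx <- sample(nrow(df), size = 4)

part_a <- df[picked_idx, ]    # the randomly chosen rows
part_b <- df[-picked_idx, ]   # all remaining rows

print(part_a)
print(part_b)
```

Negative indexing with the same index vector guarantees that the two pieces do not overlap and that no row is lost.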

Splitting dataframe by column

Splitting dataframe by column names

The dataframe columns can also be referenced by name. Multiple columns are selected by passing their names as strings inside c(). The selected columns do not have to be adjacent and can be listed in any order.

Syntax:

data-frame[,c("col1", "col2",...)]

Example: splitting dataframe by column names

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1:3,2),
                        col3=rep(1:2,3),
                        col4 = letters[1:6]
                        )
  
print("Original DataFrame")
print(data_frame1)
  
# extracting columns col2 and col4 by name
data_frame_mod <- data_frame1[,c("col2","col4")]
print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1 Grp1    1    1    a
2 Grp1    2    2    b
3 Grp2    3    1    c
4 Grp2    1    2    d
5 Grp3    2    1    e
6 Grp3    3    2    f
[1] "Modified DataFrame"
  col2 col4
1    1    a
2    2    b
3    3    c
4    1    d
5    2    e
6    3    f
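One detail worth noting when only a single column is selected this way: base R simplifies the result to a plain vector unless drop = FALSE is supplied. A short sketch, with df, as_vector and as_frame as illustrative names:

```r
# Two-column dataframe (name `df` is illustrative)
df <- data.frame(col2 = rep(1:3, 2),
                 col4 = letters[1:6])

# Selecting one column simplifies the result to a vector by default
as_vector <- df[, "col2"]

# drop = FALSE keeps the result as a one-column dataframe
as_frame <- df[, "col2", drop = FALSE]

print(is.data.frame(as_vector))  # FALSE
print(is.data.frame(as_frame))   # TRUE
```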

Splitting dataframe by column indices

The dataframe can also be referenced using the column indices. Individual, as well as multiple columns, can be extracted from the dataframe by specifying the column position. 

Syntax:

data-frame[,start-col-num:end-col-num]

Example: Split dataframe by column indices

R




# create first dataframe
data_frame1<-data.frame(col1=c(rep('Grp1',2),
                               rep('Grp2',2),
                               rep('Grp3',2)), 
                        col2=rep(1:3,2),
                        col3=rep(1:2,3),
                        col4 = letters[1:6]
                        )
  
print("Original DataFrame")
print(data_frame1)
  
# extracting last two columns
data_frame_mod <- data_frame1[,c(3:4)]
print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1 Grp1    1    1    a
2 Grp1    2    2    b
3 Grp2    3    1    c
4 Grp2    1    2    d
5 Grp3    2    1    e
6 Grp3    3    2    f
[1] "Modified DataFrame"
  col3 col4
1    1    a
2    2    b
3    1    c
4    2    d
5    1    e
6    2    f
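To obtain both column pieces rather than just one, a negative index range can be paired with the positive one. This is a minimal sketch, assuming the split point falls after the second column; df, left_part and right_part are illustrative names.

```r
# Dataframe mirroring the article's example (name `df` is illustrative)
df <- data.frame(col1 = rep(c("Grp1", "Grp2", "Grp3"), each = 2),
                 col2 = rep(1:3, 2),
                 col3 = rep(1:2, 3),
                 col4 = letters[1:6])

# First two columns, and every column except the first two
left_part  <- df[, 1:2]
right_part <- df[, -(1:2)]

print(names(left_part))
print(names(right_part))
```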


Last Updated : 23 Sep, 2021