
How to split a big dataframe into smaller ones in R?


In this article, we will learn how to split a large data frame into smaller data frames in the R programming language.

Introduction

Large data frames can be hard to work with, so it often helps to split them into several smaller ones. In R, this is usually done with the split() function. The steps below walk through the process on a small example.

Stepwise Implementation:

Step 1: Let’s take a data frame on which we are going to apply the split operation to break it into small chunks.

 P       Q       R
SP1   2012-01   123
SP2   2022-01   143
SP3   2022-01   342
SP1   2022-02   542
SP2   2022-02   876
SP3   2022-02   982
SP1   2022-03   884
SP2   2022-03   936
SP3   2022-03   987

Step 2: First, we need the data in the form of a data frame, and for that we will use the read.table() function. read.table() reads data in table format, usually from a text file, and returns it as a data frame. It supports several arguments, such as the file name, whether the data has a header row, and the separator. In the code below, the data is passed inline through the text argument instead of a file name.

Syntax: read.table(filename, header = FALSE, sep = "")

Parameters:

header: logical value indicating whether the first line contains the column names.
sep: the field separator used in the file; the default "" means any whitespace.

R




# Reading the data in the form of a table
df <- read.table(text =
         "P      Q      R
         SP1   2012-01   123
         SP2   2022-01   143
         SP3   2022-01   342
         SP1   2022-02   542
         SP2   2022-02   876
         SP3   2022-02   982
         SP1   2022-03   884
         SP2   2022-03   936
         SP3   2022-03   987",
         header = TRUE)

# Printing the original data frame
print(df)


Output:

    P       Q   R
1 SP1 2012-01 123
2 SP2 2022-01 143
3 SP3 2022-01 342
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
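When the data is not separated by whitespace, the sep argument has to be set explicitly. The following is a minimal sketch of that case; the comma-separated sample and the names csv_text and df_csv are made up for illustration and are not part of the original example.

```r
# Hypothetical comma-separated sample (illustration only)
csv_text <- "P,Q,R
SP1,2022-01,123
SP2,2022-01,143"

# header = TRUE takes the column names from the first line,
# sep = "," tells read.table() that fields are split on commas
df_csv <- read.table(text = csv_text, header = TRUE, sep = ",")

# Inspect the structure of the resulting data frame
str(df_csv)
```

For data stored on disk, the same call would take a file path as its first argument instead of text.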

Step 3: In this step, we will split the data frame into smaller ones using the split() function. It is a built-in R function that divides a vector or data frame into groups defined by a factor and returns the groups as a list.

Syntax: split(x, f, drop = FALSE)

Parameters:

x: represents data vector or data frame
f: represents factor to divide the data
drop: represents logical value which indicates if levels that do not occur should be dropped
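The effect of the drop parameter is easiest to see on a small vector. The sketch below assumes a factor with one unused level ("c"); the names values, groups, kept and dropped are illustrative only.

```r
values <- c(10, 20, 30, 40)

# The factor declares three levels, but "c" never occurs in the data
groups <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))

# drop = FALSE (the default) keeps an empty group for the unused level
kept <- split(values, groups)
print(names(kept))     # "a" "b" "c"

# drop = TRUE removes groups for levels that do not occur
dropped <- split(values, groups, drop = TRUE)
print(names(dropped))  # "a" "b"
```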

We can create the new data frames from the content of any column, for example Q or P. Here we use column Q: split() returns a list containing one data frame for each distinct value of Q, and each element is named after that value. The code below stores this list in df1 and prints it.

R




df1 <- split(df, df$Q)

# Printing the split data frames
print(df1)


Output:

$`2012-01`
    P       Q   R
1 SP1 2012-01 123

$`2022-01`
    P       Q   R
2 SP2 2022-01 143
3 SP3 2022-01 342

$`2022-02`
    P       Q   R
4 SP1 2022-02 542
5 SP2 2022-02 876
6 SP3 2022-02 982

$`2022-03`
    P       Q   R
7 SP1 2022-03 884
8 SP2 2022-03 936
9 SP3 2022-03 987
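Since the result of split() is a named list, each smaller data frame can be accessed by name and, if needed, written to its own file. The following is a sketch under stated assumptions: the names pieces, jan and out_dir and the chunk_ file prefix are hypothetical, and the files go to a temporary directory.

```r
df <- read.table(text = "P Q R
SP1 2012-01 123
SP2 2022-01 143
SP3 2022-01 342
SP1 2022-02 542
SP2 2022-02 876
SP3 2022-02 982
SP1 2022-03 884
SP2 2022-03 936
SP3 2022-03 987", header = TRUE)

pieces <- split(df, df$Q)

# The list names are the distinct values of Q
print(names(pieces))

# Extract a single data frame by its name
jan <- pieces[["2022-01"]]
print(jan)

# Write every piece to a separate CSV file (temporary directory assumed)
out_dir <- tempdir()
for (key in names(pieces)) {
  out_file <- file.path(out_dir, paste0("chunk_", key, ".csv"))
  write.csv(pieces[[key]], out_file, row.names = FALSE)
}
```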

Step 4: In this step, we will split the same data frame by the content of column P and store the result in df2. As before, split() returns a list with one data frame per distinct value, here SP1, SP2 and SP3. The code below creates df2 and prints it.

R




df2 <- split(df, df$P)

# Printing the split data frames
print(df2)


Output:

$SP1
    P       Q   R
1 SP1 2012-01 123
4 SP1 2022-02 542
7 SP1 2022-03 884

$SP2
    P       Q   R
2 SP2 2022-01 143
5 SP2 2022-02 876
8 SP2 2022-03 936

$SP3
    P       Q   R
3 SP3 2022-01 342
6 SP3 2022-02 982
9 SP3 2022-03 987

We can see from the output that the rows for SP1, SP2, and SP3 end up in separate data frames. This is how a large data frame can be split into smaller ones.
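The steps above split by the values of a column. When a large data frame has no natural grouping column, one common alternative is to split it into chunks of a fixed number of rows. The sketch below is not part of the original steps: the sample data and the names chunk_size, chunk_id, chunks and restored are assumptions for illustration.

```r
# Hypothetical data frame with 10 rows
df <- data.frame(id = 1:10, value = seq(10, 100, by = 10))

# Number of rows per chunk (assumed value)
chunk_size <- 4

# Rows 1-4 get id 1, rows 5-8 get id 2, rows 9-10 get id 3
chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)

# split() accepts any vector of the right length as the grouping factor
chunks <- split(df, chunk_id)

# Number of rows in each chunk
print(sapply(chunks, nrow))

# The chunks can be stacked back together with rbind()
restored <- do.call(rbind, unname(chunks))
print(nrow(restored) == nrow(df))
```

The last chunk is simply shorter when the number of rows is not a multiple of chunk_size.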



Last Updated : 23 Sep, 2022