
R – Create Dataframe From Existing Dataframe

Last Updated : 12 Apr, 2024

Data frames are R's standard structure for organized, tabular data, and we often need to build a new data frame from one that already exists. In this article, let’s explore several ways to create a data frame from an existing data frame in the R programming language.

Using Base R data.frame() Function

With base R, a new data frame can be built from an existing one by referencing its columns directly. In the example below, the ‘Name’ and ‘Age’ columns are extracted from the original data frame ‘df’ with the $ operator and passed to the data.frame() function, which assembles them into a new data frame.

R
# Create a dataframe
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Display the original dataframe
print("Original Dataframe:")
print(df)

# Create a new dataframe using direct column referencing
new_df <- data.frame(
  Name = df$Name,
  Age = df$Age
)

# Display the new dataframe
print("New Dataframe created using Direct Column Referencing:")
print(new_df)

Output:

[1] "Original Dataframe:"
  ID    Name Age Score
1  1 Shravan  20    80
2  2   Jeetu  18    75
3  3  Lakhan  19    85
4  4  Pankaj  20    90
5  5  Mihika  18    95

[1] "New Dataframe created using Direct Column Referencing:"
     Name Age
1 Shravan  20
2   Jeetu  18
3  Lakhan  19
4  Pankaj  20
5  Mihika  18
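
Because data.frame() accepts any vectors of equal length, the new data frame does not have to copy columns unchanged: it can rename them or hold values computed from the original. The following is a minimal sketch under that assumption; the column names `Student` and `Passed` and the cutoff of 80 are illustrative choices, not part of the original example.

```r
# Recreate the example data frame so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Build a new data frame with a renamed column and a derived column
# (Student, Passed and the cutoff of 80 are hypothetical)
derived_df <- data.frame(
  Student = df$Name,
  Passed = df$Score >= 80
)

print(derived_df)
```

The original `df` is left unchanged; `derived_df` is an independent object.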

Using subset() Function

The subset() function creates a new data frame from an existing one by extracting only the chosen columns. In this example, the select argument keeps the ‘Name’ and ‘Score’ columns of ‘df’.

R
# Create a dataframe
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Display the original dataframe
print("Original Dataframe:")
print(df)

# Using the subset() function to create a new dataframe
new_df_subset_func <- subset(df, select = c(Name, Score))

# Display the new dataframe created using the subset() Function
print("New Dataframe created using subset() Function:")
print(new_df_subset_func)

Output:

[1] "Original Dataframe:"
  ID    Name Age Score
1  1 Shravan  20    80
2  2   Jeetu  18    75
3  3  Lakhan  19    85
4  4  Pankaj  20    90
5  5  Mihika  18    95

[1] "New Dataframe created using subset() Function:"
     Name Score
1 Shravan    80
2   Jeetu    75
3  Lakhan    85
4  Pankaj    90
5  Mihika    95
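
subset() can also filter rows while it selects columns, by passing a logical condition as its second argument. A short sketch, assuming the same example data; the condition `Age >= 19` and the name `older_scores` are made up for illustration.

```r
# Recreate the example data frame so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Keep only rows where Age is at least 19, and only the Name and Score columns
# (the threshold 19 is a hypothetical choice)
older_scores <- subset(df, Age >= 19, select = c(Name, Score))

print(older_scores)
```

The result keeps the original row names of the matching rows, so they are not renumbered from 1.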

Using merge() Function

The merge() function creates a new data frame by joining two data frames, ‘df1’ and ‘df2’, on their common column ‘Name’. By default, merge() performs an inner join, so the result keeps only the names that appear in both data frames and is sorted by the ‘Name’ column.

R
# Create the first dataframe
df1 <- data.frame(
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Create the second dataframe
df2 <- data.frame(
  Name = c("Shravan", "Jeetu", "Mihika"),
  Gender = c("Male", "Male", "Female")
)

# Display the first dataframe
cat("First Dataframe (df1):\n")
print(df1)

# Display the second dataframe
cat("\nSecond Dataframe (df2):\n")
print(df2)

# Merge dataframes based on common column 'Name'
new_df <- merge(df1, df2, by = "Name")

# Display the new merged dataframe
cat("\nMerged Dataframe (new_df):\n")
print(new_df)

Output:

First Dataframe (df1):
     Name Age Score
1 Shravan  20    80
2   Jeetu  18    75
3  Lakhan  19    85
4  Pankaj  20    90
5  Mihika  18    95

Second Dataframe (df2):
     Name Gender
1 Shravan   Male
2   Jeetu   Male
3  Mihika Female

Merged Dataframe (new_df):
     Name Age Score Gender
1   Jeetu  18    75   Male
2  Mihika  18    95 Female
3 Shravan  20    80   Male
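
The merged output above has only three rows because ‘Lakhan’ and ‘Pankaj’ have no match in ‘df2’. If every row of the first data frame should be kept, merge() accepts `all.x = TRUE`, which fills the unmatched columns with NA. A minimal sketch using the same two data frames; the name `left_df` is just an illustrative choice.

```r
# Recreate both example data frames so the snippet is self-contained
df1 <- data.frame(
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)
df2 <- data.frame(
  Name = c("Shravan", "Jeetu", "Mihika"),
  Gender = c("Male", "Male", "Female")
)

# Keep every row of df1; rows without a match in df2 get NA in Gender
left_df <- merge(df1, df2, by = "Name", all.x = TRUE)

print(left_df)
```

Similarly, `all.y = TRUE` keeps every row of the second data frame and `all = TRUE` keeps the rows of both.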

Using Subset Method

The subset method uses bracket indexing, df[rows, columns], to create a new data frame from selected columns of an existing one. Leaving the row position empty keeps every row, and a character vector of column names picks the columns, so the whole selection fits on a single line.

R
# Create a dataframe
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Display the original dataframe
print("Original Dataframe:")
print(df)

# Subsetting the dataframe to select desired columns
new_df_subset <- df[, c("Name", "Age")]

# Display the new dataframe created using the Subset Method
print("New Dataframe created using Subset Method:")
print(new_df_subset)

Output:

[1] "Original Dataframe:"
  ID    Name Age Score
1  1 Shravan  20    80
2  2   Jeetu  18    75
3  3  Lakhan  19    85
4  4  Pankaj  20    90
5  5  Mihika  18    95

[1] "New Dataframe created using Subset Method:"
     Name Age
1 Shravan  20
2   Jeetu  18
3  Lakhan  19
4  Pankaj  20
5  Mihika  18
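
One detail to watch with bracket indexing: when only a single column is selected, `df[, "Name"]` returns a plain vector rather than a data frame. Adding `drop = FALSE` keeps the result as a one-column data frame. A brief sketch with the same example data; the variable names `name_vec` and `name_df` are illustrative.

```r
# Recreate the example data frame so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Selecting one column this way drops the data frame structure
name_vec <- df[, "Name"]

# drop = FALSE keeps the result as a data frame with a single column
name_df <- df[, "Name", drop = FALSE]

print(is.data.frame(name_vec))
print(is.data.frame(name_df))
```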

Using dplyr package

The select() function from the dplyr package creates a new data frame by selecting specific columns from an existing one. Column names are passed unquoted, which keeps the code short and readable. As with any add-on package, dplyr must be installed before it can be loaded with library().

R
# Load the dplyr package
library(dplyr)

# Create a dataframe
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Display the original dataframe
cat("Original Dataframe:\n")
print(df)

# Using dplyr package: Selecting specific columns using select() function
new_df_dplyr <- select(df, Name, Score)

# Display the new dataframe created using dplyr package
cat("\nNew Dataframe created using dplyr package:\n")
print(new_df_dplyr)

Output:

Original Dataframe:
  ID    Name Age Score
1  1 Shravan  20    80
2  2   Jeetu  18    75
3  3  Lakhan  19    85
4  4  Pankaj  20    90
5  5  Mihika  18    95

New Dataframe created using dplyr package:
     Name Score
1 Shravan    80
2   Jeetu    75
3  Lakhan    85
4  Pankaj    90
5  Mihika    95
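
dplyr verbs can be chained with the `%>%` pipe, so a new data frame can be filtered, trimmed, and extended in one expression. The sketch below assumes dplyr is installed; the `Grade` column and the thresholds 85 and 90 are hypothetical and only serve the illustration.

```r
# Load the dplyr package (must be installed beforehand)
library(dplyr)

# Recreate the example data frame so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Keep rows with Score >= 85, keep two columns, then add a derived column
# (Grade and both thresholds are hypothetical)
top_df <- df %>%
  filter(Score >= 85) %>%
  select(Name, Score) %>%
  mutate(Grade = ifelse(Score >= 90, "A", "B"))

print(top_df)
```

Each verb returns a new data frame, so `df` itself is not modified.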

Using data.table Package

The data.table package can also be used to create a new data frame from an existing one. The data frame is first converted to a data.table, the required columns are selected with data.table syntax, and the result is converted back to a data frame and printed.

Note: Before running this code install the data.table package.

R
# Load the data.table package
library(data.table)

# Create a dataframe
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Convert the dataframe to a data.table
dt <- as.data.table(df)

# Display the original data.table
cat("Original Data.table:\n")
print(dt)

# Using data.table package: Selecting specific columns using data.table syntax
new_dt <- dt[, .(Name, Age)]

# Convert the result back to a dataframe
new_df_data_table <- as.data.frame(new_dt)

# Display the new dataframe created using data.table package
cat("\nNew Dataframe created using data.table package:\n")
print(new_df_data_table)

Output:

Original Data.table:
      ID    Name   Age Score
   <int>  <char> <num> <num>
1:     1 Shravan    20    80
2:     2   Jeetu    18    75
3:     3  Lakhan    19    85
4:     4  Pankaj    20    90
5:     5  Mihika    18    95

New Dataframe created using data.table package:
     Name Age
1 Shravan  20
2   Jeetu  18
3  Lakhan  19
4  Pankaj  20
5  Mihika  18
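
In data.table syntax, the first position inside the brackets filters rows and the second selects columns, so both steps can be combined before converting back. A minimal sketch, assuming data.table is installed; the condition `Age >= 19` and the name `filtered_df` are illustrative choices.

```r
# Load the data.table package (must be installed beforehand)
library(data.table)

# Recreate the example data frame so the snippet is self-contained
df <- data.frame(
  ID = 1:5,
  Name = c("Shravan", "Jeetu", "Lakhan", "Pankaj", "Mihika"),
  Age = c(20, 18, 19, 20, 18),
  Score = c(80, 75, 85, 90, 95)
)

# Convert to a data.table, filter rows and select columns in one step,
# then convert the result back to a plain data frame
dt <- as.data.table(df)
filtered_df <- as.data.frame(dt[Age >= 19, .(Name, Score)])

print(filtered_df)
```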
