R – DataFrame Manipulation

Last Updated : 21 Apr, 2020

A data frame is a two-dimensional structure consisting of rows and columns. It is made up of equal-length vectors, one per column. The data is stored in cells, which are accessed by specifying the corresponding [row, col] pair of the data frame. Manipulation of data frames involves modifying, extracting and restructuring the contents of a data frame. This article covers the various operations concerned with the manipulation of data frames in R.
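As a minimal sketch of the [row, col] access described above, the following creates a small data frame and reads a cell and a column from it. The data frame `df`, its column names and its values are made up for illustration.

```r
# Illustrative data frame; the names and values are arbitrary.
df <- data.frame(col1 = 0:2, col2 = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

# Number of rows and number of columns.
dims <- dim(df)

# A single cell, addressed by row index and column name.
cell <- df[2, "col2"]

# A whole column, returned as a vector.
column <- df[["col1"]]

print(dims)
print(cell)
print(column)
```

Every column holds exactly one value per row, which is why the columns must all have the same length.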

Renaming columns

Columns of a data frame can be renamed to give them new labels. Not all the columns have to be renamed. Column labels are always stored as character strings; a numeric or complex value supplied as a label is converted to its string form. Whether the original data frame changes depends on the method: rename() returns a modified copy, while assigning through names() updates the data frame in place. The time required to rename all the columns is O(c), where c is the number of columns in the data frame. There are two ways to rename columns in a data frame:

rename() function of the plyr package
The rename() function of the plyr package modifies the names of the columns based on the old names. It does not take column positions as arguments to rename the column labels.
Examples:

# R program to rename a Data Frame 
  
# Loading the plyr package 
library(plyr) 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print("Original Data Frame") 
print(df) 
print("Modified Data Frame") 
  
# Renaming Data Frame 
rename(df, c("row1"="one", "row2"="two", "row3"="three")) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  one two three
1   0   3     6
2   1   4     7
3   2   5     8

The column labels are changed in the data frame that rename() returns; df itself still carries the old labels. To retain the changes, the result has to be assigned back to the data frame.

R’s in-built function: names(data frame)[col]
A column label can be changed by selecting it through either its index or its current name and assigning the new value. The changes are reflected in the original data frame. An assignment usually targets a single column, but a vector of indices together with a vector of new labels changes several columns at once.

Example 1:

# R program to rename a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print("Original Data Frame") 
print(df) 
print("Modified Data Frame") 
  
# Renaming Data Frame 
names(df)[names(df)=="row3"]<-"three"
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row1 row2 three
1    0    3     6
2    1    4     7
3    2    5     8

Here, the label of the third column is modified from row3 to three. The changes are retained in the original data frame.

Example 2:

# R program to rename a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print("Original Data Frame") 
print(df) 
print("Modified Data Frame") 
  
# Renaming Data Frame 
names(df)[2]<-"two"
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row1 two row3
1    0   3    6
2    1   4    7
3    2   5    8

Here, the second column label is changed from row2 to two. The changes are retained in the original data frame.
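The same names() assignment can also change several labels in one step. The sketch below uses only base R; the `lookup` vector, which maps an old name to a new one, is a hypothetical helper introduced for this example.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Rename two columns by position in a single assignment.
names(df)[c(1, 3)] <- c("one", "three")

# Rename by old name using a lookup vector (old name -> new name).
lookup <- c(row2 = "two")
hits <- names(df) %in% names(lookup)
names(df)[hits] <- unname(lookup[names(df)[hits]])

print(names(df))
```

The lookup form is convenient when the column positions are not known in advance, because columns missing from `lookup` are simply left alone.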

Expanding Data Frame

A data frame can be expanded by adding columns or contracted by deleting them.

Adding columns to a data frame
Columns, in the form of vectors, can be added using any of the data frame indexing modes. The new column is appended at the end of the data frame. The values in a new column may be computed from existing columns, for instance as the sum or difference of two columns. A column consisting of NA values can also be appended. Changes are retained in the original data frame. The time complexity of adding a column is O(n), where n is the number of rows of the data frame. There are several ways to add a new column.
Syntax:

dataframe[[newcol]] <- vector 
or 
dataframe[newcol] <-vector 
or 
dataframe$newcol <-vector

Example:

# R program to add column in a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(col1 = 0:2, col2 = 3:5, col3 = 6:8) 
print ("Original Data Frame") 
print (df) 
  
# Adding a column filled with zeros 
df[["col4"]]<-0
  
# assigns a value NA to the data frame column 5 
df$"col5"<-NA 
  
# Updating Values of column added 
df[["col5"]] <- df[["col1"]] + df[["col2"]] 
print ("Modified Data Frame") 
print (df) 

Output:

[1] "Original Data Frame"
  col1 col2 col3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  col1 col2 col3 col4 col5
1    0    3    6    0    3
2    1    4    7    0    5
3    2    5    8    0    7

First, col4 is created with the value 0 recycled to every row and appended at the end of the data frame. Then the fifth column is created through df$col5 and filled with NA. Its values are then recomputed as the element-wise sum of columns 1 and 2.
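The three assignment forms behave alike with respect to length: a single value is recycled to every row, and a full-length vector supplies one value per row. The sketch below illustrates this with made-up column names (`flag`, `label`, `diff`, `oops`), and also shows that a vector of the wrong length is rejected.

```r
df <- data.frame(col1 = 0:2, col2 = 3:5)

# A single value is recycled to fill every row.
df$flag <- TRUE

# A full-length vector supplies one value per row.
df["label"] <- c("x", "y", "z")

# A derived column computed from existing columns.
df[["diff"]] <- df$col2 - df$col1

# A vector of length 2 cannot fill 3 rows, so the assignment fails.
bad <- tryCatch({ df$oops <- 1:2; "added" },
                error = function(e) "error")

print(df)
print(bad)
```

Wrapping the failing assignment in tryCatch() is only there to keep the script running; in interactive use the error message itself reports the mismatch between the replacement length and the row count.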

Removing columns from a data frame
Columns can be dropped from a data frame by either their names or their index values, and several columns can be deleted together. Assigning NULL to a column name or index removes that column, and the columns after it shift left. The number of columns is reduced by the number of deletions. The changes are reflected in the original data frame.

Example 1:

# R program to remove a column  
# from a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print ("Original Data Frame") 
print (df) 
  
# Removing a Column 
df[["row2"]]<-NULL 
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
  row1 row3
1    0    6
2    1    7
3    2    8

row2 is deleted from the data frame. The labels of the remaining columns stay the same. df["row2"] <- NULL would produce the same result.

Example 2: Delete the columns by integer indexing of the columns

# R program to remove a column  
# from a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8, row4 = rep(5, 3)) 
print ("Original Data Frame") 
print (df) 
  
# Removing two columns 
df <- df [-c(1, 3)] 
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3 row4
1    0    3    6    5
2    1    4    7    5
3    2    5    8    5
  row2 row4
1    3    5
2    4    5
3    5    5

The columns to be excluded are specified with a negated vector, -c(..column indices..). Here columns 1 and 3 are dropped, and since the result is assigned back to df, the change is retained in the original data frame.
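Negative indices only work with positions, not with names. A minimal base R sketch of dropping columns by name is shown below; `drop_cols` is a hypothetical vector of unwanted column names. It also shows one pitfall: list-style indexing, df[...], always returns a data frame, while matrix-style indexing, df[, ...], simplifies a single remaining column to a plain vector unless drop = FALSE is given.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8, row4 = 5)

# Drop columns by name: keep every column whose name is not listed.
drop_cols <- c("row1", "row3")
kept <- df[setdiff(names(df), drop_cols)]

# List-style indexing always returns a data frame...
one_col_df <- df[-c(1, 2, 3)]

# ...whereas matrix-style indexing simplifies a single column
# to a vector unless drop = FALSE is given.
one_col_vec <- df[, -c(1, 2, 3)]
one_col_safe <- df[, -c(1, 2, 3), drop = FALSE]

print(names(kept))
print(class(one_col_df))
print(class(one_col_vec))
print(class(one_col_safe))
```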

Subsetting a data frame

The subset() function returns a new data frame containing only the selected columns. The select argument names the columns to keep; prefixing a name with a minus sign drops that column instead. Multiple column names can be given by combining them in a vector, c(col1, col2). The operation effectively partitions the columns into two disjoint sets, those excluded and those included, and the number of columns is reduced by the number of deletions. The original data frame is not modified, so the result has to be assigned back to retain the change.
Syntax:

subset(dataframe, select= - column)

Example:

# R program to remove a column  
# from a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print ("Original Data Frame") 
print (df) 
  
# Creating a Subset 
df<-subset(df, select = - c(row1, row2)) 
print("Modified Data Frame") 
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row3
1    6
2    7
3    8

Here, row1 and row2 both are removed from the data frame. Hence, this subset contains just one column of the original set of columns.
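For comparison, the sketch below uses select without a minus sign to keep columns, and combines a row condition with a column selection. The names `kept` and `filtered` are chosen for this example only.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Keep only the named columns (no minus sign).
kept <- subset(df, select = c(row1, row3))

# Combine a row condition with a column selection.
filtered <- subset(df, row1 > 0, select = -row2)

# df is untouched because nothing was assigned back to it.
print(kept)
print(filtered)
print(names(df))
```

Because subset() evaluates column names without quotes, it is mainly suited to interactive work; inside functions, indexing with [ ] and quoted names tends to be more predictable.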

Reordering columns

Columns of a data frame can be reordered by specifying either the column names or the column indices in the desired order. The original data frame remains the same; the result has to be assigned back to retain the ordering. The worst-case time complexity of reordering is O(m*n), when every element has to be moved to a new position, with m being the number of rows and n being the number of columns.

Example 1:

# R program to reorder the columns  
# of a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print("Original Data Frame") 
print(df) 
print("Modified Data Frame") 
  
# Temporary modifying column order 
# in a Data Frame 
df[,c(2, 1, 3)] 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

Here, the desired order is specified as column indices, so the columns are returned in the order 2, 1, 3. Since the result is not assigned back, the original data frame is unchanged.

Example 2:

# R program to reorder the columns  
# of a Data Frame 
  
# Creating a Data Frame 
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8) 
print("Original Data Frame") 
print(df) 
print("Modified Data Frame") 
  
# Permanently modifying column order 
# in a Data Frame 
df <- df[c(2, 1, 3)] 
print(df) 

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

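Here, the desired order is again specified as column indices, but the result is assigned back to df, so the original data frame is changed. Column names, as in df[c("row2", "row1", "row3")], can be used in the same way.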
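Reordering by column names follows the same pattern as reordering by indices. The sketch below is a minimal base R illustration; `by_name`, `moved` and `front` are names chosen for this example.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Reorder by naming the columns in the desired order.
by_name <- df[c("row3", "row1", "row2")]

# Move one column to the front and keep the rest in their current order.
front <- "row2"
moved <- df[c(front, setdiff(names(df), front))]

print(names(by_name))
print(names(moved))
print(names(df))
```

Names are usually the safer choice when the data frame may gain or lose columns, since positions shift but labels do not.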
