
R – DataFrame Manipulation

A data frame is a two-dimensional structure made up of rows and columns. Its columns are vectors of equal length. The data is stored in cells, each of which is accessed by specifying its [row, col] position in the data frame. Manipulating a data frame involves modifying, extracting, and restructuring its contents. This article covers the common operations for manipulating data frames in R.
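As a small illustration of [row, col] access, the sketch below reads and then modifies a single cell. The names df, id, and score are made up for this example.

```r
# Hypothetical data frame for illustration
df <- data.frame(id = 1:3, score = c(10, 20, 30))

# Read the cell in row 2 of the column named "score"
value <- df[2, "score"]
print(value)

# Overwrite the same cell
df[2, "score"] <- 25
print(df)
```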

Renaming columns

Columns of a data frame can be renamed to give them new labels. Functions that return a renamed copy leave the original data frame unchanged unless the result is assigned back. Not all the columns have to be renamed. Column names are stored as character strings, so labels supplied as numeric or complex values are converted to strings. Renaming all the columns takes O(c) time, where c is the number of columns in the data frame. There are two ways to rename columns in a data frame:
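As a minimal sketch using base R only, assigning to colnames() or names() can rename every column or just one. The data frame and the new labels here are illustrative assumptions, and the copies renamed_all and renamed_one exist only to show that df itself is left untouched.

```r
# Hypothetical data frame for illustration
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Rename every column at once with colnames()
renamed_all <- df
colnames(renamed_all) <- c("a", "b", "c")

# Rename a single column by matching its current name
renamed_one <- df
names(renamed_one)[names(renamed_one) == "row2"] <- "second"

print(colnames(renamed_all))
print(colnames(renamed_one))
print(colnames(df))  # df keeps its original names
```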



Expanding Data Frame

A data frame can be expanded by adding columns or contracted by deleting columns.
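A minimal sketch of both directions in base R, assuming a small made-up data frame: columns are added with $ or cbind(), and a column is deleted by assigning NULL to it.

```r
# Hypothetical data frame for illustration
df <- data.frame(row1 = 0:2, row2 = 3:5)

# Expand: add a column with $ (its length must match the number of rows)
df$row3 <- 6:8

# Expand: add another column with cbind()
df <- cbind(df, row4 = c("x", "y", "z"))

# Contract: delete a column by assigning NULL to it
df$row1 <- NULL

print(df)
```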

Subsetting a data frame

The subset() function can be used to drop columns: the select argument takes the names of the columns to drop, prefixed with a minus sign. Multiple column names can be specified by combining them into a vector, -c(col1, col2). This operation splits the columns into two disjoint sets, the excluded columns and the included columns, and returns a data frame containing only the included ones. The number of columns is reduced by the number of deletions. subset() returns a new data frame; the original changes only if the result is assigned back to it.
Syntax:

subset(dataframe, select = -column)

Example:




# R program to remove columns
# from a Data Frame

# Creating a Data Frame
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print("Original Data Frame")
print(df)

# Creating a Subset without row1 and row2
df <- subset(df, select = -c(row1, row2))
print("Modified Data Frame")
print(df)

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row3
1    6
2    7
3    8

Here, the columns named row1 and row2 are both removed from the data frame, so the subset contains just one of the original columns.
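subset() can also list the columns to keep rather than the ones to drop. The sketch below, which reuses the same made-up data frame, shows both forms and checks that df is unchanged when the result is stored under a different name.

```r
# Hypothetical data frame for illustration
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Keep only the listed columns
kept <- subset(df, select = c(row1, row3))

# Drop a single column; the result is not assigned back to df
dropped <- subset(df, select = -row2)

print(names(kept))
print(names(dropped))
print(names(df))  # df itself still has all three columns
```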

Reordering columns

Columns of a data frame can be reordered by specifying either the column names or the column indices in the desired order. The original data frame stays the same unless the reordered result is assigned back. In the worst case, reordering takes O(m*n) time, since every element may have to be moved to a new position, where m is the number of rows and n is the number of columns.

Example 1:




# R program to reorder the columns
# of a Data Frame

# Creating a Data Frame
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print("Original Data Frame")
print(df)
print("Modified Data Frame")

# Temporarily modifying column order
# in a Data Frame (result is printed, not stored)
df[, c(2, 1, 3)]

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

Here, the desired order is specified as column indices, so the columns are rearranged in the order c(2, 1, 3). The result is only printed, not assigned, so the original data frame is unchanged.

Example 2:




# R program to reorder the columns
# of a Data Frame

# Creating a Data Frame
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print("Original Data Frame")
print(df)
print("Modified Data Frame")

# Permanently modifying column order
# in a Data Frame by assigning the result back
df <- df[c(2, 1, 3)]
print(df)

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

Here, the desired order is again specified as column indices, but the result is assigned back to df, so the original data frame is changed.
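Reordering by column name works the same way as reordering by index. A minimal sketch, assuming the same made-up data frame, with df_sorted as a hypothetical name for an alphabetically ordered copy:

```r
# Hypothetical data frame for illustration
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Reorder by column name and assign the result back
df <- df[, c("row3", "row1", "row2")]
print(df)

# Derive an alphabetical column order from the names themselves
df_sorted <- df[, order(names(df))]
print(names(df_sorted))
```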

