R – DataFrame Manipulation

A data frame is a two-dimensional structure made up of rows and columns. It is built from vectors of equal length, each of which forms one column. The data is stored in cells, which are accessed by specifying the corresponding [row, col] pair of the data frame. Manipulating a data frame involves modifying, extracting and restructuring its contents. In this article, we will study the various operations used to manipulate data frames in R.
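A minimal sketch of that structure, using made-up column names (`id`, `score`) chosen only for illustration: each column is an ordinary vector of the same length, and a single cell is read with `[row, col]`, where `col` may be a position or a name.

```r
# Build a small data frame: each column is a vector of equal length
df <- data.frame(id = 1:3, score = c(10.5, 20.0, 30.25))

# Dimensions: number of rows, then number of columns
print(dim(df))

# A single cell is addressed as [row, col]
print(df[2, 2])        # second row, second column, by position
print(df[2, "score"])  # the same cell, with the column selected by name

# Each column is an ordinary vector
print(is.numeric(df$score))
```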

Renaming columns

Columns of a data frame can be renamed to give them new labels. Whether the change is reflected in the original data frame depends on the method used, as described below. Not all the columns have to be renamed. A label may be given as a string or as a number (including a complex number), but it is always stored as a character string. The time complexity required to rename all the columns is O(c), where c is the number of columns in the data frame. There are two common ways to rename columns in a data frame:

  • rename() function of the plyr package
    The rename() function of the plyr package modifies the names of the columns based on the old names. It does not take column positions as arguments to rename the column labels.
    Example:

    # R program to rename the columns of a Data Frame
      
    # Loading the plyr package (provides rename())
    library(plyr)
      
    # Creating a Data Frame
    df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
    print("Original Data Frame")
    print(df)
    print("Modified Data Frame")
      
    # rename() returns a renamed copy; assign it back to keep the change
    df <- rename(df, c("row1" = "one", "row2" = "two", "row3" = "three"))
    print(df)

    Output:

    [1] "Original Data Frame"
      row1 row2 row3
    1    0    3    6
    2    1    4    7
    3    2    5    8
    [1] "Modified Data Frame"
      one two three
    1   0   3     6
    2   1   4     7
    3   2   5     8
     
    

    The column labels are changed. Because rename() returns a modified copy instead of changing df in place, the result has to be assigned back to the data frame to retain the changes.

  • R's built-in function: names(dataframe)[col]
    A column label can be changed by selecting it either by its index or by its current name and assigning the new value. The changes are reflected in the original data frame. names() can change one label at a time, as in the examples below, or several labels at once when a vector of positions is used.



    Example 1:

    # R program to rename a Data Frame
      
    # Creating a Data Frame
    df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
    print("Original Data Frame")
    print(df)
    print("Modified Data Frame")
      
    # Renaming Data Frame
    names(df)[names(df)=="row3"]<-"three"
    print(df)

    Output:

    [1] "Original Data Frame"
      row1 row2 row3
    1    0    3    6
    2    1    4    7
    3    2    5    8
    [1] "Modified Data Frame"
      row1 row2 three
    1    0    3     6
    2    1    4     7
    3    2    5     8
     
    

    Here, the label of the third column is changed from row3 to three. The change is retained in the original data frame.

    Example 2:

    # R program to rename a Data Frame
      
    # Creating a Data Frame
    df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
    print("Original Data Frame")
    print(df)
    print("Modified Data Frame")
      
    # Renaming Data Frame
    names(df)[2]<-"two"
    print(df)

    Output:

    [1] "Original Data Frame"
      row1 row2 row3
    1    0    3    6
    2    1    4    7
    3    2    5    8
    [1] "Modified Data Frame"
      row1 two row3
    1    0   3    6
    2    1   4    7
    3    2   5    8
    

    Here, the label of the second column is changed from row2 to two. The change is retained in the original data frame.
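The sketch below, which reuses the three-column frame from the examples, shows the multi-column form of names(): it renames two columns by position in one assignment, then renames by old name using match() to find the position. The last step illustrates that a numeric label is stored as the character string "3".

```r
# Start from the same three-column data frame used above
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Rename the first two columns in a single assignment, by position
names(df)[c(1, 2)] <- c("one", "two")
print(names(df))

# Rename by old name instead of position: match() returns the position
old <- "row3"
new <- "three"
names(df)[match(old, names(df))] <- new
print(names(df))

# Labels are always stored as character strings, even if a number is assigned
names(df)[3] <- 3
print(names(df))
print(is.character(names(df)))
```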

Expanding Data Frame

A data frame can be expanded by adding columns, or contracted by deleting them.

  • Adding columns to a data frame
    New columns, in the form of vectors, can be added using any of the data frame indexing modes. The new column is appended at the end of the data frame. The vector should contain one value per row; a single value is recycled to every row. The values in a new column may also be computed from existing columns, for instance as the sum or difference of two columns. A column of NA values can also be appended. Changes are retained in the original data frame. The time complexity required to add a column is O(n), where n is the number of rows of the data frame. There are several ways to add a new column.
    Syntax:

    dataframe[["newcol"]] <- vector
    or
    dataframe["newcol"] <- vector
    or
    dataframe$newcol <- vector

    Example:

    # R program to add columns to a Data Frame
      
    # Creating a Data Frame
    df <- data.frame(col1 = 0:2, col2 = 3:5, col3 = 6:8)
    print("Original Data Frame")
    print(df)
      
    # Adding a column filled with zeros (the single value 0 is recycled)
    df[["col4"]] <- 0
      
    # Adding a fifth column filled with NA
    df$col5 <- NA
      
    # Overwriting col5 with the element-wise sum of col1 and col2
    df[["col5"]] <- df[["col1"]] + df[["col2"]]
    print("Modified Data Frame")
    print(df)

    Output:

    [1] "Original Data Frame"
      col1 col2 col3
    1    0    3    6
    2    1    4    7
    3    2    5    8
    [1] "Modified Data Frame"
      col1 col2 col3 col4 col5
    1    0    3    6    0    3
    2    1    4    7    0    5
    3    2    5    8    0    7

    First, col4 is added at the end of the data frame with every value set to 0. Next, a fifth column, col5, is created with df$col5 and filled with NA. Its values are then replaced with the element-wise sum of columns 1 and 2.

  • Removing columns from a data frame
    Columns can be dropped from a data frame by name or by index, and several columns can be deleted at once. Assigning NULL to a column removes it, and the columns to its right shift one position to the left, so the number of columns decreases by the number of deletions. The changes are reflected in the original data frame.

    Example 1:

    # R program to remove a column 
    # from a Data Frame
      
    # Creating a Data Frame
    df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
    print ("Original Data Frame")
    print (df)
      
    # Removing a Column
    df[["row2"]]<-NULL
    print(df)

    Output:

    [1] "Original Data Frame"
      row1 row2 row3
    1    0    3    6
    2    1    4    7
    3    2    5    8
      row1 row3
    1    0    6
    2    1    7
    3    2    8
     
    

    row2 is deleted from the data frame, and the labels of the remaining columns are unchanged. df["row2"] <- NULL would produce the same result.

    Example 2: Deleting columns by their integer indices

    # R program to remove a column 
    # from a Data Frame
      
    # Creating a Data Frame
    df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8, row4 = rep(5, 3))
    print("Original Data Frame")
    print(df)
      
    # Removing columns 1 and 3 by their positions
    df <- df[-c(1, 3)]
    print(df)

    Output:

    [1] "Original Data Frame"
      row1 row2 row3 row4
    1    0    3    6    5
    2    1    4    7    5
    3    2    5    8    5
      row2 row4
    1    3    5
    2    4    5
    3    5    5
    

    The columns to be excluded are given as negative indices, -c(...). Here, columns 1 and 3 are deleted, and because the result is assigned back to df, the change is retained in the original data frame.
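A compact sketch combining both operations, using illustrative column names (flag, diff) that do not appear in the examples above: a length-one value is recycled to every row, a new column is computed from two existing ones, and several columns are dropped by name with setdiff() instead of by position.

```r
df <- data.frame(col1 = 0:2, col2 = 3:5, col3 = 6:8)

# A single value is recycled to every row
df$flag <- TRUE

# A new column derived from two existing ones (element-wise difference)
df[["diff"]] <- df[["col3"]] - df[["col1"]]

# Drop several columns by name: keep every column not in the drop list
drop <- c("col2", "flag")
df <- df[setdiff(names(df), drop)]
print(df)
```

Selecting the columns to keep with setdiff() avoids hard-coding positions, so the code keeps working if columns are later added or reordered.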

Subsetting a data frame

The subset() function can also be used to drop columns: its select argument takes the names of the columns to drop, each preceded by a minus sign. Multiple column names can be dropped together by combining them into a vector, as in select = -c(col1, col2). The result contains only the remaining columns, so the number of columns is reduced by the number of deletions. The original data frame changes only if the result is assigned back to it, as in the example below.
Syntax:

subset(dataframe, select= - column)

Example:

# R program to remove a column 
# from a Data Frame
  
# Creating a Data Frame
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print ("Original Data Frame")
print (df)
  
# Creating a Subset
df<-subset(df, select = - c(row1, row2))
print("Modified Data Frame")
print(df)

Output:



[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row3
1    6
2    7
3    8

Here, both row1 and row2 are removed from the data frame, so the result contains only one of the original columns, row3.
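The select argument can also list the columns to keep, either as a vector or as a contiguous range written with ':'. The sketch below compares the keep and drop forms on the same frame; none of the results is assigned back to df, so df itself stays unchanged.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Keep a list of columns instead of dropping them
kept <- subset(df, select = c(row1, row3))

# A contiguous range of columns can be given with ':'
range_kept <- subset(df, select = row1:row2)

# Dropping with '-' returns the complement of the named columns
dropped <- subset(df, select = -c(row1, row2))

print(names(kept))
print(names(range_kept))
print(names(dropped))

# df is untouched because none of the results were assigned to it
print(names(df))
```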

Reordering columns

Columns of a data frame can be reordered by specifying either the column names or the column indices in the desired order. Indexing returns a new data frame and leaves the original unchanged, so the result has to be assigned back to retain the new order. The time complexity required to reorder the columns in the worst case is O(m*n), where all the elements have to be shifted to a new position, with m being the number of rows and n being the number of columns.

Example 1:

# R program to reorder the columns
# of a Data Frame
  
# Creating a Data Frame
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print("Original Data Frame")
print(df)
print("Modified Data Frame")
  
# Temporarily modifying the column order
# in a Data Frame
df[,c(2, 1, 3)]

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

Here, the desired order is specified as column indices, so the columns are returned in the order 2, 1, 3. Because the result is not assigned, the original data frame is not changed.
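The index vector does not have to be typed out by hand. As a small sketch, ncol(df):1 lists the positions from last to first, which reverses the column order; the original frame keeps its order because the result is stored under a different name.

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# ncol(df):1 lists the column positions from last to first
reversed <- df[, ncol(df):1]
print(reversed)

# The original data frame keeps its order because it was never reassigned
print(names(df))
```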

Example 2:

# R program to reorder the columns
# of a Data Frame
  
# Creating a Data Frame
df<-data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)
print("Original Data Frame")
print(df)
print("Modified Data Frame")
  
# Permanently modifying column order
# in a Data Frame
df <- df[c("row2", "row1", "row3")]
print(df)

Output:

[1] "Original Data Frame"
  row1 row2 row3
1    0    3    6
2    1    4    7
3    2    5    8
[1] "Modified Data Frame"
  row2 row1 row3
1    3    0    6
2    4    1    7
3    5    2    8

Here, the desired order is specified as column names. Because the result is assigned back to df, the change is made to the original data frame.
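Listing every name becomes tedious for wide data frames. A minimal sketch for moving one named column to the front while keeping the others in their existing order, where the variable first is an illustrative name, not part of any API:

```r
df <- data.frame(row1 = 0:2, row2 = 3:5, row3 = 6:8)

# Name of the column to move to the front (illustrative choice)
first <- "row3"

# Put it first, followed by the remaining columns in their current order
df <- df[c(first, setdiff(names(df), first))]
print(df)
```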