Data Structures in R Programming

A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks. Data structures in R programming are tools for holding multiple values.

R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether they’re homogeneous (all elements must be of the identical type) or heterogeneous (the elements are often of various types). This gives rise to the five data types which are most frequently utilized in data analysis. the subsequent table shows a transparent cut view of those data structures.

Dimension Homogenous Heterogeneous
1D Vector List
2D Matrix Dataframe
nD Array

The most essential data structures used in R include:

  • Vectors
  • Lists
  • Dataframes
  • Matrices
  • Arrays
  • Factors

Vectors

A vector is an ordered collection of basic data types of a given length. The only key thing here is all the elements of a vector must be of the identical data type e.g homogenous data structures. Vectors are one-dimensional data structures.



Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate Vector
  
# Vectors(ordered collection of same data type)
X = c(1, 3, 5, 7, 8)
  
# Printing those elements in console
print(X)

chevron_right


Output:

[1] 1 3 5 7 8

Lists

A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data structures. These are also one-dimensional data structures. A list can be a list of vectors, list of matrices, a list of characters and a list of functions and so on.

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate a List
  
# The first attributes is a numeric vector
# containing the employee IDs which is 
# created using the 'c' command here
empId = c(1, 2, 3, 4)
  
# The second attribute is the employee name 
# which is created using this line of code here
# which is the character vector 
empName = c("Debi", "Sandeep", "Subham", "Shiba")
  
# The third attribute is the number of employees
# which is a single numeric variable.
numberOfEmp = 4
  
# We can combine all these three different
# data types into a list
# containing the details of employees
# which can be done using a list command
empList = list(empId, empName, numberOfEmp)
  
print(empList)

chevron_right


Output:

[[1]]
[1] 1 2 3 4

[[2]]
[1] "Debi"    "Sandeep" "Subham"  "Shiba"  

[[3]]
[1] 4

Dataframes

Dataframes are generic data objects of R which are used to store the tabular data. Dataframes are the foremost popular data objects in R programming because we are comfortable in seeing the data within the tabular form. They are two-dimensional, heterogeneous data structures. These are lists of vectors of equal lengths.
Data frames have the following constraints placed upon them:

  • A data-frame must have column names and every row should have a unique name.
  • Each column must have the identical number of items.
  • Each item in a single column must be of the same data type.
  • Different columns may have different data types.

To create a data frame we use the data.frame() function.

Example:



filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate dataframe
  
# A vector which is a character vector
Name = c("Amiya", "Raj", "Asish")
  
# A vector which is a character vector
Language = c("R", "Python", "Java")
  
# A vector which is a numeric vector
Age = c(22, 25, 45)
  
# To create dataframe use data.frame command
# and then pass each of the vectors 
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
  
print(df)

chevron_right


Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45

Matrices

A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix, as we know rows are the ones that run horizontally and columns are the ones that run vertically. Matrices are two-dimensional, homogeneous data structures.

Now, let’s see how to create a matrix in R. To create a matrix in R you need to use the function called matrix. The arguments to this matrix() are the set of elements in the vector. You have to pass how many numbers of rows and how many numbers of columns you want to have in your matrix and this is the important point you have to remember that by default, matrices are in column-wise order.

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate a matrix
  
A = matrix(
    # Taking sequence of elements
    c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
      
    # No of rows and columns
    nrow = 3, ncol = 3,  
  
    # By default matrices are 
    # in column-wise order 
    # So this parameter decides
    # how to arrange the matrix          
    byrow = TRUE                             
)
  
print(A)

chevron_right


Output:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Arrays

Arrays are the R data objects which store the data in more than two dimensions. Arrays are n-dimensional data structures. For example, if we create an array of dimensions (2, 3, 3) then it creates 3 rectangular matrices each with 2 rows and 3 columns. They are homogeneous data structures.

Now, let’s see how to create arrays in R. To create an array in R you need to use the function called array(). The arguments to this array() are the set of elements in vectors and you have to pass a vector containing the dimensions of the array.

Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate an array
  
A = array(
    # Taking sequence of elements
    c(1, 2, 3, 4, 5, 6, 7, 8),
  
    # Creating two rectangular matrices 
    # each with two rows and two columns
    dim = c(2, 2, 2)                        
)
  
print(A)

chevron_right


Output:

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

Factors

Factors are the data objects which are used to categorize the data and store it as levels. They are useful for storing categorical data. They can store both strings and integers. They are useful to categorize unique values in columns like “TRUE” or “FALSE”, or “MALE” or “FEMALE”, etc.. They are useful in data analysis for statistical modeling.

Now, let’s see how to create factors in R. To create a factor in R you need to use the function called factor(). The argument to this factor() is the vector.
Example:

filter_none

edit
close

play_arrow

link
brightness_4
code

# R program to illustrate factors
  
# Creating factor using factor()
fac = factor(c("Male", "Female", "Male",
               "Male", "Female", "Male", "Female"))
  
print(fac)

chevron_right


Output:

[1] Male   Female Male   Male   Female Male   Female
Levels: Female Male



My Personal Notes arrow_drop_up

Technical Content Engineer at GeeksForGeeks

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.


Article Tags :

Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.