Learn R Programming

Last Updated : 22 Aug, 2021

R is a programming language used mainly for machine learning, data analysis, and statistical computing. It is an interpreted, platform-independent language, which means it can be used on platforms such as Windows, Linux, and macOS.

In this R tutorial, we will learn the R programming language from scratch to an advanced level. The tutorial is suitable for both beginners and experienced developers.

Why Learn R Programming Language?

  • R is a leading tool for machine learning, statistics, and data analysis.
  • R is an open-source language, which means it is free of cost and anyone from any organization can install it without purchasing a license.
  • It is available across widely used platforms such as Windows, Linux, and macOS.
  • R is not only a statistics package; it can also be integrated with other languages (C, C++). Thus, you can easily interact with many data sources and statistical packages.
  • Its user base is growing day by day, and it has vast community support.
  • R is currently one of the most requested programming languages in the data science job market, which makes it a popular choice today.

Key Features and Applications

Some key features of R that make it one of the most in-demand skills in the data science job market are:

  • Basic Statistics: The most common basic statistics terms are the mean, mode, and median. These are all known as “Measures of Central Tendency.” So using the R language we can measure central tendency very easily.
  • Static graphics: R is rich with facilities for creating and developing various kinds of static graphics including graphic maps, mosaic plots, biplots, and the list goes on.
  • Probability distributions: Using R we can easily handle various types of probability distribution such as Binomial Distribution, Normal Distribution, Chi-squared Distribution, and many more.
  • R Packages: One of the major features of R is its wide availability of libraries. R has CRAN (Comprehensive R Archive Network), a repository holding more than 10,000 packages.
  • Distributed Computing: Distributed computing is a model in which components of a software system are shared among multiple computers to improve efficiency and performance. Two packages used for distributed programming in R, ddR and multidplyr, were released in November 2015.
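
As a small illustration of the statistics and probability features listed above, the following sketch uses base R only. The sample vector scores and the helper stat_mode are made up for this example. Base R has no built-in function for the statistical mode (its mode() function reports an object's storage mode instead), so a small helper is assumed here.

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(4, 8, 8, 5, 10, 8, 5)

# Helper (not part of base R): most frequent value in a vector
stat_mode <- function(v) {
    counts <- table(v)
    as.numeric(names(counts)[which.max(counts)])
}

avg <- mean(scores)
mid <- median(scores)
most_common <- stat_mode(scores)
cat("Mean:", avg, " Median:", mid, " Mode:", most_common, "\n")

# Binomial: probability of exactly 3 heads in 5 fair coin tosses
p_heads <- dbinom(3, size = 5, prob = 0.5)

# Standard normal: probability that a value is below 0
p_below <- pnorm(0)

cat("P(3 heads in 5 tosses):", p_heads, "\n")
cat("P(Z < 0):", p_below, "\n")
```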

Applications of R

Download and Installation

There are many IDEs available for working with R. In this article we will be dealing with the installation of RStudio.

Hello World in R

An R program can be run in several ways. You can choose any of the following options to continue with this tutorial.

  • Using IDEs like RStudio, Eclipse, Jupyter Notebook, etc.
  • Using R Command Prompt
  • Using RScripts

Now type the below code to print hello world on your console.

R




# R Program to print
# Hello World
 
print("HelloWorld")


 Output:

[1] "HelloWorld"

Note: For more information, refer Hello World in R Programming

 

Fundamentals of R

Variables:

R is a dynamically typed language: variables are not declared with a data type; instead, they take the data type of the R object assigned to them. In R, assignment can be written in three ways.

  • Using equal operator- data is copied from right to left.
variable_name = value
  • Using leftward operator- data is copied from right to left.
variable_name <- value
  • Using rightward operator- data is copied from left to right.
value -> variable_name

Example:

R




# R program to illustrate
# Initialization of variables
 
# using equal to operator
var1 = "gfg"
print(var1)
 
# using leftward operator
var2 <- "gfg"
print(var2)
 
# using rightward operator
"gfg" -> var3
print(var3)


 Output:

[1] "gfg"
[1] "gfg"
[1] "gfg"

Note: For more information, refer R – Variables.
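
To make the idea of dynamic typing concrete, here is a minimal sketch in which one variable is reassigned values of different types. The variable name item and the helper names first_class, second_class, and third_class are chosen for this illustration only.

```r
# The same name can hold values of different types over time
item <- 42L
first_class <- class(item)
print(first_class)

item <- "forty-two"
second_class <- class(item)
print(second_class)

item <- c(TRUE, FALSE)
third_class <- class(item)
print(third_class)
```

Each call to class() reports the type of whatever object the name currently refers to.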

Comments:

Comments are plain-language sentences that add useful information to the source code to make it easier for the reader to understand. They explain the logic used in the code and have no effect on the code during execution. Any text starting with “#” is a comment in R.

Example:

R




# all the lines starting with '#'
# are comments and will be ignored
# during the execution of the
# program
 
# Assigning values to variables
a <- 1
b <- 2
 
# Printing sum
print(a + b)


Output:

[1] 3

Note: For more information, refer Comments in R

Operators

Operators are symbols that direct the various kinds of operations that can be performed on operands. They carry out mathematical, logical, and decision operations on complex numbers, integers, and numerics as input operands. They are classified by functionality:

  • Arithmetic Operators: Arithmetic operations simulate various math operations, like addition, subtraction, multiplication, division and modulo.

Example:

R




# R program to illustrate
# the use of Arithmetic operators
a <- 12
b <- 5
 
# Performing operations on Operands
cat ("Addition :", a + b, "\n")
cat ("Subtraction :", a - b, "\n")
cat ("Multiplication :", a * b, "\n")
cat ("Division :", a / b, "\n")
cat ("Modulo :", a %% b, "\n")
cat ("Power operator :", a ^ b)


 Output:

Addition : 17 
Subtraction : 7 
Multiplication : 60 
Division : 2.4 
Modulo : 2 
Power operator : 248832
  • Logical Operators: Logical operations perform decision operations between the operands and evaluate to either TRUE or FALSE. The operators & and | work element by element, whereas && and || expect single (length-one) values.

Example:

R




# R program to illustrate
# the use of Logical operators
vec1 <- c(FALSE, TRUE)
vec2 <- c(TRUE,FALSE)
 
# Performing operations on Operands
cat ("Element wise AND :", vec1 & vec2, "\n")
cat ("Element wise OR :", vec1 | vec2, "\n")
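# && and || work on single values only; recent R versions (4.3.0 and later)
# raise an error for longer vectors, so the first elements are compared
cat ("Logical AND :", vec1[1] && vec2[1], "\n")
cat ("Logical OR :", vec1[1] || vec2[1], "\n")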
cat ("Negation :", !vec1)


 Output:

Element wise AND : FALSE FALSE 
Element wise OR : TRUE TRUE 
Logical AND : FALSE 
Logical OR : TRUE 
Negation : TRUE FALSE
  • Relational Operators: The relational operators carry out comparison operations between the corresponding elements of the operands.

Example:

R




# R program to illustrate
# the use of Relational operators
a <- 10
b <- 14
 
# Performing operations on Operands
cat ("a less than b :", a < b, "\n")
cat ("a less than equal to b :", a <= b, "\n")
cat ("a greater than b :", a > b, "\n")
cat ("a greater than equal to b :", a >= b, "\n")
cat ("a not equal to b :", a != b, "\n")


 Output:

a less than b : TRUE 
a less than equal to b : TRUE 
a greater than b : FALSE 
a greater than equal to b : FALSE 
a not equal to b : TRUE 
  • Assignment Operators: Assignment operators are used to assign values to various data objects in R.

Example:

R




# R program to illustrate
# the use of Assignment operators
 
# Left assignment operator
v1 <- "GeeksForGeeks"
v2 <<- "GeeksForGeeks"
v3 = "GeeksForGeeks"
 
# Right Assignment operator
"GeeksForGeeks" ->> v4
"GeeksForGeeks" -> v5
 
# Performing operations on Operands
cat("Value 1 :", v1, "\n")
cat("Value 2 :", v2, "\n")
cat("Value 3 :", v3, "\n")
cat("Value 4 :", v4, "\n")
cat("Value 5 :", v5)


 Output:

Value 1 : GeeksForGeeks 
Value 2 : GeeksForGeeks 
Value 3 : GeeksForGeeks 
Value 4 : GeeksForGeeks 
Value 5 : GeeksForGeeks

Note: For more information, refer R – Operators
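
The operators above are shown on single values, but they also work element by element on whole vectors. The sketch below illustrates this, along with the integer-division operator %/%, which the examples above do not cover. The vectors prices and qty are made-up data.

```r
# Hypothetical data for illustration
prices <- c(10, 25, 40)
qty <- c(3, 2, 5)

# Arithmetic is applied element by element
totals <- prices * qty
print(totals)

# Relational operators also return one result per element
is_large <- totals > 60
print(is_large)

# %/% is integer division, the companion of the modulo operator %%
whole <- 17 %/% 5
rest <- 17 %% 5
cat("17 =", whole, "* 5 +", rest, "\n")

# any() and all() reduce a logical vector to a single value
print(any(is_large))
print(all(is_large))
```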

Keywords:

Keywords are specific reserved words in R, each of which has a specific feature associated with it. Here is the list of keywords in R:

 

if      function  FALSE  NA_integer_
else    in        NULL   NA_real_
while   next      Inf    NA_complex_
repeat  break     NaN    NA_character_
for     TRUE      NA

 

Note: For more information, refer R – Keywords

Data Types

Each variable in R has an associated data type. Each data type requires a different amount of memory and supports specific operations. The five basic data types covered here are:

 

Data Type   Example                         Description
Numeric     1, 2, 12, 36                    Decimal (real) values are called numerics in R. Numeric is the default data type for numbers.
Integer     1L, 2L, 34L                     Whole numbers. A capital 'L' suffix denotes that a value is of the integer data type.
Logical     TRUE, FALSE                     Takes a value of either TRUE or FALSE.
Complex     2+3i, 5+7i                      Numbers with an imaginary component.
Character   'a', '12', "GFG", "'hello'"     Text made up of letters, digits, and special characters, enclosed in single or double quotes.

Example:

R




# A simple R program
# to illustrate data type
 
print("Numeric Type")
# Assign a decimal value to x
x = 12.25

# print the class name of variable
print(class(x))

# print the type of variable
print(typeof(x))

print("----------------------------")
print("Integer Type")
# Declare an integer by appending an
# L suffix.
y = 15L

# print the class name of y
print(class(y))

# print the type of y
print(typeof(y))

print("----------------------------")
print("Logical Type")
# Sample values
x = 1
y = 2

# Comparing two values
# (1 > 2 is FALSE)
z = x > y

# print the logical value
print(z)

# print the class name of z
print(class(z))

# print the type of z
print(typeof(z))

print("----------------------------")
print("Complex Type")
# Assign a complex value to x
x = 12 + 13i

# print the class name of x
print(class(x))

# print the type of x
print(typeof(x))

print("----------------------------")
print("Character Type")

# Assign a character value to char
char = "GFG"

# print the class name of char
print(class(char))

# print the type of char
print(typeof(char))


 
 Output:

[1] "Numeric Type"
[1] "numeric"
[1] "double"
[1] "----------------------------"
[1] "Integer Type"
[1] "integer"
[1] "integer"
[1] "----------------------------"
[1] "Logical Type"
[1] FALSE
[1] "logical"
[1] "logical"
[1] "----------------------------"
[1] "Complex Type"
[1] "complex"
[1] "complex"
[1] "----------------------------"
[1] "Character Type"
[1] "character"
[1] "character"

 

Note: for more information, refer R – Data Types
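
Values often need to be converted from one data type to another. The following sketch shows the as.*() conversion functions and the is.*() test functions from base R. The variable names are chosen for this illustration only.

```r
# Converting between types with the as.*() family
num_text <- "12"
as_int <- as.integer(num_text)
print(class(as_int))

# Converting a double to an integer drops the fractional part
truncated <- as.integer(3.9)
print(truncated)

# Logical values convert to 1 (TRUE) and 0 (FALSE)
flags <- as.numeric(c(TRUE, FALSE, TRUE))
print(flags)

# is.*() functions test the type without converting
print(is.numeric(as_int))
print(is.character(num_text))
```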

 

Basics of Input/Output

Taking Input from the User:

R Language provides us with two inbuilt functions to read the input from the keyboard.

  • readline() method: It reads input as a string. Even if the user types an integer, it is returned as a character string.

Example:

R




# R program to illustrate
# taking input from the user
 
# taking input using readline()
# this command will prompt you
# to input a desired value
var = readline()


  • scan() method: This method reads data in the form of a vector or list. It is handy when inputs need to be taken quickly for a mathematical calculation or a dataset.

Example:

R




# R program to illustrate
# taking input from the user
 
# taking input using scan()
x = scan()


Note: For more information, refer Taking Input from User in R Programming
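
Because readline() always returns a string, numeric input has to be converted before it can be used in calculations. The sketch below shows one possible way to do this. The helper parse_number is hypothetical, and a fixed string stands in for the typed input so that the example also runs non-interactively.

```r
# Hypothetical helper: turn text typed by a user into a number.
# as.numeric() yields NA (with a warning) when the text is not a number,
# so the warning is suppressed and NA is checked explicitly.
parse_number <- function(text) {
    value <- suppressWarnings(as.numeric(text))
    if (is.na(value)) {
        stop("not a number: ", text)
    }
    value
}

# In an interactive session the text would come from readline(), e.g.
#   age <- parse_number(readline(prompt = "Enter your age: "))
# Here a fixed string stands in for the typed input.
age <- parse_number("21")
print(age + 1)
```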

Printing Output to Console:

R provides various functions to write output to the screen. Let’s look at them:

  • print(): It is the most common method to print the output.

Example: 

R




# R program to illustrate
# printing output of an R program
 
# print string
print("Hello")
 
# print variable
# it will print 'Welcome to GeeksforGeeks' on
# the console
x <- "Welcome to GeeksforGeeks"
print(x)


 Output:

[1] "Hello"
[1] "Welcome to GeeksforGeeks"
  • cat(): cat() converts its arguments to character strings. This is useful for printing output in user defined functions.
     

Example:

R




# R program to illustrate
# printing output of an R
# program
 
# print string with variable
# "\n" for new line
x = "Hello"
cat(x, "\nwelcome")
 
# print normal string
cat("\nto GeeksForGeeks")


 Output:

Hello 
welcome
to GeeksForGeeks

Note: For more information, refer Printing Output of an R Program
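
Output usually has to be assembled from several values before it is printed. As a minimal sketch, the base R functions paste(), paste0(), and sprintf() can build such strings. The variable names and values below are made up for illustration.

```r
name <- "R"
version_no <- 4

# paste() joins values with a space by default; paste0() uses no separator
msg1 <- paste("Language:", name)
msg2 <- paste0(name, "-", version_no)
print(msg1)
print(msg2)

# sprintf() fills placeholders: %s for strings, %d for integers,
# %.2f for a number rounded to two decimals
msg3 <- sprintf("%s has %d letters; pi is about %.2f", name, nchar(name), pi)
cat(msg3, "\n")
```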

Decision Making

Decision making determines the flow of execution of a program based on certain conditions. The programmer provides a condition for the program to evaluate, together with statements that are executed if the condition is true and, optionally, other statements that are executed if the condition is false.

 

Decision-making statements in R Language: if, if-else, the if-else-if ladder, nested if, and switch.

Example 1: Demonstrating if and if-else

R




# R program to illustrate
# decision making
 
a <- 99
b <- 12
 
# if statement to check whether
# the number a is larger or not
if(a > b)
{
    print("A is Larger")
}
 
 
# if-else statement to check which
# number is greater
if(b > a)
{
    print("B is Larger")
} else
{
    print("A is Larger")
}


 Output:

[1] "A is Larger"
[1] "A is Larger"

Example 2: Demonstrating if-else-if and nested if

R




# R program to demonstrate
# decision making
  
a <- 10
  
# if-else-if ladder
if (a == 11)
{
    print ("a is 11")
} else if (a==10)
{
    print ("a is 10")
} else
    print ("a is not present")
 
# Nested if to check whether a
# number is divisible by both 2 and 5
if (a %% 2 == 0)
{
    if (a %% 5 == 0)
        print("Number is divisible by both 2 and 5")
}


 Output:

[1] "a is 10"
[1] "Number is divisible by both 2 and 5"

Example 3: Demonstrating switch

R




# R switch statement example
 
# Expression in terms of the index value
x <- switch(
    2,             # Expression
    "Welcome",     # case 1
    "to",         # case 2
    "GFG"         # case 3
)
print(x)
 
# Expression in terms of the string value
y <- switch(
    "3",                 # Expression
    "0"="Welcome",     # case 1
    "1"="to",         # case 2
    "3"="GFG"         # case 3
)
print(y)
 
z <- switch(
    "GfG",                 # Expression
    "GfG0"="Welcome",     # case 1
    "GfG1"="to",         # case 2
    "GfG3"="GFG"         # case 3
)
print(z)


 Output:

[1] "to"
[1] "GFG"
NULL 

Note: For more information, refer Decision Making in R Programming
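
In Example 3, the last switch() call returns NULL because no case matches. One way to handle this, sketched below, is to add an unnamed final argument that acts as a default. The sketch also shows ifelse(), the vectorised counterpart of if-else. The function describe and the vector marks are hypothetical.

```r
# An unnamed last argument acts as the default branch of switch()
describe <- function(code) {
    switch(
        code,
        "GfG0" = "Welcome",
        "GfG1" = "to",
        "unknown code"      # default when nothing matches
    )
}
print(describe("GfG1"))
print(describe("GfG"))

# ifelse() applies a condition to every element of a vector
marks <- c(35, 72, 50)
result <- ifelse(marks >= 50, "pass", "fail")
print(result)
```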

Control Flow

Loops are used wherever we have to execute a block of statements repeatedly, for example, printing “hello world” 10 times. The different types of loops in R are the for loop, the while loop, and the repeat loop.

Example:

R




# R Program to demonstrate the use of
# for loop along with concatenate
for (i in c(-8, 9, 11, 45))
{
    print(i)
}


 
 Output:

[1] -8
[1] 9
[1] 11
[1] 45

Example:

R




# R program to demonstrate the
# use of while loop
 
val = 1
 
# using while loop
while (val <= 5 )
{
    # statements
    print(val)
    val = val + 1
}


 Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Example:

R




# R program to demonstrate the use
# of repeat loop
 
val = 1
 
# using repeat loop
repeat
{
    # statements
    print(val)
    val = val + 1
 
    # checking stop condition
    if(val > 5)
    {
        # using break statement
        # to terminate the loop
        break
    }
}


 Output:

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5 

Note: For more information, refer Loops in R
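
When a loop needs the position of each element as well as its value, seq_along() is a common choice. The sketch below also compares a loop-based total with the vectorised sum() function. The vector fruits is made-up data.

```r
fruits <- c("apple", "banana", "cherry")

# seq_along() yields 1, 2, ..., length(fruits) and is safe for empty vectors
for (i in seq_along(fruits)) {
    cat(i, ":", fruits[i], "\n")
}

# Accumulating a total with a loop ...
total <- 0
for (n in 1:10) {
    total <- total + n
}

# ... gives the same answer as the vectorised sum()
print(total)
print(sum(1:10))
```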

 

Loop Control Statements

Loop control statements change execution from its normal sequence. Following are the loop control statements provided by R Language:

  • Break Statement: The break keyword is a jump statement that is used to terminate the loop at a particular iteration.
  • Next Statement: The next statement is used to skip the current iteration in the loop and move to the next iteration without exiting from the loop itself.

R




# R program for break statement
no <- 15:20
 
for (val in no)
{
    if (val == 17)
    {
        break
    }
    print(paste("Values are: ", val))
}
 
print("------------------------------------")
 
# R Next Statement Example
for (val in no)
{
    if (val == 17)
    {
        next
    }
    print(paste("Values are: ", val))
}


 
Output:

[1] "Values are:  15"
[1] "Values are:  16"
[1] "------------------------------------"
[1] "Values are:  15"
[1] "Values are:  16"
[1] "Values are:  18"
[1] "Values are:  19"
[1] "Values are:  20"

Note: For more information, refer Break and Next statements in R
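
Both statements can also be combined in a single loop. The following sketch, with made-up variable names, collects the first three multiples of 7: next skips the numbers that are not multiples, and break stops the loop once three have been found.

```r
found <- c()
n <- 0

while (TRUE) {
    n <- n + 1

    # skip numbers that are not multiples of 7
    if (n %% 7 != 0) {
        next
    }

    found <- c(found, n)

    # stop once three multiples have been collected
    if (length(found) == 3) {
        break
    }
}

print(found)
```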

Functions

Functions are blocks of code that give the user the ability to reuse the same code, which avoids duplication and makes the code more readable. A function is a collection of statements that performs a specific task and returns the result to the caller. Functions are created in R with the function keyword.

Example:

R




# A simple R program to
# demonstrate functions
 
ask_user = function(){
    print("GeeksforGeeks")
}
 
my_func = function(){
    a <- 1:5
    b <- 0
     
    # count the elements of a
    for (i in a){
        b = b + 1
    }
    return(b)
}
 
ask_user()
res = my_func()
print(res)


 Output: 

[1] "GeeksforGeeks"
[1] 5

Function with Arguments:

Arguments to a function can be specified at the time of function definition, after the function name, inside the parenthesis.

Example:

R




# A simple R function to check
# whether x is even or odd
 
evenOdd = function(x){
    if(x %% 2 == 0)
         
        # return even if the number
        # is even
        return("even")
    else
         
        # return odd if the number
        # is odd
        return("odd")
}
 
# Function definition
# To check a is divisible by b or not
divisible <- function(a, b){
    if(a %% b == 0)
    {
        cat(a, "is divisible by", b, "\n")
    } else
    {
        cat(a, "is not divisible by", b, "\n")
    }
}
 
# function with single argument
print(evenOdd(4))
print(evenOdd(3))
 
# function with multiple arguments
divisible(7, 3)
divisible(36, 6)
divisible(9, 2)


 Output:

[1] "even"
[1] "odd"
7 is not divisible by 3 
36 is divisible by 6 
9 is not divisible by 2 
  • Default Arguments: A default value is used for a parameter when the caller does not supply that argument, so it does not need to be specified each time the function is called.

Example:

R




# Function definition to check
# a is divisible by b or not.
 
# If b is not provided in function call,
# Then divisibility of a is checked
# with 9 as default
isdivisible <- function(a, b = 9){
    if(a %% b == 0)
    {
        cat(a, "is divisible by", b, "\n")
    } else
    {
        cat(a, "is not divisible by", b, "\n")
    }
}
 
# Function call
isdivisible(20, 2)
isdivisible(12)


 Output:

20 is divisible by 2 
12 is not divisible by 9 
  • Variable length arguments: Dots argument (…) is also known as ellipsis which allows the function to take an undefined number of arguments.

Example:

R




# Function definition of dots operator
fun <- function(n, ...){
    l <- c(n, ...)
    paste(l, collapse = " ")
}
 
# Function call
fun(5, 1L, 6i, TRUE, "GFG", 1:2)


 Output:

[1] "5 1 0+6i TRUE GFG 1 2"
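
Two further points about functions are worth a short sketch: arguments can be matched by name in any order, and a function can return several values at once by wrapping them in a named list. The function describe_values and its parameters are hypothetical.

```r
# Hypothetical helper returning several values in one named list
describe_values <- function(values, digits = 1) {
    list(
        smallest = min(values),
        largest = max(values),
        average = round(mean(values), digits)
    )
}

# Arguments can be matched by name, in any order
stats <- describe_values(digits = 2, values = c(3, 7, 8))
print(stats$smallest)
print(stats$largest)
print(stats$average)
```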

Data Structures

A data structure is a particular way of organizing data in a computer so that it can be used effectively. 

Vectors:

Vectors in R are similar to arrays in the C language: they hold multiple data values of the same type. One key point is that vector indexing in R starts from ‘1’, not ‘0’.

 

 

Example:

R




# R program to illustrate Vector
 
# Numeric Vector
N = c(1, 3, 5, 7, 8)
 
# Character vector
C = c('Geeks', 'For', 'Geeks')
 
# Logical Vector
L = c(TRUE, FALSE, FALSE, TRUE)
 
# Printing vectors
print(N)
print(C)
print(L)


 Output:

[1] 1 3 5 7 8
[1] "Geeks" "For"   "Geeks"
[1]  TRUE FALSE FALSE  TRUE

Accessing Vector Elements: 

There are many ways to access the elements of a vector. The most common is the subscript operator ‘[ ]’.

Example:

R




# Accessing elements using
# the position number.
X <- c(2, 9, 8, 0, 5)
print('using Subscript operator')
print(X[2])
 
# Accessing specific values by passing
# a vector inside another vector.
Y <- c(6, 2, 7, 4, 0)
print('using c function')
print(Y[c(4, 1)])
 
# Logical indexing
Z <- c(1, 6, 9, 4, 6)
print('Logical indexing')
print(Z[Z>3])


 Output:

[1] "using Subscript operator"
[1] 9
[1] "using c function"
[1] 4 6
[1] "Logical indexing"
[1] 6 9 4 6
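
A few more indexing forms are commonly used with vectors. The sketch below, using made-up vectors V and ages, shows negative indices, ranges, selection by name, and assignment through an index.

```r
V <- c(2, 9, 8, 0, 5)

# A negative index drops that position instead of selecting it
without_first <- V[-1]
print(without_first)

# A range selects consecutive positions
middle <- V[2:4]
print(middle)

# Elements can carry names and be selected by name
ages <- c(nisha = 40, nikhil = 25)
print(ages["nikhil"])

# Assigning through an index changes that element
V[4] <- 1
print(V)
```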

Lists:

A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data structures.

Example: 

R




# R program to create a List
 
# The first component is a numeric vector
# containing the employee IDs
empId = c(1, 2, 3, 4)
 
# The second component is a character vector
# containing the employee names
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")
 
# The third component is the number of employees,
# a single numeric value
numberOfEmp = 4
 
# The fourth component is the name of the organization,
# a single character value
Organization = "GFG"
 
# Combine these four objects of different types
# into a single list holding the employee details
# using the list() function
empList = list(empId, empName, numberOfEmp, Organization)
 
print(empList)


 Output: 

[[1]]
[1] 1 2 3 4

[[2]]
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

[[3]]
[1] 4

[[4]]
[1] "GFG"

Accessing List Elements:

  • Access components by names: All the components of a list can be named, and those names can be used to access the components with the dollar operator ($).
  • Access components by indices: We can also access the components of a list using indices. To access a top-level component, use the double square bracket operator “[[ ]]”. To access an element inside a component, follow “[[ ]]” with a single square bracket “[ ]”.

Example: 

R




# R program to access
# components of a list
 
# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print("Initial List")
print(empList)
 
# Accessing components by names
cat("\nAccessing name components using $ command\n")
print(empList$Names)
 
# Accessing a top level components by indices
cat("\nAccessing name components using indices\n")
print(empList[[2]])
print(empList[[1]][2])
print(empList[[2]][4])


 Output:

[1] "Initial List"
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4


Accessing name components using $ command
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

Accessing name components using indices
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"
[1] 2
[1] "Sambha"

Adding and Modifying list elements:

  • A list can also be modified by accessing the components and replacing them with the ones which you want.
  • List elements can be added simply by assigning new values using new tags.

Example:

R




# R program to add and
# modify components of a list
 
# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Nisha", "Nikhil", "Akshu", "Sambha")
numberOfEmp = 4
empList = list(
"ID" = empId,
"Names" = empName,
"Total Staff" = numberOfEmp
)
print("Initial List")
print(empList)
 
# Adding new element
empList[["organization"]] <- "GFG"
cat("\nAfter adding new element\n")
print(empList)
 
# Modifying the top-level component
empList$"Total Staff" = 5
   
# Modifying inner level component
empList[[1]][5] = 7
 
cat("\nAfter modification\n")
print(empList)


 Output: 

[1] "Initial List"
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4


After adding new element
$ID
[1] 1 2 3 4

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 4

$organization
[1] "GFG"


After modification
$ID
[1] 1 2 3 4 7

$Names
[1] "Nisha"  "Nikhil" "Akshu"  "Sambha"

$`Total Staff`
[1] 5

$organization
[1] "GFG"
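
Components can also be removed from a list, and a function can be applied to every component. The sketch below assumes a smaller version of the employee list from the example above. Assigning NULL deletes a component, and sapply() collects one result per component.

```r
empList <- list(
    ID = c(1, 2, 3, 4),
    Names = c("Nisha", "Nikhil", "Akshu", "Sambha"),
    organization = "GFG"
)

# Assigning NULL removes a component
empList$organization <- NULL
print(names(empList))

# sapply() applies a function to every component and simplifies the result
sizes <- sapply(empList, length)
print(sizes)
```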

Matrices:

A matrix is a rectangular arrangement of numbers in rows and columns. Matrices are two-dimensional, homogeneous data structures.

Example:

R




# R program to illustrate a matrix
 
A = matrix(
    # Taking sequence of elements
    c(1, 4, 5, 6, 3, 8),
 
    # No of rows and columns
    nrow = 2, ncol = 3,
 
    # By default matrices are
    # in column-wise order
    # So this parameter decides
    # how to arrange the matrix
    byrow = TRUE
)
 
print(A)


 Output:

     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8

Accessing Matrix Elements:

Matrix elements can be accessed using the matrix name followed by square brackets containing a comma. The value before the comma selects rows and the value after the comma selects columns.

Example:

R




# R program to illustrate
# accessing elements of a matrix
 
# Create a 2x3 matrix
A = matrix(
c(1, 4, 5, 6, 3, 8),
nrow = 2, ncol = 3,
byrow = TRUE       
)
cat("The 2x3 matrix:\n")
print(A)
 
print(A[1, 1]) 
print(A[2, 2])
 
# Accessing first and second row
cat("Accessing first and second row\n")
print(A[1:2, ])
 
# Accessing first and second column
cat("\nAccessing first and second column\n")
print(A[, 1:2])


 Output:

The 2x3 matrix:
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8
[1] 1
[1] 3
Accessing first and second row
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8

Accessing first and second column
     [,1] [,2]
[1,]    1    4
[2,]    6    3

 Modifying Matrix Elements:

You can modify the elements of the matrices by a direct assignment.

Example:

R




# R program to illustrate
# editing elements in a matrix
 
# Create a 2x3 matrix
A = matrix(
    c(1, 4, 5, 6, 3, 8),
    nrow = 2,
    ncol = 3,
    byrow = TRUE
)
cat("The 2x3 matrix:\n")
print(A)
 
# Editing the element in the 2nd row and
# 1st column from 6 to 30
# by direct assignment
A[2, 1] = 30
 
cat("After edited the matrix\n")
print(A)


 Output:

The 2x3 matrix:
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]    6    3    8
After edited the matrix
     [,1] [,2] [,3]
[1,]    1    4    5
[2,]   30    3    8
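
Beyond accessing and editing elements, matrices support whole-matrix operations. The following sketch reuses the 2x3 matrix from the examples above and shows a few base R operations. The names At, doubled, and product are chosen for this illustration only.

```r
A <- matrix(c(1, 4, 5, 6, 3, 8), nrow = 2, ncol = 3, byrow = TRUE)

# dim() reports rows and columns; t() swaps them
print(dim(A))
At <- t(A)
print(dim(At))

# * multiplies element by element, %*% is the matrix product
doubled <- A * 2
product <- A %*% At
print(doubled)
print(product)

# Row and column totals
print(rowSums(A))
print(colSums(A))
```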

DataFrame:

Dataframes are generic data objects of R that are used to store tabular data. They are two-dimensional, heterogeneous data structures, implemented as lists of vectors of equal length.

Example:

R




# R program to illustrate dataframe
 
# A vector which is a character vector
Name = c("Nisha", "Nikhil", "Raju")
 
# A vector which is a character vector
Language = c("R", "Python", "C")
 
# A vector which is a numeric vector
Age = c(40, 25, 10)
 
# To create dataframe use data.frame command
# and then pass each of the vectors
# we have created as arguments
# to the function data.frame()
df = data.frame(Name, Language, Age)
 
print(df)


 Output:

    Name Language Age
1  Nisha        R  40
2 Nikhil   Python  25
3   Raju        C  10

Getting the structure and data from DataFrame:

  • One can get the structure of the data frame using str() function.
  • One can extract a specific column from a data frame using its column name.

Example:

R




# R program to get the
# structure of the data frame
 
# creating a data frame
friend.data <- data.frame(
    friend_id = c(1:5),
    friend_name = c("Aman", "Nisha",
                    "Nikhil", "Raju",
                    "Raj"),
    stringsAsFactors = FALSE
)
# using str(); it prints the structure itself,
# so wrapping it in print() is not needed
str(friend.data)
 
# Extracting friend_name column
result <- data.frame(friend.data$friend_name)
print(result)


 
 Output:

'data.frame':    5 obs. of  2 variables:
 $ friend_id  : int  1 2 3 4 5
 $ friend_name: chr  "Aman" "Nisha" "Nikhil" "Raju" ...
  friend.data.friend_name
1                    Aman
2                   Nisha
3                  Nikhil
4                    Raju
5                     Raj

Summary of dataframe:

The statistical summary and nature of the data can be obtained by applying summary() function.

Example:

R




# R program to get the
# summary of the data frame
 
# creating a data frame
friend.data <- data.frame(
    friend_id = c(1:5),
    friend_name = c("Aman", "Nisha",
                    "Nikhil", "Raju",
                    "Raj"),
    stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))


 
 Output:

   friend_id friend_name       
 Min.   :1   Length:5          
 1st Qu.:2   Class :character  
 Median :3   Mode  :character  
 Mean   :3                     
 3rd Qu.:4                     
 Max.   :5                     
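
Two everyday data frame tasks are filtering rows and adding columns. The sketch below rebuilds the small Name/Language/Age data frame from the first example in this section and shows one way to do both in base R. The column AgeNextYear is made up for illustration.

```r
df <- data.frame(
    Name = c("Nisha", "Nikhil", "Raju"),
    Language = c("R", "Python", "C"),
    Age = c(40, 25, 10),
    stringsAsFactors = FALSE
)

# Keep only the rows that satisfy a condition (note the trailing comma)
adults <- df[df$Age >= 18, ]
print(adults)

# Add a new column computed from an existing one
df$AgeNextYear <- df$Age + 1

# Select specific columns by name
print(df[, c("Name", "AgeNextYear")])

# Basic size information
print(nrow(df))
print(ncol(df))
```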

Arrays:

Arrays are R data objects that can store data in more than two dimensions. Arrays are n-dimensional, homogeneous data structures.

Example:

R




# R program to illustrate an array
 
A = array(
    # Taking sequence of elements
    c(2, 4, 5, 7, 1, 8, 9, 2),
 
    # Creating two rectangular matrices
    # each with two rows and two columns
    dim = c(2, 2, 2)
)
 
print(A)


 Output:

, , 1

     [,1] [,2]
[1,]    2    5
[2,]    4    7

, , 2

     [,1] [,2]
[1,]    1    9
[2,]    8    2

Accessing arrays:

The arrays can be accessed by using indices for different dimensions separated by commas. Different components can be specified by any combination of elements’ names or positions.

Example:

R




vec1 <- c(2, 4, 5, 7, 1, 8, 9, 2)
vec2 <- c(12, 21, 34)
 
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
 
arr = array(c(vec1, vec2), dim = c(2, 3, 2),
            dimnames = list(row_names,
                            col_names, mat_names))
 
# accessing matrix 1 by index value
print ("Matrix 1")
print (arr[,,1])
 
# accessing matrix 2 by its name
print ("Matrix 2")
print(arr[,,"Mat2"])
 
# accessing a column of matrix 1 by index values
print ("1st column of matrix 1")
print (arr[, 1, 1])
   
# accessing a row of matrix 2 by names
print ("2nd row of matrix 2")
print(arr["row2",,"Mat2"])
 
# accessing a single element by mixing indices and names
print ("2nd row 3rd column matrix 1 element")
print (arr[2, "col3", 1])
   
# accessing a single element by names only
print ("2nd row 1st column element of matrix 2")
print(arr["row2", "col1", "Mat2"])
 
# print elements of both the rows and columns
# 2 and 3 of matrix 1
print (arr[, c(2, 3), 1])


 Output:

[1] "Matrix 1"
     col1 col2 col3
row1    2    5    1
row2    4    7    8
[1] "Matrix 2"
     col1 col2 col3
row1    9   12   34
row2    2   21    2
[1] "1st column of matrix 1"
row1 row2 
   2    4 
[1] "2nd row of matrix 2"
col1 col2 col3 
   2   21    2 
[1] "2nd row 3rd column matrix 1 element"
[1] 8
[1] "2nd row 1st column element of matrix 2"
[1] 2
     col2 col3
row1    5    1
row2    7    8
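
Computations can also be run along a chosen dimension of an array. As a minimal sketch, the base R function apply() takes the array, the dimension(s) to keep (MARGIN), and a function. The array below holds the made-up values 1 to 12.

```r
arr <- array(1:12, dim = c(2, 3, 2))

# dim() returns the extent of each dimension
print(dim(arr))

# apply() with MARGIN = 3 runs the function once per matrix (third dimension)
matrix_totals <- apply(arr, 3, sum)
print(matrix_totals)

# MARGIN = c(1, 2) sums across the matrices, position by position
position_totals <- apply(arr, c(1, 2), sum)
print(position_totals)
```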

Adding elements to array:

Elements can be appended at different positions in an array. The sequence of elements is retained in the order in which they are added. There are several built-in ways in R to add new values:

  • c(vector, values)
  • append(vector, values)
  • Assigning to an index beyond the current length, computed with the length() function

Example:

R




# creating a uni-dimensional array
x <- c(1, 2, 3, 4, 5)
 
# addition of element using c() function
x <- c(x, 6)
print ("Array after 1st modification ")
print (x)
 
# addition of element using append function
x <- append(x, 7)
print ("Array after 2nd modification ")
print (x)
 
# adding elements after computing the length
len <- length(x)
x[len + 1] <- 8
print ("Array after 3rd modification ")
print (x)
 
# adding on length + 3 index
x[len + 3]<-9
print ("Array after 4th modification ")
print (x)
 
# append a vector of values to the
# array after length + 3 of array
print ("Array after 5th modification")
x <- append(x, c(10, 11, 12), after = length(x)+3)
print (x)
 
# adds new elements after 3rd index
print ("Array after 6th modification")
x <- append(x, c(-1, -1), after = 3)
print (x)


 Output:

[1] "Array after 1st modification "
[1] 1 2 3 4 5 6
[1] "Array after 2nd modification "
[1] 1 2 3 4 5 6 7
[1] "Array after 3rd modification "
[1] 1 2 3 4 5 6 7 8
[1] "Array after 4th modification "
 [1]  1  2  3  4  5  6  7  8 NA  9
[1] "Array after 5th modification"
 [1]  1  2  3  4  5  6  7  8 NA  9 10 11 12
[1] "Array after 6th modification"
 [1]  1  2  3 -1 -1  4  5  6  7  8 NA  9 10 11 12

Removing Elements from Array:

  • Elements can be removed from arrays in R one at a time or several together. A logical condition or a set of indexes selects the values to keep, and everything else is dropped.
  • Another way is the %in% operator, which returns TRUE for each element that appears in a given set of values. Negating the result with ! keeps only the elements that are not in that set.

Example:

R




# creating an array of length 9
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print ("Original Array")
print (m)
 
# remove a single value element:3
# from array
m <- m[m != 3]
print ("After 1st modification")
print (m)
 
# removing elements based on condition:
# keep only elements that are greater
# than 2 and less than or equal to 8
m <- m[m>2 & m<= 8]
print ("After 2nd modification")
print (m)
 
# remove sequence of elements using
# another array
remove <- c(4, 6, 8)
 
# check which elements are present
# in the remove array
print (m %in% remove)
print ("After 3rd modification")
print (m [!(m %in% remove)])


 Output:

[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 1 2 4 5 6 7 8 9
[1] "After 2nd modification"
[1] 4 5 6 7 8
[1]  TRUE FALSE  TRUE FALSE  TRUE
[1] "After 3rd modification"
[1] 5 7
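
Removal by position, as opposed to removal by condition, can be sketched with negative indices, which drop the listed positions instead of selecting them. The vector v below is an illustrative assumption and is not part of the example above.

```r
# illustrative vector (not part of the example above)
v <- c(10, 20, 30, 40, 50)

# drop the element at position 2
v1 <- v[-2]
print(v1)   # [1] 10 30 40 50

# drop several positions at once
v2 <- v[-c(1, 5)]
print(v2)   # [1] 20 30 40

# drop by value; a logical mask is safer than v[-which(...)],
# which returns an empty vector when nothing matches
v3 <- v[!(v %in% c(30, 99))]
print(v3)   # [1] 10 20 40 50
```

The original vector is never changed in place; each expression returns a new vector that has to be assigned to keep the result.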


Factors:

Factors are the data objects which are used to categorize the data and store it as levels. They are useful for storing categorical data.

Example:

R




# Creating a vector
x<-c("female", "male", "other", "female", "other")
 
# Converting the vector x into
# a factor named gender
gender<-factor(x)
print(gender)


 Output: 

[1] female male   other  female other 
Levels: female male other
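
How a factor stores its categories can be inspected with a few base R functions. The following is a minimal sketch that rebuilds the same gender factor; the variable names lv, n_lv, codes, and counts are chosen here for illustration.

```r
x <- c("female", "male", "other", "female", "other")
gender <- factor(x)

# the distinct categories, sorted alphabetically by default
lv <- levels(gender)
print(lv)        # [1] "female" "male"   "other"

# how many categories there are
n_lv <- nlevels(gender)
print(n_lv)      # [1] 3

# internally each value is an integer code pointing into levels()
codes <- as.integer(gender)
print(codes)     # [1] 1 2 3 1 3

# frequency of each category
counts <- table(gender)
print(counts)
```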

Accessing elements of a Factor:

The elements of a factor are accessed the same way as the elements of a vector, using square brackets.
 

Example:

R




x<-c("female", "male", "other", "female", "other")
gender<-factor(x)
# Accessing the third element of the factor
print(gender[3])


 Output:

[1] other
Levels: female male other

Modifying a Factor:

After a factor is created, its components can be modified, but each new value must be one of the factor's predefined levels. Assigning any other value produces NA with a warning.

Example:

R




x<-c("female", "male", "other", "female", "other")
gender<-factor(x)
 
# Replacing the first element with an existing level
gender[1]<-"male"
print(gender)


Output:

[1] male   male   other  female other 
Levels: female male other
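
The restriction to predefined levels can be sketched as follows. This is an illustration under the assumption that a new category "unknown" needs to be stored; the copy g2 exists only to show the failing assignment.

```r
gender <- factor(c("female", "male", "other", "female", "other"))

# assigning an existing level works
gender[1] <- "male"

# assigning a value that is not a level stores NA and raises a warning
g2 <- gender
suppressWarnings(g2[2] <- "unknown")
print(is.na(g2[2]))   # [1] TRUE

# extending the levels first makes the assignment valid
levels(gender) <- c(levels(gender), "unknown")
gender[2] <- "unknown"
print(gender)
```

Extending levels() before assigning is one option; converting to character, editing, and calling factor() again is another.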


Error Handling

Error handling is the process of dealing with unwanted or anomalous errors that may cause abnormal termination of a program during its execution. R provides the following functions:

  • The stop() function generates an error and halts execution.
  • The stopifnot() function takes one or more logical expressions and, if any of them is not TRUE, generates an error stating which expression failed.
  • The warning() function creates a warning but does not stop the execution.
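
The three functions above can be sketched together in one small function. safe_sqrt() is a hypothetical helper written for this illustration, and try() is used only so that the script keeps running after the error.

```r
# hypothetical helper, for illustration only
safe_sqrt <- function(x) {
  # error if x is not a single number
  stopifnot(is.numeric(x), length(x) == 1)

  # explicit error for invalid input
  if (x < 0) {
    stop("x must be non-negative")
  }

  # warning: execution continues after it
  if (x == 0) {
    warning("x is zero")
  }

  sqrt(x)
}

print(safe_sqrt(16))   # [1] 4

# try() captures the error instead of stopping the script
r1 <- try(safe_sqrt(-1), silent = TRUE)
print(inherits(r1, "try-error"))   # [1] TRUE
```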

Errors can be handled with tryCatch(). Its first argument is the expression to evaluate. The named arguments that follow are handler functions, one per condition class such as warning or error, plus an optional finally expression that always runs.

Syntax:

check = tryCatch({
   expression
}, warning = function(w){
   code that handles the warnings
}, error = function(e){
   code that handles the errors
}, finally = {
   clean-up code
})

Example:

R




# R program illustrating error handling
 
# Evaluation of tryCatch
check <- function(expression){
 
  tryCatch(expression,
          
         warning = function(w){
           message("warning:\n", w)
         },
         error = function(e){
           message("error:\n", e)
         },
         finally = {
           message("Completed")
         })
}
 
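# no condition is raised; only the finally message is printed
check({10/2})
# division by zero returns Inf in R without a warning or an error
check({10/0})
# dividing by a string raises an error, handled by the error function
check({10/'noe'})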


 
 Output:

try catch finally in R
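
A handler can also return a fallback value, which becomes the result of tryCatch(). The sketch below assumes a hypothetical helper to_number() and uses conditionMessage() to extract the text of the condition.

```r
# hypothetical helper: convert text to a number, with a fallback value
to_number <- function(txt, fallback = NA_real_) {
  tryCatch({
    as.numeric(txt)
  }, warning = function(w) {
    # as.numeric() warns when the text cannot be converted
    message("warning caught: ", conditionMessage(w))
    fallback
  }, error = function(e) {
    message("error caught: ", conditionMessage(e))
    fallback
  }, finally = {
    # runs whether or not a condition was raised
    message("conversion attempted")
  })
}

a <- to_number("42")
print(a)   # [1] 42

b <- to_number("abc", fallback = 0)
print(b)   # [1] 0
```

Note that finally is an expression, not a function, so it takes no argument.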


Charts and Graphs

In real-world scenarios an enormous amount of data is produced every day, and interpreting it in raw form is difficult. This is where data visualization comes in: charts and graphs reveal meaningful insights far more readily than scanning huge Excel sheets. Let’s see some basic plots in R Programming.

Bar Chart:

R uses the function barplot() to create bar charts. Both vertical and horizontal bars can be drawn.

Example:

R




# Create the data for the chart
A <- c(17, 32, 8, 53, 1)
 
# Plot the bar chart
barplot(A, xlab = "X-axis", ylab = "Y-axis",
        main ="Bar-Chart")


Output:

Bar Chart in R

Note: For more information, refer to Bar Charts in R
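
Horizontal bars can be sketched with the horiz argument of barplot(). The category names passed to names.arg and the colour are made up for illustration.

```r
# Same data as in the vertical example
A <- c(17, 32, 8, 53, 1)

# horiz = TRUE draws the bars horizontally;
# barplot() returns the midpoints of the bars
mids <- barplot(A, horiz = TRUE,
                names.arg = c("a", "b", "c", "d", "e"),
                xlab = "Value", ylab = "Category",
                main = "Horizontal Bar-Chart",
                col = "steelblue")
print(length(mids))   # [1] 5
```

With horiz = TRUE the values run along the x-axis, so the axis labels swap roles.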

Histograms:

R creates histograms using the hist() function.

Example: 

R




# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32,
       14, 19, 27, 39)
 
# Create the histogram.
hist(v, xlab = "No.of Articles ",
     col = "green", border = "black")


 
 Output:

r histograms

 

Note: For more information, refer to Histograms in R language
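
hist() also returns the binning it computed, which can be inspected without drawing anything. A minimal sketch, assuming fixed breaks of width 10 (an arbitrary choice for illustration):

```r
v <- c(19, 23, 11, 5, 16, 21, 32,
       14, 19, 27, 39)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(v, breaks = seq(0, 40, by = 10), plot = FALSE)

# bin edges and the number of values in each bin:
# (0,10], (10,20], (20,30], (30,40]
print(h$breaks)   # [1]  0 10 20 30 40
print(h$counts)   # [1] 1 5 3 2
```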

Scatter plots:

The simple scatterplot is created using the plot() function.

Example:

R




# Create the data for the chart
A <- c(17, 32, 8, 53, 1)
B <- c(12, 43, 17, 43, 10)
 
 
# Plot the scatter plot
plot(x=A, y=B, xlab = "X-axis", ylab = "Y-axis",
        main ="Scatter Plot")


Output:

scatter plot R

 

Note: For more information, refer to Scatter plots in R Language

Line Chart:

The plot() function in R is used to create the line graph.

Example:

R




# Create the data for the chart.
v <- c(17, 25, 38, 13, 41)
 
# Plot the line chart.
plot(v, type = "l", xlab = "X-axis", ylab = "Y-axis",
        main ="Line-Chart")


 
 Output:

line chart R

 

Note: For more information, refer to Line Graphs in R Language.

Pie Charts:

R uses the function pie() to create pie charts. It takes positive numbers as a vector input.

Example:

R




# Create data for the graph.
geeks<- c(23, 56, 20, 63)
labels <- c("Mumbai", "Pune", "Chennai", "Bangalore")
 
# Plot the chart.
pie(geeks, labels)


 
 Output:

pie chart R

 

Note: For more information, refer to Pie Charts in R Language
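
Slice labels often carry percentages. The sketch below computes them from the same data; the variable names cities, pct, and lbls and the chart title are chosen here for illustration.

```r
geeks <- c(23, 56, 20, 63)
cities <- c("Mumbai", "Pune", "Chennai", "Bangalore")

# share of each slice, rounded to one decimal place
pct <- round(100 * geeks / sum(geeks), 1)

# combine city name and percentage into one label
lbls <- paste0(cities, " (", pct, "%)")
print(lbls[1])   # [1] "Mumbai (14.2%)"

pie(geeks, labels = lbls, main = "Share per city")
```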

Boxplots:

Boxplots are created in R by using the boxplot() function.

Example:

R




# Use the mpg and cyl columns of the built-in mtcars data set
input <- mtcars[, c('mpg', 'cyl')]
 
# Plot the chart.
boxplot(mpg ~ cyl, data = input,
        xlab = "Number of Cylinders",
        ylab = "Miles Per Gallon",
        main = "Mileage Data")


 
 Output:

 

Note: For more information, refer to Boxplots in R Language

For more articles, refer to Data Visualization using R

Statistics

Statistics is the branch of applied mathematics concerned with the collection, tabulation, analysis, interpretation, and presentation of numerical data. It deals with how data can be used to solve complex problems.

Mean, Median and Mode:

  • Mean: It is the sum of the observations divided by the total number of observations.
  • Median: It is the middle value of the sorted data set. For an even number of observations, it is the average of the two middle values.
  • Mode: It is the value that has the highest frequency in the given data set. R does not have a standard in-built function to calculate mode.

Example:

R




# Create the data
A <- c(17, 12, 8, 53, 1, 12,
       43, 17, 43, 10)
 
print(mean(A))
print(median(A))
 
mode <- function(x) {
   a <- unique(x)
   a[which.max(tabulate(match(x, a)))]
}
 
# Calculate the mode using
# the user function.
print(mode(A))


Output:

[1] 21.6
[1] 14.5
[1] 17

Note: For more information, refer to Mean, Median and Mode in R Programming
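
In the data above, 12, 17, and 43 each occur twice, and a function based on which.max() reports only the first of them. A minimal sketch of a variant that returns every tied value follows; all_modes() is a hypothetical name used for illustration.

```r
# hypothetical helper: return every value with the highest frequency
all_modes <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

A <- c(17, 12, 8, 53, 1, 12,
       43, 17, 43, 10)

print(all_modes(A))   # [1] 12 17 43
```

table() sorts the distinct values, so the modes come back in ascending order rather than in order of first appearance.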

Normal Distribution:

The normal distribution describes how the values of many naturally occurring quantities are spread around their mean, for example heights in a population, shoe sizes, and IQ scores. In R, there are 4 built-in functions for working with the normal distribution:
 

  • dnorm() gives the value of the probability density function at x.
dnorm(x, mean, sd)
  • pnorm() is the cumulative distribution function, which gives the probability that a random variable X takes a value less than or equal to x.
pnorm(x, mean, sd)
  • qnorm() is the inverse of pnorm(). It takes a probability p and returns the value x whose cumulative probability is p.
qnorm(p, mean, sd)
  • rnorm() generates a vector of n normally distributed random numbers.
rnorm(n, mean, sd)

Example:

R




# creating a sequence of values
# between -10 to 10 with a
# difference of 0.1
x <- seq(-10, 10, by=0.1)
 
 
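# density at each x
y <- dnorm(x, mean(x), sd(x))
plot(x, y, main='dnorm')
 
# cumulative probability at each x
y <- pnorm(x, mean(x), sd(x))
plot(x, y, main='pnorm')
 
# qnorm expects probabilities between 0 and 1
p <- seq(0.01, 0.99, by=0.01)
y <- qnorm(p, mean(x), sd(x))
plot(p, y, main='qnorm')
 
# draw as many random values as x has elements
r <- rnorm(length(x), mean(x), sd(x))
hist(r, breaks=50, main='rnorm')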


 
 Output:

pnorm in R, qnorm in R, rnorm in R

 

Note: For more information, refer to Normal Distribution in R
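
A few numeric checks make the relationship between the four functions concrete. This is a sketch using the standard normal distribution (mean 0, sd 1, the defaults); the seed and the parameters passed to rnorm() are arbitrary choices for illustration.

```r
# P(X <= 1.96) for a standard normal variable, about 0.975
p <- pnorm(1.96)
print(round(p, 3))   # [1] 0.975

# qnorm() undoes pnorm(): the 97.5% quantile is about 1.96
q <- qnorm(0.975)
print(round(q, 2))   # [1] 1.96

# density at the mean equals 1 / sqrt(2 * pi)
d <- dnorm(0)
print(all.equal(d, 1 / sqrt(2 * pi)))   # [1] TRUE

# 1000 random draws with mean 50 and sd 5
set.seed(42)
s <- rnorm(1000, mean = 50, sd = 5)
print(length(s))     # [1] 1000
```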

Binomial Distribution in R Programming:

The binomial distribution is a discrete distribution that models the number of successes in a fixed number of independent trials, each of which has only two outcomes: success or failure. For example, it can describe whether a lottery ticket wins or not, whether a drug cures a patient or not, the number of heads in a finite number of coin tosses, or how often a particular face appears when a die is rolled several times. R has four functions for handling the binomial distribution:

  • dbinom() gives the probability of exactly k successes.
dbinom(k, n, p)
  • pbinom() gives the cumulative probability of at most k successes.
pbinom(k, n, p)

where n is the total number of trials, p is the probability of success, and k is the value at which the probability is to be found.

  • qbinom() is the quantile function, the inverse of pbinom().
qbinom(P, n, p)

where P is the probability, n is the total number of trials, and p is the probability of success.

  • rbinom() generates random numbers of successes.
rbinom(n, N, p)

where n is the number of observations, N is the total number of trials, and p is the probability of success.

Example:

R




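# probability of exactly k successes, k = 0..10
probabilities <- dbinom(x = c(0:10), size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main='dbinom')
 
# cumulative probability of at most k successes
probabilities <- pbinom(0:10, size = 10, prob = 1 / 6)
plot(0:10, probabilities, type = "l", main='pbinom')
 
# quantiles for probabilities 0, 0.1, ..., 1
x <- seq(0, 1, by = 0.1)
y <- qbinom(x, size = 13, prob = 1 / 6)
plot(x, y, type = 'l', main='qbinom')
 
# 8 random draws, each the number of successes in 13 trials
probabilities <- rbinom(8, size = 13, prob = 1 / 6)
hist(probabilities, main='rbinom')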


 
Output:

dbinom in R, pbinom in R, qbinom in R, rbinom in R

 

Note: For more information, refer to Binomial Distribution in R Programming
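
A small worked case shows what the functions return. The sketch assumes 10 tosses of a fair coin (n = 10, p = 0.5), a scenario chosen here for illustration.

```r
n <- 10    # number of tosses
p <- 0.5   # probability of heads

# P(exactly 3 heads) = choose(10, 3) / 2^10
d3 <- dbinom(3, size = n, prob = p)
print(d3)   # [1] 0.1171875

# P(at most 3 heads) = sum of dbinom over 0..3
c3 <- pbinom(3, size = n, prob = p)
print(c3)   # [1] 0.171875

# smallest k with P(X <= k) >= 0.5
med <- qbinom(0.5, size = n, prob = p)
print(med)  # [1] 5
```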

Time Series Analysis:

A time series records how a quantity behaves over a period of time. In R, a time series object is created with the ts() function.

Example: Let’s take the COVID-19 pandemic as an example. The data vector holds the weekly total number of positive COVID-19 cases worldwide from 22 January, 2020 to 15 April, 2020.

R




# Weekly data of COVID-19 positive cases from
# 22 January, 2020 to 15 April, 2020
x <- c(580, 7813, 28266, 59287, 75700,
    87820, 95314, 126214, 218843, 471497,
    936851, 1508725, 2072113)
 
# library required for decimal_date() function
library(lubridate)
 
# creating time series object
# from date 22 January, 2020
mts <- ts(x, start = decimal_date(ymd("2020-01-22")),
                            frequency = 365.25 / 7)
 
# plotting the graph
plot(mts, xlab ="Weekly Data",
        ylab ="Total Positive Cases",
        main ="COVID-19 Pandemic",
        col.main ="darkgreen")


Output:

Time Series Analysis in R

Note: For more information, refer to Time Series Analysis in R
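
Once a ts object exists, base R can query and slice it by date. The sketch below uses made-up monthly values (not real data) and the hypothetical names sales, monthly, y2021, and changes.

```r
# made-up monthly values for two years (illustrative only)
sales <- c(12, 15, 14, 18, 21, 19, 23, 25, 24, 28, 30, 27,
           29, 33, 31, 36, 38, 35, 40, 43, 41, 46, 48, 45)

# frequency = 12 marks the data as monthly, starting January 2020
monthly <- ts(sales, start = c(2020, 1), frequency = 12)

print(start(monthly))       # [1] 2020    1
print(end(monthly))         # [1] 2021   12
print(frequency(monthly))   # [1] 12

# keep only the observations from January 2021 onwards
y2021 <- window(monthly, start = c(2021, 1))
print(length(y2021))        # [1] 12

# month-to-month change; one element shorter than the series
changes <- diff(monthly)
print(length(changes))      # [1] 23
```

Applied to cumulative totals such as the weekly case counts above, diff() would give the new cases per week.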


