Setting up Environment for Machine Learning with R Programming

Machine learning is a subset of artificial intelligence (AI) concerned with building systems that learn from data without being explicitly programmed. In machine learning, we create algorithms and models that a system uses to predict outcomes from patterns or trends observed in the given data. Instead of hand-coding rules, we supply data together with its known outcomes and let the algorithm infer the rules, which are stored in a model. This model is then used to predict outcomes for a different set of data. In R, a machine learning environment can be set up easily through RStudio.

Setting up an environment for machine learning using Anaconda

Step 1: Install Anaconda (Linux, Windows) and launch the navigator.

Step 2: Open Anaconda Navigator and click the Install button for RStudio.

Step 3: After installation, create a new environment. Anaconda will prompt you to enter a name for the new environment; once it has been created, launch RStudio.

Running R commands

Method 1: R commands can be run from the console provided in RStudio. After opening RStudio, simply type R commands into the console.

Method 2: R commands can be stored in a file and executed from an Anaconda prompt, using the following steps.

  1. Open an Anaconda prompt.
  2. Go to the directory where the R file is located.
  3. Activate the Anaconda environment by using the command:
    conda activate <ENVIRONMENT_NAME>
  4. Run the file by using the command:
    Rscript <FILE_NAME>.R
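
As a minimal sketch of a file that could be run this way, the script below prints a greeting and a small computed value. The file name hello.R and its contents are illustrative assumptions, not part of the original setup.

```r
# hello.R (hypothetical file name): run with `Rscript hello.R`
# commandArgs(trailingOnly = TRUE) returns only the arguments typed after the file name
args <- commandArgs(trailingOnly = TRUE)
name <- if (length(args) > 0) args[1] else "world"

greeting <- paste0("Hello, ", name, "!")
cat(greeting, "\n")

# A tiny computation to confirm that R works in the activated environment
values <- c(2, 4, 6, 8)
cat("Mean of values:", mean(values), "\n")
```

Passing an extra word on the command line, for example `Rscript hello.R Ada`, would replace the default name in the greeting.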

Installing machine learning packages in R

Packages make code easier to write because they provide predefined functions for common tasks. Widely used machine learning packages include caret, e1071, nnet, kernlab, and randomForest. There are two ways to install these packages for your R program.

Method 1: Installing packages through RStudio

  1. Open RStudio and click the Install Packages option under Tools in the menu bar.
  2. Enter the names of all the packages you want to install, separated by spaces or commas, and then click Install.

Method 2: Installing packages through the Anaconda prompt or RStudio console

  1. Open an Anaconda prompt.
  2. Switch to the environment you used for RStudio using the command:
    conda activate <ENVIRONMENT_NAME>
  3. Enter the command R to open the R console.
  4. Install the required packages using the command:
    install.packages(c("<PACKAGE_1>", "<PACKAGE_2>", ..., "<PACKAGE_N>"))

While downloading the packages you might be prompted to choose a CRAN mirror. It is recommended to choose the location closest to you for a faster download.
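
To avoid reinstalling packages that are already present, one option is to check first and install only what is missing. The sketch below is one way to do that; the helper name missing_packages and the choice of the cloud.r-project.org mirror are assumptions made for illustration.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation.
# (missing_packages is a hypothetical helper, not part of any package.)
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

wanted <- c("caret", "e1071", "nnet", "kernlab", "randomForest")
to_install <- missing_packages(wanted)

# Install only in an interactive session, so that simply sourcing this file
# does not trigger downloads; setting `repos` skips the mirror prompt.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}
```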

Machine Learning packages in R

There are many R libraries that contain a host of functions, tools, and methods to manage and analyze data. Each library has a particular focus, such as image and text data, data manipulation, data visualization, web crawling, or machine learning. Here we discuss some of the important machine learning packages through an example.

Example:

Preparing the data set:
Before using these packages, import the data set into RStudio, clean it, and split it into training and testing sets. Download the CSV file from this link.

    # Load caret, which provides createDataPartition()
    library(caret)

    # Import the data set; text columns are read as factors
    Data <- read.csv("GenderClassification.csv",
                     stringsAsFactors = TRUE)

    # Fix the random seed so the split below is reproducible
    set.seed(10)

    # Cleaning the data set: convert factor columns to numeric codes
    Data$Favorite.Color <- as.numeric(Data$Favorite.Color)
    Data$Favorite.Music.Genre <- as.numeric(Data$Favorite.Music.Genre)
    Data$Favorite.Beverage <- as.numeric(Data$Favorite.Beverage)
    Data$Favorite.Soft.Drink <- as.numeric(Data$Favorite.Soft.Drink)

    # Split into train and test data sets (80% / 20%);
    # TrainingSize holds the row indices selected for training
    TrainingSize <- createDataPartition(Data$Gender,
                                        p = 0.8,
                                        list = FALSE)
    TrainingData <- Data[TrainingSize, ]
    TestingData <- Data[-TrainingSize, ]
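
To see what such a split does without needing the CSV file, here is a rough base-R sketch on the built-in iris data set. It samples 80% of the rows within each class, which approximates the idea of a stratified split; it is not caret's exact algorithm, and the helper name stratified_split is an assumption made for illustration.

```r
set.seed(10)  # make the random split reproducible

# Pick a fraction `p` of the row indices within each class of `labels`.
# (stratified_split is a hypothetical helper, not a caret function.)
stratified_split <- function(labels, p = 0.8) {
  index_by_class <- split(seq_along(labels), labels)
  picked <- lapply(index_by_class, function(idx) {
    idx[sample.int(length(idx), size = floor(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

train_idx <- stratified_split(iris$Species, p = 0.8)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Each species keeps the same share in both subsets
print(table(train$Species))
print(table(test$Species))
```

Sampling within each class keeps the class proportions of the full data set in both subsets, which matters when some classes are rare.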

caret: The name caret is short for Classification And REgression Training. The package provides a common interface for training and evaluating classification and regression models, and it delegates the actual model fitting to many other packages.

    # Using the caret package

    # Importing the library
    library(caret)

    # Using train() from the caret package.
    # method = "svmPoly" relies on the kernlab package being installed.
    # First model: fitted once on the training data, with no resampling
    model <- train(Gender ~ ., data = TrainingData,
                   method = "svmPoly",
                   na.action = na.omit,
                   preProcess = c("scale", "center"),
                   trControl = trainControl(method = "none"),
                   tuneGrid = data.frame(degree = 1,
                                         scale = 1,
                                         C = 1)
    )

    # Second model: same settings, evaluated with 6-fold cross-validation
    model.cv <- train(Gender ~ ., data = TrainingData,
                      method = "svmPoly",
                      na.action = na.omit,
                      preProcess = c("scale", "center"),
                      trControl = trainControl(method = "cv",
                                               number = 6),
                      tuneGrid = data.frame(degree = 1,
                                            scale = 1,
                                            C = 1)
    )

    # Printing the models
    print(model)
    print(model.cv)
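
The second call asks for 6-fold cross-validation, which caret carries out internally. As a rough illustration of the idea only, the base-R sketch below runs the same kind of loop by hand. It uses a linear model on the built-in mtcars data rather than the SVM and data set above, purely to stay self-contained.

```r
set.seed(10)

k <- 6
n <- nrow(mtcars)

# Assign every row to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_rmse <- numeric(k)
for (fold in seq_len(k)) {
  held_out <- fold_id == fold

  # Fit on the other k - 1 folds, then predict the held-out fold
  fit <- lm(mpg ~ wt, data = mtcars[!held_out, ])
  predicted <- predict(fit, newdata = mtcars[held_out, ])

  # Root mean squared error on the held-out rows
  fold_rmse[fold] <- sqrt(mean((mtcars$mpg[held_out] - predicted)^2))
}

# The cross-validated estimate is the average over the folds
cv_rmse <- mean(fold_rmse)
print(cv_rmse)
```

Every row is used for testing exactly once, so the averaged error is a less optimistic estimate than the error measured on the training data.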

ggplot2: ggplot2 is one of R's best-known visualization libraries. It builds polished, static graphics in layers: a data set, a mapping of columns to aesthetics, and one or more geometries. The ggplot2 package is used for creating plots and visualising data.

    # Using ggplot2

    # Importing the ggplot2 package
    library(ggplot2)

    # Creating a bar plot from the
    # Data's Favorite.Color attribute
    ggplot(Data, aes(Favorite.Color)) +
      geom_bar(fill = "#0073C2FF")
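
With only an x aesthetic, geom_bar() draws one bar per distinct value, with the bar height equal to the number of rows holding that value. The same counts can be inspected in base R with table(); the vector below is made-up data standing in for a column such as Favorite.Color.

```r
# Hypothetical values standing in for one column of the data set
colours <- c("Cool", "Warm", "Cool", "Neutral", "Cool", "Warm")

# One count per distinct value: these are the bar heights
counts <- table(colours)
print(counts)

# The most frequent value corresponds to the tallest bar
most_common <- names(counts)[which.max(counts)]
print(most_common)
```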

randomForest: The randomForest package provides an implementation of the random forest algorithm, which combines the predictions of many decision trees for classification or regression.

    # Using randomForest

    # Importing the randomForest package
    library(randomForest)

    # Using the randomForest function
    # from the randomForest package
    model <- randomForest(formula = Gender ~ .,
                          data = Data)
    print(model)
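
The model above is fitted on all of Data. To judge a classifier on unseen rows, its predictions for TestingData can be compared with the true labels. The comparison step is sketched below in base R with made-up label vectors; the "F" and "M" values and the six rows are assumptions for illustration, not results from the data set.

```r
# Hypothetical true and predicted labels for six held-out rows
actual    <- factor(c("F", "M", "M", "F", "M", "F"), levels = c("F", "M"))
predicted <- factor(c("F", "M", "F", "F", "M", "M"), levels = c("F", "M"))

# Confusion matrix: rows are predictions, columns are the true labels
confusion <- table(Predicted = predicted, Actual = actual)
print(confusion)

# Accuracy is the share of rows on the diagonal (correct predictions)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```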

nnet: The nnet package fits feed-forward neural networks with a single hidden layer; the size argument sets the number of units in that layer. During training, the loss (a measure of the difference between the actual and predicted values) is reduced from one iteration to the next.

    # Using nnet
      
    # Importing the nnet package
    library(nnet)
      
    # Using the nnet function
    # In the nnet package 
    model <- nnet(formula = Gender ~ ., 
                  data = Data, 
                  size = 30)
    print(model)
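
To illustrate what it means for the loss to shrink during training, here is a deliberately simplified sketch: a single logistic unit trained by plain gradient descent on six made-up points. It is not the optimiser that nnet itself uses, and the data, learning rate, and iteration count are all assumptions chosen for illustration.

```r
# Made-up one-dimensional inputs and binary targets
x <- c(-2, -1, -0.5, 0.5, 1, 2)
y <- c(0, 0, 1, 0, 1, 1)

sigmoid <- function(z) 1 / (1 + exp(-z))

# Average log loss of the unit with weight w and bias b
log_loss <- function(w, b) {
  p <- sigmoid(w * x + b)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

w <- 0
b <- 0
rate <- 0.1            # small learning rate (assumed)
losses <- numeric(50)  # loss recorded after each update

for (i in seq_along(losses)) {
  p <- sigmoid(w * x + b)
  # Gradient of the average log loss with respect to w and b
  w <- w - rate * mean((p - y) * x)
  b <- b - rate * mean(p - y)
  losses[i] <- log_loss(w, b)
}

print(round(losses[c(1, 10, 50)], 4))
```

With a learning rate this small the recorded loss never increases from one step to the next; a much larger rate could make it oscillate instead.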

e1071: The e1071 package implements support vector machines, the naive Bayes algorithm, and several other algorithms.

    # Using e1071
      
    # Importing the e1071 package
    library(e1071)
      
    # Using the svm function 
    # In the e1071 package
    model <- svm(formula = Gender ~ ., 
                 data = Data)
    print(model)

rpart: The rpart package implements recursive partitioning, which fits decision trees for classification and regression tasks. The resulting model is a binary tree.

    # Using rpart

    # Importing the rpart package
    library(rpart)

    # Using the rpart function
    # to fit a decision tree
    partition <- rpart(formula = Gender ~ .,
                       data = Data)

    # Draw the tree, then label its splits
    plot(partition)
    text(partition)
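
Each node of such a tree splits the data in two on one variable. As a rough sketch of how a single split can be chosen, the base-R code below scans the candidate thresholds of one numeric column and keeps the one with the lowest weighted Gini impurity. It uses the built-in iris data, the helper names gini and best_split are assumptions for illustration, and it leaves out everything else rpart does (multiple variables, recursion, pruning).

```r
# Gini impurity of a set of class labels: 0 means the set is pure
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Find the threshold on numeric `x` that gives the purest two groups.
# (best_split is a hypothetical helper, not an rpart function.)
best_split <- function(x, labels) {
  values <- sort(unique(x))
  # Candidate thresholds: midpoints between neighbouring distinct values
  thresholds <- (head(values, -1) + tail(values, -1)) / 2
  scores <- vapply(thresholds, function(threshold) {
    left  <- labels[x < threshold]
    right <- labels[x >= threshold]
    # Impurity of both sides, weighted by their sizes
    (length(left) * gini(left) + length(right) * gini(right)) / length(labels)
  }, numeric(1))
  thresholds[which.min(scores)]
}

split_point <- best_split(iris$Petal.Length, iris$Species)
print(split_point)
```

Applying the same search again inside each of the two resulting groups is what makes the partitioning recursive.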

dplyr: Unlike the modelling packages above, dplyr is a data manipulation package. It helps manipulate data by using functions such as filter, select, and arrange.

    # Using dplyr
      
    # Importing the dplyr package
    library(dplyr)
      
    # Using the filter function
    # From the dplyr package 
    Data %>% 
      filter(Gender == "M")
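
For comparison, the three operations named above can also be written in base R, which may help clarify what each dplyr verb does. The small data frame below is made up for illustration, and the dplyr call shown in each comment is the intended counterpart of the base-R line beneath it.

```r
# Hypothetical data frame used only for this comparison
people <- data.frame(
  Gender = c("M", "F", "M", "F"),
  Age    = c(31, 25, 22, 40),
  City   = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# filter(people, Gender == "M"): keep the rows that match a condition
males <- people[people$Gender == "M", ]

# select(people, Gender, Age): keep only the named columns
two_columns <- people[, c("Gender", "Age")]

# arrange(people, Age): sort the rows by a column
by_age <- people[order(people$Age), ]

print(males)
print(two_columns)
print(by_age)
```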
