Skip to content
Related Articles

Related Articles

Improve Article
Classification in R Programming
  • Difficulty Level : Expert
  • Last Updated : 10 May, 2020

R is a very dynamic and versatile programming language for data science. This article deals with classification in R. Generally classifiers in R are used to predict specific category related information like reviews or ratings such as good, best or worst.

Various Classifiers are:

  • Decision Trees
  • Naive Bayes Classifiers
  • K-NN Classifiers
  • Support Vector Machines(SVM’s)

Decision Tree Classifier

It is basically is a graph to represent choices. The nodes or vertices in the graph represent an event and the edges of the graph represent the decision conditions. Its common use is in Machine Learning and Data Mining applications.

Spam/Non-spam classification of email, predicting of a tumor is cancerous or not. Usually, a model is constructed with noted data also called training dataset. Then a set of validation data is used to verify and improve the model. R has packages that are used to create and visualize decision trees.

The R package “party” is used to create decision trees.


# Load the party package. It will automatically load other
# dependent packages.
# Create the input data frame.
input.dat <- readingSkills[c(1:105), ]
# Give the chart file a name.
png(file = "decision_tree.png")
# Create the tree.
  output.tree <- ctree(
  nativeSpeaker ~ age + shoeSize + score, 
  data = input.dat)
# Plot the tree.
# Save the file.


null device 
Loading required package: methods
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo

Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

   as.Date, as.Date.numeric

Loading required package: sandwich

Naive Bayes Classifier

Naïve Bayes classification is a general classification method that uses a probability approach, hence also known as a probabilistic approach based on Bayes’ theorem with the assumption of independence between features. The model is trained on training dataset to make predictions by predict() function.



It is a sample method in machine learning methods but can be useful in some instances. The training is easy and fast that just requires considering each predictor in each class separately.

It is used generally in sentimental analysis.

## Warning: package 'caret' was built under R version 3.4.3
trainIndex = createDataPartition(mydata$prog, p = 0.7)$Resample1
train = mydata[trainIndex, ]
test = mydata[-trainIndex, ]
## check the balance
## academic    general vocational 
## 105         45         50


## Naive Bayes Classifier for Discrete Predictors
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
## A-priori probabilities:
## Y
##   academic    general vocational 
##  0.5248227  0.2269504  0.2482270 
## Conditional probabilities:
##             science
## Y                [, 1]     [, 2]
##   academic   54.21622 9.360761
##   general    52.18750 8.847954
##   vocational 47.31429 9.969871
##             socst
## Y                [, 1]      [, 2]
##   academic   56.58108  9.635845
##   general    51.12500  8.377196
##   vocational 44.82857 10.279865

K-NN Classifier

Another used classifier is the K-NN classifier. In pattern recognition, the k-nearest neighbor’s algorithm (k-NN) is a non-parametric method generally used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. In k-NN classification, the output is a class membership.
Used in a variety of applications such as economic forecasting, data compression, and genetics.


# Write Python3 code here
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import seaborn as sns
breast_cancer = load_breast_cancer()
X = pd.DataFrame(, columns = breast_cancer.feature_names)
X = X[['mean area', 'mean compactness']]
y = pd.Categorical.from_codes(, breast_cancer.target_names)
y = pd.get_dummies(y, drop_first = True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)
    x ='mean area',
    y ='mean compactness',
    hue ='benign',
    data = X_test.join(y_test, how ='outer')


Support Vector Machines(SVM’s)

A support vector machine (SVM) is a supervised binary machine learning algorithm that uses classification algorithms for two-group classification problems. After giving an SVM model sets of labeled training data for each category, they’re able to categorize new text.

Mainly SVM is used for text classification problems. It classifies the unseen data. It is widely used than Naive Bayes.SVM id usually a fast and dependable classification algorithm that performs very well with a limited amount of data.

SVMs have a number of applications in several fields like Bioinformatics, to classify genes, etc.


# Load the data from the csv file
dataDirectory <- "D:/" # put your own folder here
data <- read.csv(paste(dataDirectory, 'regression.csv', sep =""), header = TRUE)
# Plot the data
plot(data, pch = 16)
# Create a linear regression model
model <- lm(Y ~ X, data)
# Add the fitted line


My Personal Notes arrow_drop_up
Recommended Articles
Page :