Classification is the task of assigning objects to one of several predefined categories (classes) based on their properties. A classification model is typically used to:
- Predict the class label for a new unlabeled data object
- Provide a descriptive model explaining what features characterize objects in each class
There are various classification techniques, such as:
- Logistic Regression
- Decision Tree
- K-Nearest Neighbours
- Naive Bayes Classifier
- Support Vector Machines (SVM)
- Random Forest Classification
Decision Tree Classifiers
A decision tree is a flowchart-like tree structure in which each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome. A decision tree consists of:
- Nodes: Test the value of a certain attribute.
- Edges/Branches: Represent a decision rule and connect to the next node.
- Leaf nodes: Terminal nodes that represent class labels or class distributions.
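The three components above map naturally onto a small recursive data structure. The following is a minimal Python sketch, assuming binary splits of the form `value <= threshold`: internal nodes hold a feature name and a threshold, the two branches encode the decision rule, and leaves hold a class label. The tree itself, its thresholds, and the helper `predict` are hypothetical, chosen only for illustration; the feature names are borrowed from the dataset used later.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: stores the predicted class label."""
    label: int


@dataclass
class Node:
    """Internal node: tests one attribute against a threshold."""
    feature: str
    threshold: float
    left: "Tree"   # branch taken when sample[feature] <= threshold
    right: "Tree"  # branch taken when sample[feature] > threshold


Tree = Union[Leaf, Node]


def predict(tree: Tree, sample: Dict[str, float]) -> int:
    """Follow the decision rules from the root down to a leaf."""
    while isinstance(tree, Node):
        if sample[tree.feature] <= tree.threshold:
            tree = tree.left
        else:
            tree = tree.right
    return tree.label


# Hypothetical hand-built tree: the thresholds are illustrative, not learned.
toy_tree = Node(
    feature="Age",
    threshold=42.5,
    left=Node("EstimatedSalary", 90000, Leaf(0), Leaf(1)),
    right=Leaf(1),
)

print(predict(toy_tree, {"Age": 19, "EstimatedSalary": 19000}))   # 0
print(predict(toy_tree, {"Age": 32, "EstimatedSalary": 150000}))  # 1
```

Each call walks exactly one root-to-leaf path, which is why a single prediction can be explained as a short chain of rules.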
This algorithm can be implemented easily in R. Some important points about decision tree classifiers are:
- Easy to interpret, since each prediction follows a readable sequence of rules
- Learns its decision rules automatically from the training data
- Repeatedly bisects the feature space into smaller regions
- Prone to overfitting
- Can be trained on a small training set
- Sensitive to noise in the data
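How a tree "bisects the feature space" depends on its splitting criterion. A common choice is Gini impurity, Gini = 1 - Σ p_k², where p_k is the share of class k in a node; it is also the default criterion for classification trees in R's rpart package. The Python sketch below computes it and scans every candidate threshold on one numeric feature, keeping the split whose two child regions have the lowest weighted impurity. The helper names and the sample ages and labels are made up for illustration.

```python
from collections import Counter
from typing import Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (nan, inf) if there are fewer than two distinct values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        # Only consider thresholds between two distinct consecutive values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weight each child's impurity by the fraction of samples it receives.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


ages = [19, 25, 27, 35, 46, 48, 52, 60]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
print(gini(bought))              # 0.5
print(best_split(ages, bought))  # (40.5, 0.0)
```

A full tree applies `best_split` recursively to each child region; without a stopping rule such as a maximum depth or a minimum node size, it keeps splitting until the leaves are pure, which is one source of the overfitting noted above.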
Implementation in R
A sample of 400 people shared their age, gender, and salary with a product company, along with whether or not they bought the product (0 means no, 1 means yes). Download the dataset Advertisement.csv.
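|   | User ID  | Gender | Age | EstimatedSalary | Purchased |
|---|----------|--------|-----|-----------------|-----------|
| 0 | 15624510 | Male   | 19  | 19000           | 0         |
| 1 | 15810944 | Male   | 35  | 20000           | 0         |
| 2 | 15668575 | Female | 26  | 43000           | 0         |
| 3 | 15603246 | Female | 27  | 57000           | 0         |
| 4 | 15804002 | Male   | 19  | 76000           | 0         |
| 5 | 15728773 | Male   | 27  | 58000           | 0         |
| 6 | 15598044 | Female | 27  | 84000           | 0         |
| 7 | 15694829 | Female | 32  | 150000          | 1         |
| 8 | 15600575 | Male   | 25  | 33000           | 0         |
| 9 | 15727311 | Female | 35  | 65000           | 0         |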
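Before training, the file has to be parsed into feature vectors and labels. The Python sketch below uses the standard `csv` module and assumes that Advertisement.csv is comma-separated with the column headers shown in the sample above; it also assumes, for illustration, that only Age and EstimatedSalary are used as features. `load_dataset` is a hypothetical helper, and a few of the sample rows are inlined so the sketch runs without the file.

```python
import csv
import io
from typing import List, TextIO, Tuple

Sample = Tuple[float, float]  # (Age, EstimatedSalary)


def load_dataset(handle: TextIO) -> Tuple[List[Sample], List[int]]:
    """Keep Age and EstimatedSalary as features and Purchased as the label."""
    features: List[Sample] = []
    labels: List[int] = []
    for row in csv.DictReader(handle):
        features.append((float(row["Age"]), float(row["EstimatedSalary"])))
        labels.append(int(row["Purchased"]))
    return features, labels


# Three rows from the sample above, inlined so the sketch is self-contained.
SAMPLE_CSV = """User ID,Gender,Age,EstimatedSalary,Purchased
15624510,Male,19,19000,0
15810944,Male,35,20000,0
15694829,Female,32,150000,1
"""

X, y = load_dataset(io.StringIO(SAMPLE_CSV))
print(X)  # [(19.0, 19000.0), (35.0, 20000.0), (32.0, 150000.0)]
print(y)  # [0, 0, 1]

# With the real file (assumed comma-separated, with the headers above):
# with open("Advertisement.csv", newline="") as f:
#     X, y = load_dataset(f)
```

User ID is dropped because it identifies a person rather than describing them, so it carries no information a split could generalize from.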
The dataset is split 75/25 into a training set and a test set:
- The training set contains 300 entries.
- The test set contains 100 entries.
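A 300/100 split like this is usually made by shuffling the rows with a fixed random seed and cutting the shuffled list in two. The Python sketch below does that with the standard library; `split_rows` is a hypothetical helper, the seed value is arbitrary, and the 400 integers stand in for the 400 survey rows. It does not stratify by the Purchased label, so the share of buyers can differ slightly between the two sets; a stratified split is a common refinement.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    items: Sequence[T], train_size: int, seed: int = 123
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items with a fixed seed, then cut it into two parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    return shuffled[:train_size], shuffled[train_size:]


# 400 placeholder row indices standing in for the 400 survey responses.
rows = list(range(400))
train, test = split_rows(rows, train_size=300)
print(len(train), len(test))  # 300 100
```

Using a private `random.Random(seed)` instance rather than the module-level generator keeps the split reproducible without affecting any other randomness in the program.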
Confusion matrix on the test set: [[62, 6], [3, 29]]
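The diagonal of a confusion matrix counts the correct predictions, regardless of whether rows or columns hold the actual labels, so accuracy can be read off directly: (62 + 29) / 100 = 0.91. The small Python helper below (hypothetical, for illustration) performs that calculation for any square confusion matrix.

```python
from typing import List


def accuracy(cm: List[List[int]]) -> float:
    """Share of samples on the diagonal (correct predictions) of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


cm = [[62, 6], [3, 29]]
print(accuracy(cm))                # 0.91
print(round(1 - accuracy(cm), 2))  # 0.09 (misclassification rate)
```

Precision and recall, unlike accuracy, depend on which axis holds the actual labels, so they should only be computed once that orientation is known.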
Visualizing the Train Data:
Visualizing the Test Data:
Decision Tree Diagram: