ML | Logistic Regression v/s Decision Tree Classification
  • Last Updated : 23 May, 2019

Logistic Regression and Decision Tree classification are two of the most popular and basic classification algorithms in use today. Neither algorithm is better than the other in general; when one outperforms the other, it is usually because of the nature of the data being worked upon.

We can compare the two algorithms on several criteria:

Criteria | Logistic Regression | Decision Tree Classification
Interpretability | Less interpretable | More interpretable
Decision Boundaries | Linear and single decision boundary | Bisects the space into smaller spaces
Ease of Decision Making | A decision threshold has to be set | Automatically handles decision making
Overfitting | Not prone to overfitting | Prone to overfitting
Robustness to noise | Robust to noise | Majorly affected by noise
Scalability | Requires a large enough training set | Can be trained on a small training set
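
Two of the criteria above, the decision threshold and the shape of the decision boundary, can be illustrated with a small sketch. The weights, bias, and split values below are hypothetical numbers chosen for illustration, not values fitted to any dataset. The logistic model turns a single linear score into a probability and needs a threshold to produce a class, while the hand-written tree reaches a class directly through a sequence of single-feature tests.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (not fitted) parameters of a two-feature logistic regression.
WEIGHTS = (1.5, -2.0)
BIAS = 0.5

def logistic_predict(x, threshold=0.5):
    """Return 1 if the predicted probability reaches the chosen threshold."""
    score = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return int(sigmoid(score) >= threshold)

def tree_predict(x):
    """A hand-written depth-2 tree: every test looks at one feature only."""
    if x[0] <= 1.0:
        return 1 if x[1] <= 0.5 else 0
    return 1 if x[1] <= 2.0 else 0

point = (2.0, 1.0)
print(logistic_predict(point))                 # prints 1 (probability is about 0.82)
print(logistic_predict(point, threshold=0.9))  # prints 0 (same probability, stricter threshold)
print(tree_predict(point))                     # prints 1 (no threshold involved)
```

The same point changes class under logistic regression when only the threshold changes, whereas the tree's answer depends only on which rectangular region the point falls into.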

As a simple experiment, we run the two models on the same dataset and compare their performances.

Step 1: Importing the required libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

Step 2: Reading and cleaning the Dataset

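# Change the working directory to the folder that contains the file.
# This is an IPython/Jupyter command, so it is commented out for plain scripts:
# %cd C:\Users\Dev\Desktop\Kaggle\Sinking Titanic
df = pd.read_csv('_train.csv')
# Separate the target column from the features
y = df['Survived']
X = df.drop('Survived', axis=1)
# Drop the text columns that are not used as features here
X = X.drop(['Name', 'Ticket', 'Cabin', 'Embarked'], axis=1)
# Encode the categorical Sex column as integers
# (a label mapping, not one-hot encoding)
X['Sex'] = X['Sex'].map({'male': 2, 'female': 3})
# Handle the missing values by forward-filling from the previous row
X = X.ffill()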
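
Replacing 'male' and 'female' with integers is an integer (label) mapping rather than one-hot encoding. As a rough sketch of the difference, the snippet below applies both to a tiny made-up frame and then forward-fills a missing value. The frame `toy` and its values are hypothetical and are not taken from the Titanic file.

```python
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
toy = pd.DataFrame({"Age": [22.0, None, 35.0],
                    "Sex": ["male", "female", "female"]})

# Integer mapping: still one column, with arbitrary numeric codes.
mapped = toy.assign(Sex=toy["Sex"].map({"male": 0, "female": 1}))

# One-hot encoding: one 0/1 indicator column per category.
one_hot = pd.get_dummies(toy, columns=["Sex"], dtype=int)

# Forward fill copies the previous row's value into each gap.
filled = one_hot.ffill()

print(mapped)
print(filled)
```

One-hot encoding avoids implying an order between categories, which matters more for a linear model such as logistic regression than for a tree. For a two-valued column like this one, a single 0/1 column carries the same information.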

Step 3: Training and evaluating the Logistic Regression model

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
# Fit on the training split, then report accuracy on the held-out test split
lr = LogisticRegression()
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))

Step 4: Training and evaluating the Decision Tree Classifier model

criteria = ['gini', 'entropy']
scores = {}
for c in criteria:
    # Train one tree per split criterion and record its test accuracy
    dt = DecisionTreeClassifier(criterion=c).fit(X_train, y_train)
    test_score = dt.score(X_test, y_test)
    scores[c] = test_score
print(scores)
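
The trees in this step are grown with no depth limit, which is the setting where the overfitting listed in the comparison is most visible. The following is a minimal sketch on synthetic data, not on the Titanic set: `make_classification` generates the data, and the `flip_y=0.2` label noise and the `max_depth=3` limit are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.2 assigns a random class to 20% of the samples.
X_demo, y_demo = make_classification(n_samples=1000, n_features=10,
                                     n_informative=4, flip_y=0.2,
                                     random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0)

# Compare an unrestricted tree with a shallow one.
results = {}
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(Xd_train, yd_train)
    results[depth] = (tree.score(Xd_train, yd_train),
                      tree.score(Xd_test, yd_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

The unrestricted tree memorises the training split, noise included, so its training accuracy says little about how it will do on unseen data. Limiting the depth is one common way to narrow that gap.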

On comparing the scores, we can see that the logistic regression model performed better on the current dataset, but this will not always be the case.
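
To see how the ranking can flip, here is a hedged sketch on a synthetic XOR-style dataset, where the class depends on the sign of the product of two features and no single straight line separates the classes. The data is generated here and is unrelated to the Titanic file; the sample size and seeds are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# XOR-style data: class 1 when both features share the same sign.
rng = np.random.default_rng(0)
X_xor = rng.uniform(-1.0, 1.0, size=(1000, 2))
y_xor = (X_xor[:, 0] * X_xor[:, 1] > 0).astype(int)

Xx_train, Xx_test, yx_train, yx_test = train_test_split(
    X_xor, y_xor, test_size=0.3, random_state=0)

# Fit both models on the same split and compare test accuracy.
lr_model = LogisticRegression().fit(Xx_train, yx_train)
dt_model = DecisionTreeClassifier(random_state=0).fit(Xx_train, yx_train)
lr_acc = lr_model.score(Xx_test, yx_test)
dt_acc = dt_model.score(Xx_test, yx_test)

print(f"Logistic regression accuracy: {lr_acc:.3f}")
print(f"Decision tree accuracy: {dt_acc:.3f}")
```

On data like this the single linear boundary of logistic regression does little better than guessing, while the tree can carve the plane into quadrant-like regions.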
