Grid searching is a method for finding the combination of hyperparameter values at which a model achieves its highest accuracy. Before applying grid searching to an algorithm, the data is divided into a training set and a validation set; the validation set is used to evaluate the trained models. A model is trained for every combination of hyperparameter values in the grid and tested on the validation set, and the best-scoring combination is chosen.
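As a rough illustration of the procedure described above, the sketch below enumerates every combination in a small grid and keeps the one with the best validation score. The `validation_score` function and the `alpha` and `depth` parameter names are placeholders invented for this example; in practice that function would train a model on the training set and score it on the validation set.

```python
from itertools import product

# Hypothetical grid: the parameter names and values are placeholders.
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "depth": [1, 2, 3],
}


def validation_score(alpha, depth):
    # Stand-in for "train on the training set, score on the validation set".
    # This toy function peaks at alpha = 0.1, depth = 2.
    return 1.0 - abs(alpha - 0.1) - 0.1 * abs(depth - 2)


def grid_search(grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # product() yields one tuple of values per combination
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = grid_search(param_grid, validation_score)
print(best_params, best_score)
```

Because every combination is scored, the number of model fits equals the product of the lengths of the value lists (3 x 3 = 9 here).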
Implementation:
Grid searching can be applied to any algorithm whose performance can be improved by tuning its hyperparameters. For example, we can apply grid searching to K-Nearest Neighbors by validating its performance over a set of values of K. Similarly, for Logistic Regression trained with gradient descent, we can try a set of learning rates to find the one at which the model achieves the best accuracy.
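The K-Nearest Neighbors case mentioned above could look roughly like the following sketch. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in, not the dataset used in this article; the candidate values of K are also chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption: not the article's dataset)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

# Candidate values of K (illustrative choice)
k_values = [1, 3, 5, 7, 9, 11]

# Fit one model per K and record its validation accuracy
scores = {}
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores[k] = accuracy_score(y_valid, model.predict(X_valid))

# The best K is the one with the highest validation accuracy
best_k = max(scores, key=scores.get)
print("Validation accuracy per K:", scores)
print("Best K:", best_k)
```

With a single hyperparameter the grid is just a list of candidate values, so the search reduces to one loop.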
The dataset used here, diabetes.csv, has 8 feature columns such as “Age” and “Glucose”, and the target variable “Outcome”, for 108 patients. We will train a Logistic Regression classifier to predict whether or not a patient has diabetes from this information.
Code: Implementation of Grid Searching on Logistic Regression from Scratch
Python3
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


class LogitRegression:
    """Logistic regression trained with batch gradient descent."""

    def __init__(self, learning_rate, iterations):
        self.learning_rate = learning_rate
        self.iterations = iterations

    def fit(self, X, Y):
        # m = number of training examples, n = number of features
        self.m, self.n = X.shape
        # Initialise weights and bias to zero
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y
        # Run gradient descent for the requested number of iterations
        for _ in range(self.iterations):
            self.update_weights()
        return self

    def update_weights(self):
        # Sigmoid of the linear combination gives predicted probabilities
        A = 1 / (1 + np.exp(-(self.X.dot(self.W) + self.b)))
        # Prediction error, flattened to shape (m,)
        tmp = (A - self.Y.T)
        tmp = np.reshape(tmp, self.m)
        # Gradients of the log loss with respect to W and b
        dW = np.dot(self.X.T, tmp) / self.m
        db = np.sum(tmp) / self.m
        # Gradient descent step
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self

    def predict(self, X):
        Z = 1 / (1 + np.exp(-(X.dot(self.W) + self.b)))
        # Threshold the predicted probabilities at 0.5
        Y = np.where(Z > 0.5, 1, 0)
        return Y


def main():
    # Load the dataset: features in all but the last column, target in the last
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1:].values

    # Hold out one third of the data as the validation set
    X_train, X_valid, Y_train, Y_valid = train_test_split(
        X, Y, test_size=1 / 3, random_state=0)

    max_accuracy = 0

    # Candidate hyperparameter values
    learning_rates = [0.1, 0.2, 0.3, 0.4, 0.5,
                      0.01, 0.02, 0.03, 0.04, 0.05]
    iterations = [100, 200, 300, 400, 500]

    # Build every (learning_rate, iterations) combination
    parameters = []
    for i in learning_rates:
        for j in iterations:
            parameters.append((i, j))
    print("Available combinations : ", parameters)

    # Train and validate one model per combination
    for k in range(len(parameters)):
        model = LogitRegression(learning_rate=parameters[k][0],
                                iterations=parameters[k][1])
        model.fit(X_train, Y_train)
        Y_pred = model.predict(X_valid)

        # Count correct predictions on the validation set
        correctly_classified = 0
        for count in range(np.size(Y_pred)):
            if Y_valid[count] == Y_pred[count]:
                correctly_classified = correctly_classified + 1

        # Accuracy = correct predictions / total validation samples
        curr_accuracy = (correctly_classified / np.size(Y_pred)) * 100
        if max_accuracy < curr_accuracy:
            max_accuracy = curr_accuracy

    print("Maximum accuracy achieved by our model through grid searching : ", max_accuracy)


if __name__ == "__main__":
    main()
Output:
Available combinations : [(0.1, 100), (0.1, 200), (0.1, 300), (0.1, 400),
(0.1, 500), (0.2, 100), (0.2, 200), (0.2, 300), (0.2, 400), (0.2, 500),
(0.3, 100), (0.3, 200), (0.3, 300), (0.3, 400), (0.3, 500), (0.4, 100),
(0.4, 200), (0.4, 300), (0.4, 400), (0.4, 500), (0.5, 100), (0.5, 200),
(0.5, 300), (0.5, 400), (0.5, 500), (0.01, 100), (0.01, 200), (0.01, 300),
(0.01, 400), (0.01, 500), (0.02, 100), (0.02, 200), (0.02, 300), (0.02, 400),
(0.02, 500), (0.03, 100), (0.03, 200), (0.03, 300), (0.03, 400), (0.03, 500),
(0.04, 100), (0.04, 200), (0.04, 300), (0.04, 400), (0.04, 500), (0.05, 100),
(0.05, 200), (0.05, 300), (0.05, 400), (0.05, 500)]
Maximum accuracy achieved by our model through grid searching : 60.0
In the code above, we applied grid searching over all combinations of the listed learning rates and numbers of iterations to find the combination at which the model achieves its highest validation accuracy.
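To get a feel for what this exhaustive search costs, the short sketch below rebuilds the same grid with `itertools.product` (a swap for the nested loops, not the code used above) and counts the work involved. The names `n_fits` and `total_updates` are introduced here only for illustration.

```python
from itertools import product

# Same candidate values as in the grid above
learning_rates = [0.1, 0.2, 0.3, 0.4, 0.5,
                  0.01, 0.02, 0.03, 0.04, 0.05]
iterations = [100, 200, 300, 400, 500]

# product() yields pairs in the same order as the nested loops
grid = list(product(learning_rates, iterations))

# One model is trained per combination
n_fits = len(grid)

# Each fit runs its own number of gradient descent updates
total_updates = sum(n_iter for _, n_iter in grid)

print(n_fits)         # 50
print(total_updates)  # 15000
```

Adding a third hyperparameter with, say, 4 candidate values would multiply the number of fits by 4, which is why grids are usually kept small.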
Code: Implementation of Grid Searching on sklearn's Logistic Regression
Python3
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split


def main():
    # Load the dataset: features in all but the last column, target in the last
    df = pd.read_csv("diabetes.csv")
    X = df.iloc[:, :-1].values
    # 1-D target array, the shape sklearn estimators expect
    Y = df.iloc[:, -1].values

    # Hold out one third of the data as the test set
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1 / 3, random_state=0)

    # Candidate values for C, the inverse of the regularization strength
    parameters = {'C': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

    # GridSearchCV cross-validates every candidate on the training data
    # and refits the best one on the whole training set
    model = LogisticRegression()
    grid = GridSearchCV(model, parameters)
    grid.fit(X_train, Y_train)

    # Predictions come from the refitted best model
    Y_pred = grid.predict(X_test)

    # Accuracy = correct predictions / total test samples
    correctly_classified = 0
    for count in range(np.size(Y_pred)):
        if Y_test[count] == Y_pred[count]:
            correctly_classified = correctly_classified + 1
    accuracy = (correctly_classified / np.size(Y_pred)) * 100

    print("Maximum accuracy achieved by sklearn model through grid searching : ", np.round(accuracy, 2))


if __name__ == "__main__":
    main()
Output:
Maximum accuracy achieved by sklearn model through grid searching : 62.86
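The sklearn listing reports only the final accuracy. As a hedged sketch, the fitted `GridSearchCV` object can also report which value of `C` was selected through its `best_params_` and `best_score_` attributes. The example assumes scikit-learn is installed and uses synthetic data from `make_classification` rather than the article's dataset; `max_iter=1000` and `cv=5` are choices made here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the article's dataset)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0)

parameters = {"C": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

# Each candidate C is scored by 5-fold cross-validation on the training set
grid = GridSearchCV(LogisticRegression(max_iter=1000), parameters, cv=5)
grid.fit(X_train, y_train)

# Selected hyperparameter and its mean cross-validated accuracy
print("Best C:", grid.best_params_["C"])
print("Mean cross-validated accuracy for best C:", grid.best_score_)

# Accuracy of the refitted best model on the held-out test set
print("Accuracy on held-out test set:", grid.score(X_test, y_test))
```

Note that `GridSearchCV` selects hyperparameters by cross-validation on the training data rather than on a single validation split, so the held-out test set plays no part in choosing `C`.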
Note: Grid searching is a systematic way to tune hyperparameters, and it is especially useful for mathematically complex models whose best settings are hard to choose by hand.