Sklearn | Model Hyper-parameters Tuning
Last Updated : 16 Oct, 2023
Hyperparameter tuning is the process of finding the optimal values for the hyperparameters of a machine-learning model. Hyperparameters are parameters that control the behaviour of the model but are not learned during training. Hyperparameter tuning is an important step in developing machine learning models because it can significantly improve the model’s performance on new data. However, hyperparameter tuning can be a time-consuming and challenging task. Scikit-learn provides several tools that can help you tune the hyperparameters of your machine-learning models. In this guide, we will provide a comprehensive overview of hyperparameter tuning in Scikit-learn.
What are hyperparameters?
Hyperparameters are parameters that control the behaviour of a machine-learning model but are not learned during training. Some common examples of hyperparameters include:
- Regularization strength: This parameter controls how strongly the model is penalized for complexity (for example, large weights), which helps limit overfitting.
- Number of trees: This parameter controls the number of trees in a random forest model.
- Learning rate: This parameter controls the size of each update step during training, and therefore how quickly the model converges.
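To make the distinction concrete, here is a minimal sketch that contrasts hyperparameters (set before training) with parameters (learned during training). The choice of LogisticRegression and the value C=0.5 are assumptions made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen by us *before* training.
model = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = model.get_params()
print(hyperparams["C"], hyperparams["max_iter"])

# Parameters are learned *during* training and only exist after fit().
model.fit(X, y)
print(model.coef_.shape)       # one weight per (class, feature) pair
print(model.intercept_.shape)  # one bias term per class
```

Calling get_params() on any Scikit-learn estimator lists the hyperparameters that a search can tune.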
Why is hyperparameter tuning important?
Tuning hyperparameters is important because it can improve a model's performance on new data. For example, a poorly tuned model may have high bias (it underfits and misses patterns in the data) or high variance (it overfits the training set), and in either case it performs poorly on new data. On the other hand, a well-tuned model balances low bias with low variance, meaning it will generalize well to new data and be accurate.
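One way to see this effect is to plot or print training and validation scores as a single hyperparameter changes. The sketch below uses validation_curve with a decision tree and its max_depth hyperparameter. Both the estimator and the depth values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
depths = [1, 2, 3, 5, 10]

# One row per depth, one column per cross-validation fold.
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0),
    X, y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
)

train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
for depth, tr, va in zip(depths, train_mean, val_mean):
    print(f"max_depth={depth:>2}  train={tr:.3f}  validation={va:.3f}")
```

A very shallow tree scores poorly on both sets (high bias), while a deep tree can fit the training data almost perfectly without a matching gain on the validation folds.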
How to tune hyperparameters in Scikit-learn:
Scikit-Learn provides a variety of tools to help you tune the hyperparameters of your machine-learning models. A popular method is to use grid search.
GridSearchCV : Grid search is a brute-force method that evaluates every combination of hyperparameter values in a predefined grid. You can implement grid search in Scikit-learn using the GridSearchCV class. The GridSearchCV class takes a machine learning model and a hyperparameter search space as input. The search space is a dictionary that lists the values to try for each hyperparameter. Each combination is evaluated with cross-validation, that is, on held-out validation folds of the training data. The combination with the best mean validation score is selected as the optimal model and, by default, is refit on the whole training set.
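The brute-force idea can be sketched by hand with ParameterGrid and cross_val_score. This is only an illustration of the loop, not how GridSearchCV is implemented internally, and the small search space is a hypothetical one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hypothetical small search space, chosen only for illustration.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

best_score, best_params = -1.0, None
for params in ParameterGrid(param_grid):  # 3 x 2 = 6 combinations
    # Mean accuracy over 5 validation folds for this combination.
    scores = cross_val_score(SVC(**params), X, y, cv=5)
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), params

print(len(ParameterGrid(param_grid)), "combinations tried")
print("Best:", best_params, f"(mean CV accuracy {best_score:.3f})")
```

GridSearchCV wraps the same loop and adds conveniences such as parallel execution, result bookkeeping, and refitting the best model.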
Another popular way to tune hyperparameters is to use random search.
Random Search : Compared to grid search, random search is a cheaper method because it tests only a random sample of hyperparameter combinations. You can implement random search in Scikit-learn using the RandomizedSearchCV class. The RandomizedSearchCV class takes a machine learning model and a hyperparameter distribution as input. A hyperparameter distribution is a dictionary that defines, for each hyperparameter, either a list of values or a probability distribution to sample from. RandomizedSearchCV draws a fixed number of random combinations (set by n_iter) from these distributions and trains the model with each of them.
Each sampled combination is then evaluated on held-out validation folds using cross-validation. The combination of hyperparameters that achieves the best validation score is selected as the optimal model.
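To see what "sampling from a hyperparameter distribution" looks like, the sketch below uses ParameterSampler, which draws candidate combinations in the same way a randomized search does. The distribution bounds are assumptions chosen for illustration.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

param_dist = {
    "C": uniform(loc=0.1, scale=10),      # continuous: values in [0.1, 10.1]
    "kernel": ["linear", "rbf", "poly"],  # list: each entry equally likely
}

# Draw 5 random combinations; random_state makes the draw repeatable.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))
for s in samples:
    print(s)
```

Note that scipy's uniform(loc, scale) covers the interval from loc to loc + scale, not from loc to scale.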
Advanced hyperparameter tuning techniques
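In addition to grid search and random search, there are several other advanced hyperparameter tuning techniques that you can use with Scikit-learn models. Scikit-learn does not implement these techniques itself; they are usually provided by companion libraries such as scikit-optimize, Optuna, or Hyperopt. These techniques include:
- Bayesian optimization: Bayesian optimization is a sequential model-based optimization technique that uses the results of earlier trials to decide which hyperparameter values to try next.
- Hyperband: Hyperband is a resource-efficient algorithm that gives many configurations a small training budget and stops the poorly performing ones early.
- Tree-structured Parzen estimator (TPE): TPE is a sequential model-based optimization technique that models the distributions of good and poor hyperparameter values and proposes new candidates from the promising regions. The "tree-structured" part refers to the shape of the search space, not to tree-based models.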
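Scikit-learn itself does offer one related tool: successive halving, the elimination scheme that Hyperband builds on. The sketch below uses HalvingRandomSearchCV, which is marked experimental and must be enabled explicitly. It is not Hyperband, Bayesian optimization, or TPE, and the parameter ranges and min_resources value are assumptions for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed search ranges, sampled on a log scale.
param_dist = {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)}

search = HalvingRandomSearchCV(
    SVC(kernel="rbf"),
    param_dist,
    factor=3,          # keep roughly the best third of candidates each round
    min_resources=30,  # number of samples used in the first round
    cv=3,
    random_state=0,
)
search.fit(X, y)

print("Rounds:", search.n_iterations_)
print("Candidates per round:", search.n_candidates_)
print("Samples per round:", search.n_resources_)
print("Best:", search.best_params_)
```

Early rounds train many candidates on few samples, and only the survivors are trained on more data, which is where the savings come from.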
Drawbacks of GridSearchCV:
- Computationally expensive: GridSearchCV evaluates every combination of hyperparameters in the grid. Therefore, it becomes expensive when the search space is large or when each model is slow to train.
- Exhaustive search: GridSearchCV performs an exhaustive search over the parameter grid. This means that it evaluates all combinations, even those that are unlikely to improve performance. This can waste computation.
- Not effective for large search spaces: The number of combinations grows multiplicatively with every added hyperparameter, so GridSearchCV does not scale to large search spaces.
- Limited exploration: GridSearchCV only tests the values listed in the grid. Unlike random search, it adds no randomness to the search process, so good values that lie between grid points are never tried.
- Scalability issues: Every combination is trained once per cross-validation fold. This may be impractical for slow algorithms or large datasets.
- Does not adapt to results: GridSearchCV does not update its search based on the results of previous trials. It does not learn from the performance of previous hyperparameter combinations and may spend time on combinations that are very similar or clearly unpromising.
- Limited parallelization: GridSearchCV can run fits in parallel through n_jobs, but the total amount of work stays the same, and the speed-up is limited by the number of available cores.
- Does not solve the problem of model selection: GridSearchCV focuses on tuning the hyperparameters of one estimator. Choosing between different types of machine learning algorithms needs extra setup, such as running a separate search per model.
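For the model selection point, one possible workaround is to treat the estimator itself as a searchable step of a Pipeline and pass a list of grids, one per candidate model. The step name "clf" and the value lists are assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# "clf" is a placeholder step; each dict below swaps in a different estimator.
pipe = Pipeline([("clf", SVC())])
param_grid = [
    {"clf": [SVC()], "clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]},
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

n_candidates = len(search.cv_results_["params"])  # 6 SVC + 3 logistic = 9
best_model_name = type(search.best_estimator_.named_steps["clf"]).__name__
print(n_candidates, "candidates evaluated")
print("Best model:", best_model_name)
print("Best settings:", search.best_params_)
```

The cost drawbacks still apply, because every candidate from every grid is fit on every fold.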
SVC Algorithm
GridSearchCV
Python3
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load the Iris dataset and hold out 20% of it as a test set
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Search space: 3 x 3 x 4 = 36 combinations
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.1, 1, 'scale', 'auto'],
}

# Evaluate every combination with 5-fold cross-validation on all CPU cores
svm = SVC()
grid_search = GridSearchCV(
    estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

print("Best Hyperparameters: ", grid_search.best_params_)
print("Best Accuracy Score: {:.2f}%".format(grid_search.best_score_ * 100))

# best_estimator_ has already been refit on the full training set
best_svm = grid_search.best_estimator_
test_accuracy = best_svm.score(X_test, y_test)
print("Test Accuracy: {:.2f}%".format(test_accuracy * 100))
Output:
Best Hyperparameters: {'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}
Best Accuracy Score: 95.83%
Test Accuracy: 100.00%
- The output will display the best hyperparameters found during the grid search and the corresponding cross-validation accuracy score.
- It will also show the accuracy of the best model on the test set.
- The code is essentially performing hyperparameter optimization to find the best SVM model for the Iris dataset, and it reports the performance of the best model on unseen data.
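Besides the single best result, a fitted search keeps the scores of every candidate in cv_results_. The sketch below ranks the candidates of a smaller, assumed grid and prints the mean and spread of their validation scores, which helps judge whether the winner is clearly better or just marginally ahead.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Assumed reduced grid so the listing stays short.
search = GridSearchCV(
    SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
search.fit(X_train, y_train)

results = search.cv_results_
order = np.argsort(results["rank_test_score"])  # best candidates first
for i in order[:3]:
    print(f'{results["mean_test_score"][i]:.3f} '
          f'+/- {results["std_test_score"][i]:.3f}  {results["params"][i]}')
```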
Random search
Python3
from scipy.stats import expon, uniform
from sklearn import datasets
from sklearn.model_selection import (
    GridSearchCV, RandomizedSearchCV, train_test_split)
from sklearn.svm import SVC

# Load the Iris dataset and hold out 20% of it as a test set
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Grid search: a fixed list of values per hyperparameter
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.1, 1, 'scale', 'auto'],
}

# Random search: distributions (or lists) to sample from
param_dist = {
    'C': uniform(0.1, 10),    # uniform(loc, scale): values in [0.1, 10.1]
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': expon(scale=1),  # exponential distribution with mean 1
}

svm = SVC()
grid_search = GridSearchCV(
    estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1)
# Draws 50 random combinations; no random_state is set, so results vary per run
random_search = RandomizedSearchCV(
    estimator=svm, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1)

grid_search.fit(X_train, y_train)
random_search.fit(X_train, y_train)

print("Grid Search - Best Hyperparameters: ", grid_search.best_params_)
print("Grid Search - Best Accuracy Score: {:.2f}%".format(grid_search.best_score_ * 100))
print("Random Search - Best Hyperparameters: ", random_search.best_params_)
print("Random Search - Best Accuracy Score: {:.2f}%".format(random_search.best_score_ * 100))

# Evaluate both refit models on the held-out test set
best_svm_grid = grid_search.best_estimator_
best_svm_random = random_search.best_estimator_
test_accuracy_grid = best_svm_grid.score(X_test, y_test)
test_accuracy_random = best_svm_random.score(X_test, y_test)
print("Test Accuracy (Grid Search): {:.2f}%".format(test_accuracy_grid * 100))
print("Test Accuracy (Random Search): {:.2f}%".format(test_accuracy_random * 100))
Output:
Grid Search - Best Hyperparameters: {'C': 0.1, 'gamma': 0.1, 'kernel': 'poly'}
Grid Search - Best Accuracy Score: 95.83%
Random Search - Best Hyperparameters: {'C': 3.900736564361965, 'gamma': 0.4094567581571069, 'kernel': 'linear'}
Random Search - Best Accuracy Score: 96.67%
Test Accuracy (Grid Search): 100.00%
Test Accuracy (Random Search): 96.67%
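- The output will display the best hyperparameters found during grid search and random search, along with their corresponding cross-validation accuracy scores.
- It will also show the accuracy of the best models found by both methods on the test set. With only 30 test samples, a higher cross-validation score does not guarantee a higher test score.
- You can compare the performance of grid search and random search in finding the best hyperparameters for the SVM classifier. Because RandomizedSearchCV is created without a random_state here, its best hyperparameters and scores will vary from run to run.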
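For scale-like hyperparameters such as C and gamma, a log-uniform distribution is a common alternative to uniform or exponential sampling, because it spreads trials evenly across orders of magnitude. The sketch below is one possible variant with a fixed seed. The ranges, n_iter value, and the restriction to the rbf kernel are assumptions, not settings from the example above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

param_dist = {
    "C": loguniform(1e-2, 1e2),      # equal probability per decade
    "gamma": loguniform(1e-3, 1e1),
    "kernel": ["rbf"],
}

# random_state fixes the sampled combinations, so reruns give the same result.
search = RandomizedSearchCV(
    SVC(), param_dist, n_iter=20, cv=5, random_state=42)
search.fit(X_train, y_train)

print(search.best_params_, f"{search.best_score_:.3f}")
```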
XGBoost algorithm
GridSearchCV
Python3
import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the Iris dataset and hold out 20% of it as a test set
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 3^6 = 729 combinations, each fit on 5 folds (3,645 fits), so this can be slow
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

xgb_model = xgb.XGBClassifier()
grid_search = GridSearchCV(xgb_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
# best_estimator_ is already refit on the full training set (refit=True by default)
best_model = grid_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'colsample_bytree': 1.0, 'learning_rate': 0.01, 'max_depth': 3, 'min_child_weight': 1, 'n_estimators': 200, 'subsample': 1.0}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters found by the grid search are listed.
- The accuracy on the test set is also reported, indicating how well the best model performs on unseen data.
- The goal of this code is to find the best hyperparameters for an XGBoost classifier and evaluate its performance on the test set.
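By default, GridSearchCV uses refit=True, so the best model is retrained on the full training set at the end of the search and the search object can be used directly for scoring and prediction. The sketch below illustrates this with Scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost, so that it runs without extra dependencies. The small grid is an assumption for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),  # stand-in for XGBClassifier
    {"n_estimators": [50, 100], "learning_rate": [0.01, 0.1]},
    cv=3,
)  # refit=True is the default
search.fit(X_train, y_train)

# The search object delegates score() to best_estimator_, which has
# already been retrained on all of X_train, so no extra fit() is needed.
score_via_search = search.score(X_test, y_test)
score_via_model = search.best_estimator_.score(X_test, y_test)
print(score_via_search, score_via_model)
```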
Random search
Python3
import xgboost as xgb
from sklearn import datasets
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Load the Iris dataset and hold out 20% of it as a test set
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Candidate values; random search samples combinations from these lists
param_dist = {
    'n_estimators': [100, 200, 300, 400, 500],
    'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.4],
    'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 3, 5, 7, 9],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.6, 0.7, 0.8, 0.9, 1.0],
    'gamma': [0, 0.1, 0.2, 0.3, 0.4],
    # L2 regularization (the scikit-learn wrapper documents it as reg_lambda)
    'lambda': [0, 0.1, 0.2, 0.3, 0.4]
}

xgb_model = xgb.XGBClassifier()
# Try 100 random combinations; random_state makes the sampling repeatable
random_search = RandomizedSearchCV(
    xgb_model, param_distributions=param_dist, n_iter=100, cv=5,
    scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

best_params = random_search.best_params_
# best_estimator_ is already refit on the full training set (refit=True by default)
best_model = random_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'subsample': 0.8, 'n_estimators': 200, 'min_child_weight': 1, 'max_depth': 7, 'learning_rate': 0.01, 'lambda': 0.3, 'gamma': 0.3, 'colsample_bytree': 0.9}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters found by the random search are listed.
- The accuracy on the test set is also reported, indicating how well the best model performs on unseen data.
- Randomized search is a more efficient way to explore hyperparameter space compared to grid search, especially when there are a large number of hyperparameters to consider.
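The efficiency claim can be checked with simple arithmetic. The sketch below counts how many combinations an exhaustive grid over the same candidate lists would contain and compares that with the 100 sampled combinations. It only counts; no model is trained.

```python
from sklearn.model_selection import ParameterGrid

# Same candidate lists as in the random search example above.
param_dist = {
    "n_estimators": [100, 200, 300, 400, 500],
    "learning_rate": [0.01, 0.1, 0.2, 0.3, 0.4],
    "max_depth": [3, 4, 5, 6, 7, 8, 9, 10],
    "min_child_weight": [1, 3, 5, 7, 9],
    "subsample": [0.8, 0.9, 1.0],
    "colsample_bytree": [0.6, 0.7, 0.8, 0.9, 1.0],
    "gamma": [0, 0.1, 0.2, 0.3, 0.4],
    "lambda": [0, 0.1, 0.2, 0.3, 0.4],
}

n_grid = len(ParameterGrid(param_dist))  # product of all list lengths
n_iter, cv = 100, 5

print("Grid combinations:", n_grid)           # 375000
print("Grid search fits:", n_grid * cv)       # 1875000
print("Random search fits:", n_iter * cv)     # 500
print(f"Share of grid sampled: {n_iter / n_grid:.4%}")
```

Random search therefore explores a tiny fraction of the grid, trading completeness for a fixed and predictable budget.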
Logistic regression algorithm
GridSearchCV
Python3
import warnings

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset and hold out 20% of it as a test set
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features; the scaler is fit on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 6 x 1 x 2 = 12 combinations
param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l2'],
    'solver': ['liblinear', 'lbfgs']
}

logistic_regression = LogisticRegression(max_iter=1000)

# Silence UserWarnings raised while the search is running
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    grid_search = GridSearchCV(
        logistic_regression, param_grid, cv=5, scoring='accuracy')
    grid_search.fit(X_train_scaled, y_train)

best_params = grid_search.best_params_
# best_estimator_ is already refit on the full training set (refit=True by default)
best_model = grid_search.best_estimator_
accuracy = best_model.score(X_test_scaled, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'C': 1, 'penalty': 'l2', 'solver': 'lbfgs'}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters are reported, including ‘C’, ‘penalty’, and ‘solver’.
- The accuracy on the test set indicates how well the logistic regression model with the best hyperparameters performs on unseen data. In this case, it achieves an accuracy of 1.00 (100%), as shown in the output above.
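In the example above, the scaler is fit on the whole training set before cross-validation, so every validation fold has already influenced the scaling statistics. One way to avoid this is to place the scaler and the model in a Pipeline and search over that, so scaling is re-fit inside each fold. The sketch below is one possible arrangement; the parameter name logisticregression__C follows from the step names that make_pipeline generates.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling becomes part of the model, so each CV fold fits its own scaler.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": [0.001, 0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)  # raw X_test: pipeline scales it
print(search.best_params_)
print(f"Accuracy on test set: {test_accuracy:.2f}")
```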
Random search
Python3
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset and hold out 20% of it as a test set
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize features; the scaler is fit on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

param_dist = {
    'C': np.logspace(-4, 4, 100),  # 100 values from 1e-4 to 1e4, log-spaced
    'penalty': ['l2'],
    'solver': ['lbfgs']
}

logistic_regression = LogisticRegression(max_iter=1000)
# error_score='raise' stops the search if any fit fails instead of scoring it NaN
random_search = RandomizedSearchCV(
    logistic_regression, param_distributions=param_dist, n_iter=100, cv=5,
    scoring='accuracy', random_state=42, error_score='raise')
random_search.fit(X_train_scaled, y_train)

best_params = random_search.best_params_
# best_estimator_ is already refit on the full training set (refit=True by default)
best_model = random_search.best_estimator_
accuracy = best_model.score(X_test_scaled, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'solver': 'lbfgs', 'penalty': 'l2', 'C': 0.6280291441834259}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters are reported, including ‘C’, ‘penalty’, and ‘solver’.
- The accuracy on the test set indicates how well the logistic regression model with the best hyperparameters performs on unseen data. In this case, it achieves an accuracy of 1.00 (100%), as shown in the output above.
Conclusion
Hyperparameter tuning is an important step in developing machine learning models. Tuning hyperparameters can significantly improve a model's performance on new data. Scikit-learn provides several tools to help you tune the hyperparameters of your machine learning model.