When working with the scikit-learn library, we often need to save a trained model to a file and restore it later, for example to compare it with other models or to test it on new data. Saving the model is called serialization, and restoring it is called deserialization.
Datasets also differ in type and size. Small datasets train quickly, but large ones (more than 1 GB) can take a very long time to train on a local machine, even with a GPU. Storing the trained model avoids repeating that training time when the same model is needed later or in a different project.
There are two ways to save a model in scikit-learn:
- Pickle string: The pickle module implements a fundamental but powerful algorithm for serializing and deserializing a Python object structure.
The pickle module provides the following functions:
pickle.dumps() serializes an object hierarchy to a bytes string; pickle.dump() writes it to an open file instead.
pickle.loads() deserializes a bytes string back into an object; pickle.load() reads it from an open file instead.
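Example: Let's apply K-Nearest Neighbors to the iris dataset and then save the model.
import numpy as np
# Load dataset
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# Split dataset into train and test
# (the test_size and random_state values are assumed; the originals were lost)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, random_state=0)
# import KNeighborsClassifier model
from sklearn.neighbors import KNeighborsClassifier as KNN
# n_neighbors=3 is an assumed value
knn = KNN(n_neighbors=3)
# train model
knn.fit(X_train, y_train)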
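Save the model to a string using pickle:
import pickle
# Save the trained model as a pickle string (knn is the fitted classifier).
saved_model = pickle.dumps(knn)
# Load the pickled model and use it to make predictions
knn_from_pickle = pickle.loads(saved_model)
knn_from_pickle.predict(X_test)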
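The difference between the in-memory and file-based pickle functions described above can be sketched with the standard library alone. This is a minimal illustration, not the article's original code: `model_state` is a hypothetical stand-in for a trained model, and an `io.BytesIO` buffer stands in for a file opened in binary mode.

```python
import io
import pickle

# Hypothetical stand-in for a trained model: any picklable Python object works.
model_state = {"name": "knn", "n_neighbors": 3, "classes": [0, 1, 2]}

# dumps()/loads(): serialize to and from an in-memory bytes object.
as_bytes = pickle.dumps(model_state)
from_bytes = pickle.loads(as_bytes)

# dump()/load(): serialize to and from a binary file-like object.
buffer = io.BytesIO()  # stands in for a file opened with mode "wb"
pickle.dump(model_state, buffer)
buffer.seek(0)  # rewind before reading the data back
from_file = pickle.load(buffer)

# Both round trips should reproduce an equal object.
print(from_bytes == model_state, from_file == model_state)
```

The same pattern should apply to a fitted estimator such as the KNN classifier: the string form is convenient for passing a model around in memory, while the file form persists it between sessions.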
- Pickled model as a file using joblib: Joblib is an alternative to pickle that is more efficient on objects that carry large NumPy arrays. Its dump and load functions also accept file-like objects instead of filenames.
joblib.dump() serializes an object hierarchy to a file.
joblib.load() deserializes a data stream from a file.
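Save to a pickled file using joblib:
import joblib
# Save the model as a pickle in a file (the file name is illustrative)
joblib.dump(knn, 'filename.pkl')
# Load the model from the file and use it to make predictions
knn_from_joblib = joblib.load('filename.pkl')
knn_from_joblib.predict(X_test)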
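Since the joblib functions accept file-like objects as well as filenames, the round trip can also be sketched against an in-memory buffer. This is an illustration under stated assumptions: `params` is a hypothetical stand-in for a fitted model holding a large NumPy array, and `io.BytesIO` plays the role of an open binary file.

```python
import io

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model: a dict holding a large NumPy array.
params = {"weights": np.arange(100_000, dtype=np.float64), "n_neighbors": 3}

# joblib.dump accepts an open binary file-like object in place of a filename.
buffer = io.BytesIO()
joblib.dump(params, buffer)

# Rewind the buffer, then restore the object with joblib.load.
buffer.seek(0)
restored = joblib.load(buffer)

# The restored array should match the original element for element.
print(np.array_equal(restored["weights"], params["weights"]))
```

When writing to a filename instead, joblib.dump also takes a `compress` argument (for example `compress=3`), which can reduce file size at the cost of slower saving and loading.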