
What is MLOps?

Last Updated : 06 Dec, 2023

MLOps (Machine Learning Operations) is an end-to-end approach to machine learning. It is a set of practices and tools that combine machine learning (ML) and artificial intelligence (AI) development with operations (Ops) processes, and it aims to automate the process of developing, deploying, and maintaining machine learning models.

In this article we will cover data version control, using MinIO for data storage so that we can store and maintain versions of the data; an ML pipeline written with Kubeflow that captures the complete machine learning flow; model versioning with GitHub; and GitOps, where GitHub Actions automate the pipeline.

What is MLOps?

MLOps is a set of practices, guidelines, and tools that unify machine learning system development and operations. It seeks to automate, streamline, and optimize the end-to-end machine learning lifecycle, applying best practices from software development to ensure smoother transitions from experimentation to production and more efficient and robust ML systems.

How does MLOps work?

Machine learning involves four major stages: data processing, model training, model inference, and model deployment. MLOps automates this process. Let's go through it step by step, keeping in mind that machine learning has two main components: the data and the model.

To store the data we can use MinIO or an Amazon S3 bucket, and with MinIO we also integrate DVC (Data Version Control) so that we can version our data. Once the data part is in place, we need to write the pipeline that drives the ML flow. For this we can use Kubeflow: we create a volume, connect our MinIO server to Kubeflow, and a (.dvc) file is created in Kubeflow. The pipeline code lives in the Kubeflow volume and covers the whole process: it fetches the latest data from MinIO, preprocesses it, trains the model, and tests the model. Kubeflow has a very user-friendly UI, so you can visualize and modify your pipeline quite easily; you can also change hyperparameters in the UI, run rapid experiments, and compare runs across two or more experiments.

With Kubeflow and MinIO ready, we still have to automate the process, and for that we can use GitOps. We put all our models in Git for model versioning, and we can also track the data there. Git has limited storage, and that is where data versioning helps: DVC creates a (.dvc) file, which is just a hash of the data, so you can commit that file to Git instead of the data itself. For GitOps we write GitHub Actions workflows. A workflow is a (.yaml) file specifying that whenever a change happens on a particular branch, the Kubeflow pipeline is triggered automatically, and you can then visualize the results in Kubeflow. So the basic idea is that if anything changes, either in the code or in the data, the pipeline triggers automatically. If you want to push the model back to Git after the pipeline runs, you can extend the workflow as needed.
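To make the data-storage step concrete, here is a minimal sketch of uploading a dataset to MinIO with the minio Python client. The endpoint, credentials, bucket, and file paths below are placeholders, not values from a real setup.

    from minio import Minio

    # Connect to a local MinIO server; endpoint and credentials are placeholders.
    client = Minio(
        "localhost:9000",
        access_key="minioadmin",
        secret_key="minioadmin",
        secure=False,
    )

    # Create the bucket for datasets if it does not exist yet.
    if not client.bucket_exists("ml-data"):
        client.make_bucket("ml-data")

    # Upload a local CSV file as an object in the bucket.
    client.fput_object("ml-data", "datasets/train.csv", "data/train.csv")

In practice, once a DVC remote is configured to point at this bucket (MinIO exposes an S3-compatible endpoint), dvc add and dvc push take care of hashing and uploading the data, and only the small (.dvc) file is committed to Git.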

  • For data storage and versioning we use MinIO with DVC (Data Version Control).
  • For pipeline code and visualization we use Kubeflow (a minimal pipeline sketch follows this list).
  • For model versioning we use Git.
  • For automation we use GitHub Actions.
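Below is a minimal sketch of what such a pipeline can look like with the Kubeflow Pipelines SDK (kfp, v2-style decorators). The component bodies, base image, and paths are placeholders rather than a real training job.

    from kfp import dsl, compiler

    @dsl.component(base_image="python:3.10")
    def preprocess_data(raw_path: str) -> str:
        # Fetch the latest data (e.g. from MinIO via DVC) and clean it.
        return raw_path + ".clean"

    @dsl.component(base_image="python:3.10")
    def train_model(clean_path: str) -> str:
        # Train the model on the preprocessed data and return a model path.
        return "model.pkl"

    @dsl.component(base_image="python:3.10")
    def evaluate_model(model_path: str) -> float:
        # Test the model and return a metric such as accuracy.
        return 0.9

    @dsl.pipeline(name="ml-training-pipeline")
    def ml_pipeline(raw_path: str = "s3://ml-data/datasets/train.csv"):
        clean = preprocess_data(raw_path=raw_path)
        model = train_model(clean_path=clean.output)
        evaluate_model(model_path=model.output)

    if __name__ == "__main__":
        # Compile the pipeline to a YAML file that can be uploaded to Kubeflow.
        compiler.Compiler().compile(ml_pipeline, "ml_pipeline.yaml")

Once compiled, the YAML can be uploaded through the Kubeflow UI, where each run, its hyperparameters, and its metrics can be compared against other experiments.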

Usage of MLOps

  • For versioning of models and data.
  • Automated model training and deployment.
  • Continuous model monitoring.
  • Reducing manual effort and boosting productivity.

Main components of MLOps

  • Data and model version control.
  • Continuous Integration/Continuous Deployment (CI/CD).
  • Recording details of model training runs, including hyperparameters, performance metrics, and the associated datasets.
  • Packaging models and their dependencies in containers for consistent deployment.
  • Monitoring and optimizing the cost of model training, deployment, and infrastructure.
  • You can set a time in Kubeflow at which the pipeline triggers automatically (a sketch of triggering a run from code follows this list).
  • You can visualize and run the whole process easily from the Kubeflow UI.
  • In MinIO you can upload data through the UI or from code.
  • Using DVC with MinIO only requires installing its S3 support with pip install "dvc[s3]".
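To tie this back to the GitOps idea above, a CI step (for example, a GitHub Actions workflow running on a push) or a scheduled job can trigger the compiled pipeline from a small Python script. This is a rough sketch; the host URL, file names, and run name are placeholders.

    import kfp

    # Connect to the Kubeflow Pipelines API; the host URL is a placeholder.
    client = kfp.Client(host="http://localhost:8080")

    # Start a run of the compiled pipeline. A CI job or a recurring schedule
    # can invoke the same call whenever code or data changes.
    result = client.create_run_from_pipeline_package(
        pipeline_file="ml_pipeline.yaml",
        arguments={"raw_path": "s3://ml-data/datasets/train.csv"},
        run_name="triggered-by-git-push",
    )
    print("Started run:", result.run_id)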

Why do we need MLOps?

The simple answer is that without MLOps we have to do a lot of manual work: whenever anything changes, we have to train the model, run inference, and push the model to Git by hand. With MLOps, this is done automatically. We would also have to maintain a spreadsheet to compare runs, but with MLOps all experiments and their records live on one platform, Kubeflow. This encourages comprehensive model documentation, making it easier for teams to understand, maintain, and troubleshoot machine learning systems. It also helps manage sensitive data and ensure regulatory compliance, and it optimizes costs by automating resource allocation, scaling, and the efficient use of cloud resources during model training and deployment. For these reasons, MLOps is essential for organizations and teams that use machine learning models to make data-driven decisions.

Benefits of MLOps

  • Almost every task is automated.
  • Rapid experiments can be run from the UI without touching the code.
  • All the experiments under one platform.
  • It is a user-centric approach which aims to improve user experiences by ensuring that models are always up-to-date and perform optimally in production.
  • It allows for the efficient scaling of machine learning models to handle larger datasets and increased workloads.
  • It includes feedback loops to collect user feedback and data for continuous model improvement and retraining.

Difference between MLOps and DevOps

  • Scope: MLOps is used for machine learning projects and covers data preparation, model training, testing, deployment, and monitoring. DevOps focuses mainly on application development, testing, and deployment.
  • Versioning: MLOps handles the versioning of data and models. Traditional DevOps does not version data or models.
  • Artifacts: In MLOps, the primary artifacts are machine learning models, data pipelines, and feature engineering processes. In DevOps, they are source code, application binaries, configuration files, and infrastructure as code.
  • Monitoring: MLOps emphasizes model performance, data drift, and concept drift, using ML-specific metrics. DevOps monitors application performance, system metrics, and user experience, using traditional IT metrics.
  • Tools and technologies: MLOps uses ML-specific tools like TensorFlow, PyTorch, scikit-learn, and model serving frameworks. DevOps uses CI/CD tools like Jenkins and GitLab CI/CD, and container orchestration tools like Kubernetes.
  • Teams: MLOps involves cross-functional teams that may include data scientists, ML engineers, data engineers, and DevOps engineers. DevOps involves developers, IT operations, quality assurance, and other stakeholders.

Conclusion

MLOps is very important in machine learning, especially when models need continuous training and development. Once the pipeline is created, the tasks are fully automated: you only need to monitor your model, and with a user-friendly UI you can complete your work easily and efficiently.


