Automating the Machine Learning Pipeline for Credit Card Fraud Detection

Before getting to the code, it is recommended to work in a Jupyter or IPython notebook. If one is not installed on your machine, you can use Google Colab. This is one of the best ways, and my personal favorite, of working on a Python script for a Machine Learning problem.
Dataset link:
You can download the dataset from this link.
If the link is not working, please go to this link and log in to Kaggle to download the dataset.

Previous article: Credit Card Fraud Detection using Python

Now, I am assuming that you have read the previous article without cheating, so let's proceed further. In this article, I will be using a library known as PyCaret that does all the heavy lifting for me and lets me compare the best models side by side with just a few lines of code, a comparison which, if you remember, took a huge amount of code and an eternity in the first article. We are also able to do the most cumbersome job in this galaxy other than maintaining 75% attendance, namely hyperparameter tuning, which normally takes days and lots of code, in just a couple of minutes with a couple of lines of code. It won't be wrong to say that this will be the shortest and most effective article you will read in a while. So sit back, relax, and let the fun begin.

First, install the one most important thing you will need in this article: the PyCaret library. This library is going to save you a ton of money, because as you know, time is money.

To install the library within your IPython notebook, use:



pip install pycaret
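If the bare pip command is not recognized by your notebook kernel, prefix it with an exclamation mark so it runs as a shell command instead:

# run pip as a shell command from inside the notebook
!pip install pycaret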

Code: Importing the necessary libraries


# importing all necessary libraries
# linear algebra
import numpy as np 
# data processing, CSV file I / O (e.g. pd.read_csv)
import pandas as pd 


Code: Loading the dataset


# Load the dataset from the csv file using pandas 
# best way is to mount the drive on colab and  
# copy the path for the csv file 
path ="credit.csv"
data = pd.read_csv(path) 
data.head()


Code: Knowing the dataset


# checking for the imbalance: 
# number of genuine (non-fraud) transactions 
len(data[data['Class'] == 0])



# number of fraudulent transactions 
len(data[data['Class'] == 1])
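As a convenience, the same information can be read off in one go with pandas' value_counts(); this is a small sketch, not part of the original notebook:

# counts of genuine (0) and fraudulent (1) transactions in one call 
counts = data['Class'].value_counts() 
print(counts) 

# fraction of transactions that are fraudulent 
print(counts[1] / counts.sum())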


Code: Setting up the PyCaret classification


# importing the module and initializing the setup
from pycaret.classification import *

clf1 = setup(data = data, target = 'Class')


After this, a confirmation will be required to proceed. Press Enter to move forward with the code.
Check whether the type of each column has been correctly identified by the library.
You can also tell the classifier what percentage of the data to use for the training/validation split. I took 80% training data, which is quite common in machine learning (see the sketch below).
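If you want to fix the split explicitly in code instead of at the prompt, setup also accepts a train_size argument; a minimal sketch, assuming the same data and target as above:

# initialize setup with an explicit 80/20 train/validation split 
clf1 = setup(data = data, target = 'Class', train_size = 0.8)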

Coming to the next cell, this is the most important feature of the library. It fits the training data to every algorithm in the library and compares them, so you can choose the best one. It displays which model is best under each evaluation metric. When the data is imbalanced, accuracy does not always tell you the real story. I checked the precision, but the AUC, F1, and Kappa scores can also be of great help when analyzing the models. That, however, is an article unto itself.

Code: Comparing the models




# command used for comparing all the models available in the library
compare_models()


Output:

The highlighted (yellow) cells mark the top score for each metric across the models.
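Since accuracy can be misleading on data this imbalanced, compare_models can also rank the table by a different metric through its sort parameter; a sketch, assuming you want precision at the top:

# rank the comparison table by precision instead of accuracy 
best = compare_models(sort = 'Precision')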

Next, take a single algorithm that performed decently in the comparison and create a model for it. The abbreviation for each algorithm can be found in the PyCaret documentation under create_model.

Code: Creating the best model


# creating the Extra Trees classifier model 
ET = create_model('et')
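By default create_model scores the model with 10-fold cross-validation; the fold parameter lets you trade evaluation time for variance. A small sketch (ET_fast is a hypothetical name):

# train the Extra Trees classifier with 5-fold CV instead of the default 10 
ET_fast = create_model('et', fold = 5)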


Code: Displaying the model parameters


# displaying the model parameters
ET


Output:

Code: Hyperparameter Tuning


# hyperparameter tuning for a particular model 
# (pass the trained model returned by create_model) 
model = tune_model(ET)


Output:
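By default tune_model searches a random grid optimizing accuracy; on imbalanced data it is worth pointing the search at another metric instead. A sketch using the documented optimize and n_iter parameters (tuned_ET is a hypothetical name):

# tune for AUC instead of accuracy, trying more candidate grids 
tuned_ET = tune_model(ET, optimize = 'AUC', n_iter = 25)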

Code: Saving the model

After hours and hours of training the model and tuning its hyperparameters, the worst thing that can happen to you is the model disappearing when the session times out. To save you from this nightmare, let me give you a trick you will never forget.


# saving the model
save_model(ET, 'ET_saved')


Code: Loading the model


# Loading the saved model
ET_saved = load_model('ET_saved')


Output:
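A reloaded model behaves exactly like a freshly trained one: predict_model runs the whole saved preprocessing pipeline on new rows. A sketch, where new_data is a hypothetical DataFrame with the same columns as the training CSV:

# score unseen rows with the reloaded pipeline; 
# new_data is a hypothetical DataFrame shaped like the training data 
predictions = predict_model(ET_saved, data = new_data) 
predictions.head()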

Code: Finalizing the Model

This is the step just before deployment, where you merge the training and the validation data and train the model on all the data available to you.


# finalize the model: retrain the tuned model on the full dataset 
final_ET = finalize_model(model)


Finally, the model is deployed on AWS. For the settings required for this (AWS credentials and an S3 bucket), please visit the PyCaret documentation.


# deploying the finalized model to AWS 
deploy_model(final_ET, model_name = 'ET_aws', platform = 'aws', 
             authentication = { 'bucket' : 'pycaret-test' })
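Once deployed, the same model can be pulled back from the S3 bucket by giving load_model the matching platform and authentication arguments; a sketch mirroring the deploy call above:

# retrieve the deployed model back from the S3 bucket 
ET_aws = load_model('ET_aws', platform = 'aws', 
                    authentication = { 'bucket' : 'pycaret-test' })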
