Python | Customer Churn Analysis Prediction
Customer churn occurs when an existing customer, user, subscriber, or any other kind of returning client stops doing business with a company or ends the relationship with it.
Types of Customer Churn –
- Contractual Churn : When a customer is under a contract for a service and decides to cancel the service e.g. Cable TV, SaaS.
- Voluntary Churn : When a user voluntarily cancels a service e.g. Cellular connection.
- Non-Contractual Churn : When a customer is not under a contract for a service and decides to cancel the service e.g. Consumer Loyalty in retail stores.
- Involuntary Churn : When a churn occurs without any request of the customer e.g. Credit card expiration.
Reasons for Voluntary Churn
- Lack of usage
- Poor service
- Better price elsewhere
Code: Importing Telco Churn dataset
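A minimal sketch of the loading step, assuming the data is available as a CSV file. The helper `load_telco` and the sample rows are hypothetical stand-ins, and the column names follow the ones mentioned in this article (`State`, `International plan`, `Voice mail plan`, `Customer service calls`, `Churn`). With the real file, you would pass its path instead of the in-memory buffer.

```python
import io

import pandas as pd


def load_telco(source):
    """Read the churn data from a file path or file-like object."""
    return pd.read_csv(source)


# Tiny in-memory stand-in for the real file; the values are made up.
sample_csv = io.StringIO(
    "State,International plan,Voice mail plan,Customer service calls,Churn\n"
    "KS,No,Yes,1,False\n"
    "OH,No,Yes,1,False\n"
    "NJ,Yes,No,4,True\n"
)

telco = load_telco(sample_csv)
print(telco.head())   # first rows of the table
print(telco.shape)    # (rows, columns)
```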
Exploratory Data Analysis on Telco Churn Dataset
Code : To find the number of churners and non-churners in the dataset:
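One way to count the two classes is `value_counts()` on the `Churn` column. The sketch below uses a small made-up column, so the counts it prints are not those of the real dataset.

```python
import pandas as pd

# Made-up labels for illustration: True means the customer churned.
telco = pd.DataFrame({"Churn": [False, False, True, False, True, False]})

counts = telco["Churn"].value_counts()               # absolute counts per class
shares = telco["Churn"].value_counts(normalize=True)  # same, as fractions

print(counts)
print(shares)
```

The normalized version is often more informative, since churn data tends to be imbalanced.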
Code: To group data by Churn and compute the mean to find out if churners make more customer service calls than non-churners:
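A sketch of the grouping step, assuming the columns are named `Churn` and `Customer service calls`. The rows are made up, so only the pattern of the call matters here, not the numbers.

```python
import pandas as pd

# Made-up rows for illustration.
telco = pd.DataFrame({
    "Customer service calls": [1, 2, 1, 4, 5, 0],
    "Churn": [False, False, False, True, True, False],
})

# Average number of service calls for non-churners vs. churners.
mean_calls = telco.groupby("Churn")["Customer service calls"].mean()
print(mean_calls)
```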
Yes! Perhaps unsurprisingly, churners seem to make more customer service calls than non-churners.
Code: To find out if one State has more churners compared to another.
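A possible way to compare states is a cross-tabulation of `State` against `Churn`, plus the churn rate per state. The rows below are made up and do not reflect the real counts for Arizona or California.

```python
import pandas as pd

# Made-up rows for illustration.
telco = pd.DataFrame({
    "State": ["AZ", "AZ", "CA", "CA", "CA", "AZ"],
    "Churn": [False, True, True, True, False, False],
})

# Rows are states, columns are the two Churn values.
counts = pd.crosstab(telco["State"], telco["Churn"])

# The mean of a boolean column is the fraction of True values, i.e. the churn rate.
churn_rate = telco.groupby("State")["Churn"].mean()

print(counts)
print(churn_rate)
```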
Although California is the most populous state in the U.S., it does not contribute the most customers to our dataset. Arizona (AZ), for example, has 64 customers, 4 of whom ended up churning. California, by comparison, has both a higher number and a higher percentage of customers who churned. This is useful information for a company.
Exploring Data Visualizations : To understand how variables are distributed.
Code: To visualize the difference in Customer service calls between churners and non-churners
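A sketch of one such plot: a box plot of customer service calls, split by `Churn` and by `International plan`. It uses plain matplotlib rather than whichever plotting library the original listing used, and the rows are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows for illustration.
telco = pd.DataFrame({
    "Customer service calls": [1, 2, 0, 1, 4, 5, 3, 2, 1, 2],
    "Churn": [False, False, False, False, True, True, True, True, True, False],
    "International plan": ["No", "No", "Yes", "Yes", "No", "No", "No", "Yes", "Yes", "Yes"],
})

# One box per (Churn, International plan) combination.
groups = telco.groupby(["Churn", "International plan"])["Customer service calls"]
labels = [f"Churn={churn}\nIntl={plan}" for (churn, plan), _ in groups]
data = [calls.to_numpy() for _, calls in groups]

fig, ax = plt.subplots()
ax.boxplot(data)
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_ylabel("Customer service calls")
```

In an interactive session, `plt.show()` would display the figure.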
It looks like customers who churn make more customer service calls, unless they also have an international plan, in which case they make fewer. This type of information is really useful for understanding the drivers of churn. It’s now time to learn how to preprocess your data prior to modelling.
Data Preprocessing for Telco Churn Dataset
Many Machine Learning models make certain assumptions about how the data is distributed. Some of the assumptions are as follows:
- The features are normally distributed
- The features are on the same scale
- The datatypes of features are numeric
In the telco churn data, Churn, Voice mail plan, and International plan in particular are binary features that can easily be converted into 0’s and 1’s.
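A minimal sketch of that conversion, assuming the plan columns hold the strings `Yes`/`No` and `Churn` holds booleans. The actual encoding in the file may differ, in which case the mapping dictionary would need adjusting.

```python
import pandas as pd

# Made-up rows for illustration; the Yes/No and True/False encodings are assumptions.
telco = pd.DataFrame({
    "International plan": ["No", "Yes", "No"],
    "Voice mail plan": ["Yes", "No", "No"],
    "Churn": [False, True, False],
})

# Map the two plan columns from Yes/No strings to 1/0.
for col in ["International plan", "Voice mail plan"]:
    telco[col] = telco[col].map({"No": 0, "Yes": 1})

# Booleans convert directly: False -> 0, True -> 1.
telco["Churn"] = telco["Churn"].astype(int)

print(telco)
```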
Code: Encoding State feature using One hot encoding
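A sketch of one-hot encoding with `pandas.get_dummies`, which replaces the `State` column with one 0/1 column per state. The rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration.
telco = pd.DataFrame({"State": ["KS", "OH", "NJ", "OH"], "Churn": [0, 0, 1, 0]})

# Each distinct state becomes its own indicator column, e.g. State_OH.
encoded = pd.get_dummies(telco, columns=["State"], prefix="State", dtype=int)

print(encoded)
```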
Code : To Create Training and Test sets
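A sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio, the `random_state` value and the use of `stratify` are assumptions rather than settings confirmed by the article, and the arrays are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 customers, 3 features, 20% churners.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the churn rate the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```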
Code: To scale features of the training and test sets
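A sketch of scaling with scikit-learn's `StandardScaler`, using made-up numbers. The scaler is fitted on the training set only and then applied to the test set, so that no information from the test data leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature values on very different scales.
X_train = np.array([[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]])
X_test = np.array([[200.0, 3.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std

print(X_train_scaled)
print(X_test_scaled)
```

After scaling, each training feature has mean 0 and standard deviation 1.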
Code: To train a Random Forest classifier model on the training set.
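A sketch of the training step with scikit-learn's `RandomForestClassifier`. The data comes from `make_classification` as a synthetic stand-in for the preprocessed churn features, and the hyperparameters shown are assumptions, not the article's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic, imbalanced stand-in for the preprocessed training set.
X_train, y_train = make_classification(
    n_samples=200, n_features=5, weights=[0.85, 0.15], random_state=0
)

# random_state makes the forest reproducible between runs.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print(clf.classes_)
```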
Code : Making Predictions
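A sketch of the prediction step on synthetic data. `predict` returns a class label per test row, while `predict_proba` returns class probabilities, which can be useful for ranking customers by churn risk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed churn data.
X, y = make_classification(
    n_samples=300, n_features=5, weights=[0.85, 0.15], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

y_pred = clf.predict(X_test)                  # hard 0/1 labels
churn_prob = clf.predict_proba(X_test)[:, 1]  # probability of class 1

print(y_pred[:10])
print(churn_prob[:10])
```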
Code: Evaluating Model Performance
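A sketch of an accuracy check using `accuracy_score` on made-up labels. For a fitted scikit-learn classifier, `clf.score(X_test, y_test)` reports the same quantity.

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions for ten customers.
y_test = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]

# Fraction of predictions that match the true label.
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```

Accuracy alone can look flattering on imbalanced churn data, which is why it is usually read alongside the confusion matrix.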
Code : Confusion Matrix
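A sketch of building the matrix with scikit-learn's `confusion_matrix`, again on made-up labels. For binary labels 0 and 1, the rows are the true classes and the columns the predicted classes, so `ravel()` yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for ten customers.
y_test = [0, 0, 0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(tn, fp, fn, tp)
```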
From the confusion matrix, we can compute the following metrics:
- True Positives(TP) = 51
- True Negatives(TN) = 575
- False Positives(FP) = 4
- False Negatives(FN) = 37
- Precision = TP/(TP+FP) = 51/55 ≈ 0.93
- Recall = TP/(TP+FN) = 51/88 ≈ 0.58
- Accuracy = (TP+TN)/(TP+TN+FP+FN) = 0.9385
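The arithmetic can be checked directly from the four counts listed above. The variable names are only illustrative.

```python
# Counts taken from the confusion matrix above.
tp, tn, fp, fn = 51, 575, 4, 37

precision = tp / (tp + fp)                   # 51 / 55
recall = tp / (tp + fn)                      # 51 / 88
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 626 / 667

print(round(precision, 4))  # 0.9273
print(round(recall, 4))     # 0.5795
print(round(accuracy, 4))   # 0.9385
```

The gap between precision and recall shows that the model rarely raises a false alarm but misses a sizeable share of the actual churners.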