Decision Threshold In Machine Learning
What is a Decision Threshold?
A decision threshold is the cut-off applied to a classifier's decision score to turn that score into a class label. The predict() method of an sklearn classifier does not let us set the decision threshold directly, but sklearn gives us access to the decision scores (the output of the decision function) that are used to make the prediction. We can pick a suitable value from the decision function output and use it as the decision threshold: every sample whose decision score is less than this threshold is assigned to the negative class (0), and every sample whose decision score is greater than this threshold is assigned to the positive class (1).
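The idea above can be sketched as follows. This is a minimal illustration and not the article's original listing: the synthetic dataset, the choice of LogisticRegression, the helper name predict_with_threshold and the threshold value 1.0 are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; any labelled binary dataset would do).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Signed decision scores: one value per test sample.
scores = clf.decision_function(X_test)


def predict_with_threshold(scores, threshold):
    """Hypothetical helper: label a sample 1 when its score exceeds the threshold."""
    return (scores > threshold).astype(int)


# For this binary linear model, a threshold of 0 reproduces predict().
default_preds = predict_with_threshold(scores, 0.0)

# A higher threshold can only turn predicted 1s into 0s, never the reverse.
strict_preds = predict_with_threshold(scores, 1.0)

print("predicted positives at threshold 0.0:", default_preds.sum())
print("predicted positives at threshold 1.0:", strict_preds.sum())
```

For models that expose predict_proba instead of decision_function, the same pattern should apply to the probability of the positive class, with 0.5 playing the role that 0 plays here.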
Using the Precision-Recall curve over a range of decision threshold values, we can select the threshold that gives high precision (without affecting recall much) or high recall (without affecting precision much), depending on whether our project is precision-oriented or recall-oriented. The purpose is to obtain a high-precision or a high-recall ML model to match that goal.
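One way to make this selection programmatic is sketched below, using sklearn's precision_recall_curve. The dataset, the model and the recall floor MIN_RECALL = 0.80 are assumptions chosen for illustration; a precision-oriented project would set its own floor.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, slightly noisy binary data (an assumption, not the article's dataset).
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.decision_function(X_val)

# precision and recall have one more entry than thresholds; the final
# (precision=1, recall=0) point has no threshold, so it is dropped here.
precision, recall, thresholds = precision_recall_curve(y_val, scores)
precision, recall = precision[:-1], recall[:-1]

MIN_RECALL = 0.80  # hypothetical recall floor for a precision-oriented project

# Among thresholds that keep recall at or above the floor, take the most precise.
eligible = np.flatnonzero(recall >= MIN_RECALL)
best = eligible[np.argmax(precision[eligible])]
chosen_threshold = thresholds[best]

print(f"threshold={chosen_threshold:.3f} "
      f"precision={precision[best]:.3f} recall={recall[best]:.3f}")
```

precision_recall_curve counts a sample as positive when its score is greater than or equal to the threshold, so predictions made with the chosen value should use >= to match the reported numbers. For a recall-oriented project the roles of the two metrics would simply be swapped. In practice the threshold is usually chosen on a validation split, as here, rather than on the final test set.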
Code: Python code to build a high Precision ML model
Code: Train the model
Actual Scores obtained:
In the above classification report, the precision of our model for class (1) is 0.92 and its recall for class (1) is 1.00. Since our goal in this article is to build a high-precision ML model for predicting (1) without affecting recall much, we need to manually select the best decision threshold value from the Precision-Recall curve below, so that we can increase the precision of this model.
In the above plot, we can see that to get a higher precision we need to increase the decision threshold (x-axis), but doing so decreases recall, which is not favourable. So we need to choose a decision threshold that increases precision without decreasing recall much. One such value in the above plot is a decision threshold of around 0.6.
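The trade-off read off the plot can also be checked numerically by sweeping a few thresholds and printing precision and recall for each. This is a rough sketch under assumptions: the synthetic data, the model and the threshold grid are illustrative, so the numbers it prints will not match the article's figures.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, slightly noisy binary data (an assumption, not the article's dataset).
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.decision_function(X_test)

# Illustrative grid; 0.0 is the cut-off predict() uses for this model.
candidate_thresholds = [-1.0, -0.5, 0.0, 0.5, 1.0]

recalls = []
for t in candidate_thresholds:
    preds = (scores > t).astype(int)
    p = precision_score(y_test, preds, zero_division=0)
    r = recall_score(y_test, preds, zero_division=0)
    recalls.append(r)
    print(f"threshold={t:+.1f}  precision={p:.3f}  recall={r:.3f}")
```

Recall can never rise as the threshold increases, because fewer samples are labelled positive. Precision usually rises, but it is not guaranteed to do so at every step, which is why the curve is inspected before a value is fixed.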
Code: Comparison between old and new Precision Values.
old precision value: 0.922077922077922
new precision value: 0.9714285714285714
- The value of precision has increased from 0.92 to 0.97.
- The value of recall has decreased because of the precision-recall trade-off.
The above code does not include data preprocessing (data cleaning or feature engineering), which would have made this article too long. It is only meant to show how a decision threshold can be used in practice.