What is a Decision Threshold?
sklearn's predict() does not let us set the decision threshold directly, but it gives us access to the decision scores (the output of the decision function) that are used to make the prediction. We can pick a value from the range of decision function outputs and use it as the decision threshold: every decision score below the threshold is treated as the negative class (0), and every decision score above it is treated as the positive class (1).
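The thresholding rule described above can be sketched in a few lines of plain Python. The helper name `apply_threshold` and the scores are hypothetical, made up for illustration; treating a score exactly equal to the threshold as negative is also an assumption, since the text only covers scores strictly below or above it.

```python
def apply_threshold(scores, threshold):
    """Hypothetical helper: label a score 1 if it exceeds the threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]

# Hypothetical decision scores, for illustration only.
scores = [-1.3, -0.2, 0.1, 0.7, 2.4]

# A lower threshold labels more samples as positive ...
labels_low = apply_threshold(scores, 0.0)
# ... and a higher threshold labels fewer samples as positive.
labels_high = apply_threshold(scores, 0.6)

print(labels_low)   # [0, 0, 1, 1, 1]
print(labels_high)  # [0, 0, 0, 1, 1]
```

Raising the threshold from 0.0 to 0.6 moves the sample with score 0.1 from the positive class to the negative class, which is the mechanism that trades recall for precision.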
Using a Precision-Recall curve computed over various decision threshold values, we can select the threshold that gives high precision (without affecting recall much) or high recall (without affecting precision much), depending on whether our project is precision-oriented or recall-oriented. The purpose of doing this is to obtain a high-precision or a high-recall ML model, whichever the project requires.
Code: Python code to build a high-precision ML model
Code: Train the model
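A minimal sketch of such a training step is given here under stated assumptions: the dataset is a synthetic stand-in generated with `make_classification`, and the model is a `LogisticRegression`; neither choice is confirmed by this article. The sketch trains the model, prints a classification report for the default predictions, and keeps the decision scores for later thresholding.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the article's dataset).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed model choice: any classifier exposing decision_function would do.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Default predictions; for this binary linear model they correspond to
# decision scores greater than 0.
y_pred = model.predict(X_test)
decision_scores = model.decision_function(X_test)

print(classification_report(y_test, y_pred))
```

The numbers printed by this sketch will differ from the scores discussed in this article, which come from the article's own data and model.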
Actual Scores obtained:
In the above classification report, we can see that the model's precision for class (1) is 0.92 and its recall for class (1) is 1.00. Since our goal in this article is to build a high-precision ML model for predicting (1) without affecting recall much, we need to manually select the best decision threshold value from the Precision-Recall curve below, so that we can increase the precision of this model.
In the above plot, we can see that if we want a higher precision value, we need to increase the decision threshold (x-axis), but doing so decreases recall (which is not favourable). So we need to choose a decision threshold that increases precision without decreasing recall much. One such value from the above plot is a decision threshold of around 0.6.
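The same selection can be done programmatically rather than by reading a plot. The sketch below uses `precision_recall_curve` on a small set of hypothetical labels and scores (made up for illustration, not the article's data) and picks the lowest threshold whose precision reaches an assumed target of 0.8. Because recall never increases as the threshold rises, the lowest qualifying threshold is also the one with the highest recall.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and decision scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
scores = np.array([-2.0, -1.0, -0.5, 0.2, 0.4, 0.8, 1.0, 1.5, 2.0, 3.0])

# precision and recall have one more entry than thresholds; the final
# entries (precision 1, recall 0) have no matching threshold.
precision, recall, thresholds = precision_recall_curve(y_true, scores)

target_precision = 0.8  # assumed target, chosen for this example
meets_target = precision[:-1] >= target_precision

if meets_target.any():
    idx = int(np.argmax(meets_target))  # first (lowest) qualifying threshold
    chosen_threshold = thresholds[idx]
    print("threshold:", chosen_threshold)
    print("precision:", precision[idx], "recall:", recall[idx])
else:
    chosen_threshold = None
    print("no threshold reaches the target precision")
```

Plotting `precision[:-1]` and `recall[:-1]` against `thresholds` gives a plot of the kind discussed above. Note that `precision_recall_curve` counts a score equal to the threshold as positive.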
Code: Comparison between old and new precision values.
old precision value: 0.922077922077922
new precision value: 0.9714285714285714
- The value of precision has increased from 0.92 to 0.97.
- The value of recall has decreased due to the precision-recall trade-off.
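A comparison of this kind can be sketched as follows, using hypothetical labels and scores, so its numbers are not the ones reported above. The helper name `predict_with_threshold` and both cutoff values are assumptions made for illustration; 0.0 is used as the old cutoff because it is the default cutoff on decision scores for a binary linear classifier.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def predict_with_threshold(scores, threshold):
    """Hypothetical helper: scores above the threshold become class 1."""
    return (np.asarray(scores) > threshold).astype(int)

# Hypothetical true labels and decision scores, for illustration only.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])
scores = np.array([-2.0, -1.0, -0.5, 0.2, 0.4, 0.8, 1.0, 1.5, 2.0, 3.0])

old_pred = predict_with_threshold(scores, 0.0)  # assumed default cutoff
new_pred = predict_with_threshold(scores, 1.2)  # assumed stricter cutoff

old_precision = precision_score(y_true, old_pred)
new_precision = precision_score(y_true, new_pred)
old_recall = recall_score(y_true, old_pred)
new_recall = recall_score(y_true, new_pred)

print("old precision value:", old_precision)
print("new precision value:", new_precision)
print("old recall value:", old_recall)
print("new recall value:", new_recall)
```

On this toy data the stricter cutoff removes both false positives but also drops two true positives, so precision rises while recall falls.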
The data used in the above code is not preprocessed (cleaned or feature engineered), since doing so would make this article too long. The code is only meant to illustrate how a decision threshold can be used in practice.