KNN (K-nearest neighbours) is a machine learning algorithm used for both classification (using scikit-learn's KNeighborsClassifier) and regression (using KNeighborsRegressor) problems. In the KNN algorithm, K is the hyperparameter, and choosing the right value of K matters. A machine learning model is said to have high model complexity if it has low bias and high variance.
We know that:
- High bias and low variance = under-fitting model.
- Low bias and high variance = over-fitting model (this indicates a highly complex model).
- Low bias and low variance = best-fitting model (this is preferred).
- High training accuracy and low test accuracy (out-of-sample accuracy) = high variance = over-fitting model = more model complexity.
- Low training accuracy and low test accuracy (out-of-sample accuracy) = high bias = under-fitting model.
Code: To understand how the value of K in the KNN algorithm affects model complexity.
Output:
Test Accuracy: 0.6465919540035108
Training Accuracy: 0.8687977824212627
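A minimal sketch of this kind of experiment is given below. The synthetic dataset from `make_classification`, the choice of K = 5, the 70/30 split and the random seeds are all assumptions made for illustration, so the scores it prints will not match the values quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data (an assumption, chosen only for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Hold out 30% of the samples to measure out-of-sample accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit KNN with an arbitrary starting value of K (n_neighbors).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# score() returns mean accuracy for a classifier.
train_accuracy = knn.score(X_train, y_train)
test_accuracy = knn.score(X_test, y_test)

print("Test Accuracy:", test_accuracy)
print("Training Accuracy:", train_accuracy)
```

A large gap between the two numbers points towards over-fitting; two low numbers point towards under-fitting.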
Now let's vary the value of K (the hyperparameter) from low to high and observe the model complexity.
[Figures: model fit for K = 1, K = 10, K = 20, K = 50 and K = 70]
- When K is small, e.g. K = 1, the model complexity is high (over-fitting, or high variance).
- When K is very large, e.g. K = 70, the model complexity is low (under-fitting, or high bias).
As K becomes smaller, model complexity increases; as K becomes larger, model complexity decreases.
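The effect of K can be checked numerically with a short loop over the same five values. This is only a sketch: the synthetic dataset, split and seeds are assumptions, and a classifier is used so that `score()` reports accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and split (assumptions, for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Map each K to its (training accuracy, test accuracy) pair.
results = {}
for k in (1, 10, 20, 50, 70):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    results[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"K = {k:2d}  train = {results[k][0]:.3f}  test = {results[k][1]:.3f}")
```

With K = 1 every training point is its own nearest neighbour, so the training accuracy is perfect as long as the data contain no duplicate points. That is the memorising behaviour associated with high variance.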
Code: Let's consider a plot of training accuracy and test accuracy against K.
From the graph, we can conclude that when K is small, e.g. K = 1, training accuracy is high but test accuracy is low, which means the model is over-fitting (high variance, or high model complexity). When K is large, e.g. K = 50, training accuracy and test accuracy are both low, which means the model is under-fitting (high bias, or low model complexity).
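A plot of this kind could be produced along the following lines. This is a hedged sketch, not the original listing: the synthetic dataset, the range of K and the output filename `knn_complexity.png` are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and split (assumptions, for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Record training and test accuracy for K = 1 .. 70.
k_values = list(range(1, 71))
train_acc, test_acc = [], []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc.append(model.score(X_train, y_train))
    test_acc.append(model.score(X_test, y_test))

# Plot both curves against K and save the figure to disk.
fig, ax = plt.subplots()
ax.plot(k_values, train_acc, label="Training accuracy")
ax.plot(k_values, test_acc, label="Test accuracy")
ax.set_xlabel("K (n_neighbors)")
ax.set_ylabel("Accuracy")
ax.set_title("KNN: accuracy vs. K")
ax.legend()
fig.savefig("knn_complexity.png")  # hypothetical output filename
```

Model complexity runs from right to left on this plot, since small K corresponds to the most complex model.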
So hyperparameter tuning is necessary: we need to select the value of K for which the KNN model has low bias and low variance, giving a good model with high out-of-sample accuracy.
We can use GridSearchCV or RandomizedSearchCV to find the best value of the hyperparameter K.
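As a minimal sketch of the GridSearchCV approach, the search below tries a handful of K values with 5-fold cross-validation. The candidate grid, the synthetic dataset and the number of folds are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data and split (assumptions, for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate values of K; in practice a finer grid may be preferable.
param_grid = {"n_neighbors": [1, 10, 20, 50, 70]}

# 5-fold cross-validated search over K, scored by accuracy.
grid = GridSearchCV(
    KNeighborsClassifier(), param_grid=param_grid, cv=5, scoring="accuracy"
)
grid.fit(X_train, y_train)

print("Best K:", grid.best_params_["n_neighbors"])
print("Best cross-validated accuracy:", grid.best_score_)

# The refitted best model is then evaluated once on the held-out test set.
print("Test accuracy:", grid.score(X_test, y_test))
```

The search uses only the training split, so the held-out test accuracy remains an honest estimate of out-of-sample performance. `RandomizedSearchCV` follows the same pattern but samples a fixed number of candidates (`n_iter`) from `param_distributions` instead of trying every value.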