Machine learning is a subset of artificial intelligence that gives a machine the ability to learn from data without being explicitly programmed. The machine improves with experience and adjusts its actions accordingly, without human intervention. It is primarily of three types: supervised, unsupervised, and reinforcement learning. K-nearest neighbors (KNN) is a supervised learning algorithm.
The K-nearest neighbor (KNN) algorithm classifies a data point by looking at the labeled points closest to it, which in effect creates an imaginary boundary between the classes. When a new data point is presented for prediction, the algorithm assigns it to the class that is most common among its nearest neighbors. It follows the principle of “Birds of a feather flock together.” The algorithm is easy to implement in R.
- Select K, the number of neighbors.
- Calculate the Euclidean distance from the new data point to each point in the training data.
- Take the K nearest neighbors as per the calculated Euclidean distance.
- Count the number of data points in each category among these K neighbors.
- Assign the new data point to the category that has the most neighbors among these K.
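The steps above can be sketched in a few lines of base R. This is a minimal illustration rather than a production implementation: the function name `knn_predict`, its arguments, and the toy data are hypothetical, and ties are broken by simply taking the first category.

```r
# Minimal k-NN classifier in base R (illustrative; all names are hypothetical).
knn_predict <- function(train_x, train_y, new_point, k = 3) {
  # Euclidean distance from the new point to every training point
  diffs <- sweep(as.matrix(train_x), 2, new_point)  # subtract new_point from each row
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k nearest neighbors
  nearest <- order(dists)[seq_len(k)]
  # Count the categories among those neighbors
  votes <- table(train_y[nearest])
  # Return the category with the most votes (the first one on a tie)
  names(votes)[which.max(votes)]
}

# Toy data: two small clusters labeled "A" and "B"
train_x <- rbind(c(1, 1), c(1, 2), c(2, 1), c(6, 6), c(7, 6), c(6, 7))
train_y <- c("A", "A", "A", "B", "B", "B")

pred_a <- knn_predict(train_x, train_y, c(1.5, 1.5), k = 3)
pred_b <- knn_predict(train_x, train_y, c(6.5, 6.5), k = 3)
print(c(pred_a, pred_b))  # "A" "B"
```

An odd value of K is a common choice for two-class problems because it avoids tied votes.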
Implementation in R
The Dataset: A sample population of 400 people shared their age, gender, and salary with a product company, along with whether they bought the product (0 means no, 1 means yes). Download the dataset Advertisement.csv
- The training set contains 300 entries.
- The test set contains 100 entries.
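A workflow along these lines might look like the sketch below. Because Advertisement.csv is not reproduced here, the code generates a synthetic stand-in with 400 rows. The column names (`Age`, `Salary`, `Purchased`), the rule that produces the labels, the use of only age and salary as features, and K = 5 are all assumptions made for illustration. The `knn()` function comes from the `class` package.

```r
library(class)  # provides knn()

set.seed(42)
n <- 400

# Synthetic stand-in for the real dataset; column names are assumptions
ads <- data.frame(
  Age    = round(runif(n, 18, 60)),
  Salary = round(runif(n, 15000, 150000))
)
# Invented labeling rule with noise, so the classes are learnable.
# It does not reflect real purchasing behavior.
score <- (ads$Age - 18) / 42 + (ads$Salary - 15000) / 135000
ads$Purchased <- factor(as.integer(score + rnorm(n, sd = 0.2) > 1),
                        levels = c(0, 1))

# 300 training rows and 100 test rows
train_idx <- sample(n, 300)
train <- ads[train_idx, ]
test  <- ads[-train_idx, ]

# Scale features using training-set statistics, then apply them to the test set
train_x <- scale(train[, c("Age", "Salary")])
test_x  <- scale(test[, c("Age", "Salary")],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Classify each test row by a majority vote of its 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train$Purchased, k = 5)

# Confusion matrix and accuracy on the test set
cm <- table(Actual = test$Purchased, Predicted = pred)
print(cm)
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

To run this on the real data, the synthetic `ads` data frame would be replaced by something like `read.csv("Advertisement.csv")`, with the column names adjusted to match the file.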
Confusion matrix result: [output not available]
Visualizing the Training Data: [figure not available]
Visualizing the Test Data: [figure not available]
Advantages of KNN
- KNN is an instance-based learning algorithm, hence a lazy learner: there is no training period.
- KNN does not derive a discriminative function from the training data.
- KNN stores the training dataset and uses it to make predictions at query time.
- New data can be added at any time without retraining, because the algorithm simply includes the new points in later neighbor searches.
- Only two choices are needed to implement KNN: the value of K and the distance function (for example, Euclidean distance).
Disadvantages of KNN
- Each prediction requires computing the distance between the new point and every existing point, which is expensive for large datasets and slows the algorithm down.
- KNN does not work well with high-dimensional data, i.e. data with a large number of features, because distances become less informative and more costly to compute as dimensions are added.
- Feature scaling (standardization or normalization) is needed before applying KNN to a dataset; otherwise features with large numeric ranges dominate the distance and KNN may produce wrong predictions.
- KNN is sensitive to noise in the data.
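The feature-scaling point can be seen in a small worked example. The sketch below uses three hypothetical customers and assumed feature ranges (ages 18 to 60, salaries 15000 to 150000) to show that the nearest neighbor of a point can change once the features are normalized.

```r
euclid <- function(p, q) sqrt(sum((p - q)^2))

# Hypothetical customers described by (age in years, salary)
a  <- c(25, 50000)
b  <- c(60, 50500)  # 35 years older, almost the same salary
c2 <- c(26, 52000)  # almost the same age, slightly higher salary

# Raw distances are dominated by salary because of its much larger scale
raw_ab <- euclid(a, b)
raw_ac <- euclid(a, c2)

# Min-max normalization with assumed feature ranges
lo <- c(18, 15000)
hi <- c(60, 150000)
normalize <- function(p) (p - lo) / (hi - lo)

norm_ab <- euclid(normalize(a), normalize(b))
norm_ac <- euclid(normalize(a), normalize(c2))

print(c(raw_nearest        = if (raw_ab < raw_ac) "b" else "c2",
        normalized_nearest = if (norm_ab < norm_ac) "b" else "c2"))
# raw_nearest is "b", normalized_nearest is "c2"
```

On the raw values, the customer who is 35 years older counts as the closer neighbor only because the salary gap is measured in much larger units. After normalization both features contribute on a comparable scale.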