Skip to content
Related Articles

Related Articles

Introduction To Machine Learning using Python
  • Difficulty Level : Medium
  • Last Updated : 01 Nov, 2019

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of Computer Programs that can change when exposed to new data. In this article, we’ll see basics of Machine Learning, and implementation of a simple machine learning algorithm using python.

Setting up the environment

Python community has developed many modules to help programmers implement machine learning. In this article, we will be using numpy, scipy and scikit-learn modules. We can install them using cmd command:

pip install numpy scipy scikit-learn 

A better option would be downloading miniconda or anaconda packages for python, which come prebundled with these packages. Follow the instructions given here to use anaconda.

Machine Learning overview

Machine learning involves a computer to be trained using a given data set, and use this training to predict the properties of a given new data. For example, we can train a computer by feeding it 1000 images of cats and 1000 more images which are not of a cat, and tell each time to the computer whether a picture is cat or not. Then if we show the computer a new image, then from the above training, the computer should be able to tell whether this new image is a cat or not.
The process of training and prediction involves the use of specialized algorithms. We feed the training data to an algorithm, and the algorithm uses this training data to give predictions on a new test data. One such algorithm is K-Nearest-Neighbor classification (KNN classification). It takes a test data, and finds k nearest data values to this data from test data set. Then it selects the neighbor of maximum frequency and gives its properties as the prediction result. For example if the training set is:

petal_size flower_type
1 a
2 b
1 a
2 b
3 c
4 d
3 c
2 b
5 a

Now we want to predict flower type for petal of size 2.5 cm. So if we decide no. of neighbors (K)=3, we see that the 3 nearest neighbors of 2.5 are 1, 2 and 3. Their frequencies are 2, 3 and 2 respectively. Therefore the neighbor of maximum frequency is 2 and flower type corresponding to it is b. So for a petal of size 2.5, the prediction will be flower type b.

Implementing KNN- classification algorithm using Python on IRIS dataset

Here is a python script which demonstrates knn classification algorithm. Here we use the famous iris flower dataset to train the computer, and then give a new value to the computer to make predictions about it. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features are measured from each sample: The length and Width of Sepals & Petals, in centimeters.
We train our program using this dataset, and then use this training to predict species of a iris flower with given measurements.

Note that this program might not run on Geeksforgeeks IDE, but it can run easily on your local python interpreter, provided, you have installed the required libraries.

# Python program to demonstrate
# KNN classification algorithm
# on IRIS dataser
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris_dataset["data"], iris_dataset["target"], random_state=0)
kn = KNeighborsClassifier(n_neighbors=1), y_train)
x_new = np.array([[5, 2.9, 1, 0.2]])
prediction = kn.predict(x_new)
print("Predicted target value: {}\n".format(prediction))
print("Predicted feature name: {}\n".format
print("Test score: {:.2f}".format(kn.score(X_test, y_test)))


Predicted target name: [0]
Predicted feature name: ['setosa']
Test score: 0.97

Explanation of the program:

Training the Dataset

  • The first line imports iris data set which is already predefined in sklearn module. Iris data set is basically a table which contains information about various varieties of iris flowers.
  • We import kNeighborsClassifier algorithm and train_test_split class from sklearn and numpy module for use in this program.
  • Then we encapsulate load_iris() method in iris_dataset variable. Further we divide the dataset into training data and test data using train_test_split method. The X prefix in variable denotes the feature values (eg. petal length etc) and y prefix denotes target values (eg. 0 for setosa, 1 for virginica and 2 for versicolor).
  • This method divides dataset into training and test data randomly in ratio of 75:25. Then we encapsulate KNeighborsClassifier method in kn variable while keeping value of k=1. This method contains K Nearest Neighbor algorithm in it.
  • In the next line, we fit our training data into this algorithm so that computer can get trained using this data. Now the training part is complete.

Testing the Dataset

  • Now we have dimensions of a new flower in a numpy array called x_new and we want to predict the species of this flower. We do this using the predict method which takes this array as input and spits out predicted target value as output.
  • So the predicted target value comes out to be 0 which stands for setosa. So this flower has good chances to be of setosa species.
  • Finally we find the test score which is the ratio of no. of predictions found correct and total predictions made. We do this using the score method which basically compares the actual values of the test set with the predicted values.

Thus, we saw how machine learning works and developed a basic program to implement it using scikit-learn module in python.

This article is contributed by tkkhhaarree. If you like GeeksforGeeks and would like to contribute, you can also write an article using or mail your article to See your article appearing on the GeeksforGeeks main page and help other Geeks.


My Personal Notes arrow_drop_up
Recommended Articles
Page :