
Compute Classification Report and Confusion Matrix in Python


In this article, we are going to see how to compute a classification report and a 3×3 confusion matrix in Python.

The confusion matrix and the classification report are two very commonly used and important functions available in the scikit-learn library. But far fewer users can implement them from scratch, and many do not know the theory behind them. In this article, we implement both functions from scratch and compare the results with those of the library functions.

Problem statement

Given the Iris dataset, our aim is to classify the flower species and build a confusion matrix and classification report from scratch, without using the corresponding Python library functions. We then compare the results of the scratch functions with the standard library functions.

The Iris dataset is a multiclass dataset with 5 columns. The first four columns hold the flower measurements: petal length, petal width, sepal length, and sepal width. The last column is the class label of the flower. There are 3 flower species: Virginica, Setosa, and Versicolor.
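For reference, this structure can be inspected directly with scikit-learn's built-in loader, which the program below also uses:

Python3

from sklearn.datasets import load_iris

dataset = load_iris()
print(dataset['feature_names'])  # the four measurement columns
print(dataset['target_names'])   # ['setosa' 'versicolor' 'virginica']
print(dataset['data'].shape)     # (150, 4)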

Examples of confusion matrix:

Input: y_true = {2, 0, 2, 2, 0, 1}

       y_pred = {0, 0, 2, 2, 0, 2}

Output: confusion_matrix:

{

{2, 0, 0},

{0, 0, 1},

{1, 0, 2}

}

Explanation:

  • Rows indicate the actual values of the data and columns indicate the predicted values.
  • There are three labels, i.e. 0, 1 and 2.
  • The first row {2, 0, 0} is for actual label 0: 2 points are predicted as class-0, 0 points as class-1, and 0 points as class-2.
  • The 2nd row {0, 0, 1} tells us that 0 points of class-1 are predicted as class-0 or class-1, and 1 point of class-1 is predicted as class-2.
  • The 3rd row {1, 0, 2}:
    • 1 point of class-2 is predicted as class-0.
    • 0 points of class-2 are predicted as class-1.
    • 2 points of class-2 are predicted as class-2.
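To verify this example, the same matrix can be reproduced with scikit-learn's confusion_matrix function (which likewise puts actual labels on rows and predicted labels on columns):

Python3

from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
print(confusion_matrix(y_true, y_pred))
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]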

                   

Input: y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]

       y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]

Output: confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"])

{       ant bird cat

'ant'=  {2, 0, 0},

'bird'= {0, 0, 1},

'cat'=  {1, 0, 2}}

Explanation:

  • There are 2 ants in the actual dataset. Both are predicted as ant; none are predicted as bird or cat.
  • There is 1 bird in the actual dataset. It is predicted as cat; none are predicted as ant or bird.
  • There are 3 cats in the actual dataset. Of those, 1 is predicted as ant and 2 are predicted as cat.
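String labels work the same way; the labels argument fixes the row and column order:

Python3

from sklearn.metrics import confusion_matrix

y_true = ["cat", "ant", "cat", "cat", "ant", "bird"]
y_pred = ["ant", "ant", "cat", "cat", "ant", "cat"]
print(confusion_matrix(y_true, y_pred, labels=["ant", "bird", "cat"]))
# [[2 0 0]
#  [0 0 1]
#  [1 0 2]]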

What is a classification report?

As the name suggests, it is a report that summarizes the quality of the classification made by the trained ML model. It has 5 columns and (N + 3) rows. The first column holds the class label's name, followed by Precision, Recall, F1-score, and Support. N rows are for the N class labels, and the other three rows are for accuracy, macro average, and weighted average.

Precision: It is calculated with respect to the predicted values. For class A, precision answers: of all the points predicted as class A, how many actually belong to class A? It is the ratio of the [i][i] cell of the confusion matrix to the sum of column [i].

Recall: It is calculated with respect to the actual values in the dataset. For class A, recall answers: of all the points that actually belong to class A, how many did the ML model classify as class A? It is the ratio of the [i][i] cell of the confusion matrix to the sum of row [i].

F1-score: It is the harmonic mean of precision and recall.

Support: It is the total number of entries of each class in the actual dataset; for class i it is simply the sum of row [i] of the confusion matrix.
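As a minimal sketch of how these four quantities fall out of the matrix cells, here is the computation applied to the 3×3 matrix from the first confusion-matrix example above (zero denominators are treated as 0 here):

Python3

# rows = actual class, columns = predicted class
cm = [[2, 0, 0],
      [0, 0, 1],
      [1, 0, 2]]

for i in range(len(cm)):
    tp = cm[i][i]                          # correctly predicted points of class i
    col_sum = sum(row[i] for row in cm)    # everything predicted as class i
    row_sum = sum(cm[i])                   # everything actually in class i (= support)
    precision = tp / col_sum if col_sum else 0.0
    recall = tp / row_sum if row_sum else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    print(f"class {i}: precision={precision:.2f} recall={recall:.2f} "
          f"f1={f1:.2f} support={row_sum}")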

Examples of classification reports:

Input: confusion[][]= { 

{1, 0, 0},

{1, 0, 0},

{0, 1, 2}

};

Output:

              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1

     class 1       0.00      0.00      0.00         1

     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5

   macro avg       0.50      0.56      0.49         5

weighted avg       0.70      0.60      0.61         5
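The same table can be reproduced with scikit-learn's classification_report. The y_true/y_pred pair below is just one labeling, chosen here for illustration, that yields exactly the confusion matrix above:

Python3

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2]   # actuals consistent with the matrix above
y_pred = [0, 0, 1, 2, 2]   # predictions consistent with the matrix above
print(classification_report(y_true, y_pred,
                            target_names=["class 0", "class 1", "class 2"]))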

    

Input: confusion[][]= {

{2, 1, 0},

{0, 0, 0},

{0, 0, 0}

};

Output:

             precision    recall  f1-score   support

           1       1.00      0.67      0.80         3

           2       0.00      0.00      0.00         0

           3       0.00      0.00      0.00         0

   accuracy                            0.67         3

   macro avg       0.33      0.22      0.27         3

weighted avg       1.00      0.67      0.80         3
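This matches scikit-learn's output as well. One labeling consistent with the matrix above (assumed for illustration) is three samples of actual class 1, one of which is mispredicted as class 2; passing labels=[1, 2, 3] forces classes 2 and 3 into the report even though they have zero support (scikit-learn reports 0.0 for them and emits an UndefinedMetricWarning):

Python3

from sklearn.metrics import classification_report

y_true = [1, 1, 1]
y_pred = [1, 1, 2]
print(classification_report(y_true, y_pred, labels=[1, 2, 3]))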

Below is the complete code implementation:

Python3




# Our aim is to build the function
# for calculating the confusion_matrix
# and classification_report
# for multiclass classification, like IRIS dataset.
  
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
  
# Function for confusion matrix
def Confusion_matrix(y_test, y_pred, target_names=None):
    
    # target_names is a list.
    # actual values are arranged in the rows.
    # predicted values are arranged in the columns.
    # if there are m classes, then cm is m*m matrix.
    if target_names is None:
        m = len(set(y_test))
    else:
        m = len(target_names)
    size = len(y_test)
    matrix = dict()
  
    # create matrix initialised with 0
    for class_name in range(m):
        matrix[class_name] = [0 for k in range(m)]
  
    # populating the matrix.
    for i in range(size):
        actual_class = y_test[i]
        pred_class = y_pred[i]
        matrix[actual_class][pred_class] += 1
  
    # Change the name of columns.
    if target_names is None:
        # Now, lets print the confusion matrix.
        print("Confusion Matrix of given model is :")
        if m == 3:
            print("Count=%-14d %-15s %-15s %-15s" % (size, 
                                                     '0', '1',
                                                     '2'))
            for key, value in matrix.items():
                print("Actual %-13s %-15d %-15d %-15d" %
                      (key, value[0], value[1], value[2]))
        elif m == 2:
            print("Count=%-14d %-15s %-15s" % (size, '0', '1'))
            for key, value in matrix.items():
                print("Actual %-13s %-15d %-15d" % (key, value[0], 
                                                    value[1]))
    else:
        matrix = dict(zip(target_names, list(matrix.values())))
  
        # Now, lets print the confusion matrix.
        print("Confusion Matrix of given model is :")
        print("Count=%-14d %-15s %-15s %-15s" %
              (size, target_names[0], target_names[1], target_names[2]))
        for key, value in matrix.items():
            print("Actual %-13s %-15d %-15d %-15d" %
                  (key, value[0], value[1], value[2]))
  
    return matrix
  
# Function for performance report.
def performance_report(cm):
    col = len(cm)
      
    # col=number of class
    arr = []
    for key, value in cm.items():
        arr.append(value)
  
    cr = dict()
    support_sum = 0
      
    # running sums of [precision, recall, f1]
    # for the macro average (support is
    # summed separately, not averaged).
    macro = [0]*3

    # support-weighted running sums of
    # [precision, recall, f1] for the weighted average.
    weighted = [0]*3
    for i in range(col):
        vertical_sum= sum([arr[j][i] for j in range(col)])
        horizontal_sum= sum(arr[i])
        p = arr[i][i] / vertical_sum
        r = arr[i][i] / horizontal_sum
        f = (2 * p * r) / (p + r)
        s = horizontal_sum
        row=[p,r,f,s]
        support_sum+=s
        for j in range(3):
            macro[j]+=row[j]
            weighted[j]+=row[j]*s
        cr[i]=row
  
    # add Accuracy parameters.
    truepos=0
    total=0
    for i in range(col):
        truepos+=arr[i][i]
        total+=sum(arr[i])
  
    cr['Accuracy']=["", "", truepos/total, support_sum]
  
    # Add macro-weight and weighted_avg features.
    macro_avg=[Sum/col for Sum in macro]
    macro_avg.append(support_sum)
    cr['Macro_avg']=macro_avg
  
    weighted_avg=[Sum/support_sum for Sum in weighted]
    weighted_avg.append(support_sum)
    cr['Weighted_avg']=weighted_avg
  
    # print the classification_report
    print("Performance report of the model is :")
    space,p,r,f,s=" ","Precision","Recall","F1-Score","Support"
    print("%13s %9s %9s %9s %9s\n"%(space,p,r,f,s))
    stop=0
    for key,value in cr.items():
        if stop<col:
            stop+=1
            print("%13s %9.2f %9.2f %9.2f %9d"%(key,value[0],
                                                value[1],
                                                value[2],
                                                value[3]))
        elif stop==col:
            stop+=1
            print("\n%13s %9s %9s %9.2f %9d"%(key,value[0],
                                              value[1],
                                              value[2],
                                              value[3]))
        else:
            print("%13s %9.2f %9.2f %9.2f %9d"%(key,
                                                value[0],
                                                value[1],
                                                value[2],
                                                value[3]))
  
# from sklearn.metrics import confusion_matrix, classification_report
# Main Function is here.
def main():
    dataset=load_iris()
    X, y, classes = dataset['data'], dataset['target'], dataset['target_names']
  
    X_train,X_test,y_train,y_test=train_test_split(
      X,y,shuffle=True,random_state=5,test_size=0.3)
      
    model=GaussianNB().fit(X_train,y_train)
    y_pred=model.predict(X_test)
    classes=list(classes)
    cm=Confusion_matrix(y_test, y_pred, classes)
    performance_report(cm)
    print("\nCR by library method=\n",
          classification_report(y_test, y_pred))
  
if __name__ == '__main__':
    main()


Output:

The program prints the scratch confusion matrix for the Iris test split, the scratch performance report, and, for comparison, the report produced by scikit-learn's classification_report.

Time complexity: O(S + N*N), where S is the number of test samples and N is the number of classes: filling the confusion matrix is linear in the samples, and building the report visits every cell of the N*N matrix.

Space complexity: O(N*N) for the confusion matrix plus O(N*4) for the report, since each class label needs a row of 4 values (precision, recall, F1-score, support).
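As a cross-check on the scratch implementation, the same train/test split can be scored with scikit-learn's built-in confusion_matrix; since both place actual labels on rows and predicted labels on columns, the numbers should agree with the matrix printed by Confusion_matrix above. This is a standalone sketch that simply repeats the split and model settings used in main():

Python3

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# same split and model settings as in main() above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, random_state=5, test_size=0.3)
y_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
print(confusion_matrix(y_test, y_pred))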



Last Updated : 18 Mar, 2022