Contingency Table in Python

Estimates such as the mean, median, standard deviation, and variance are useful for univariate data analysis. In bivariate analysis (comparing two variables), the relationship between the variables, such as their correlation, becomes the focus.
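To make the distinction concrete, here is a minimal sketch that computes univariate summaries for one column and a correlation between two columns. The column names income and loan_amount and all of the values are made up for illustration; they are not taken from the loan dataset.

```python
import pandas as pd

# Hypothetical numeric data, invented for this illustration only.
df = pd.DataFrame({
    "income": [30, 45, 50, 65, 80],
    "loan_amount": [10, 14, 15, 21, 25],
})

# Univariate summaries describe one column at a time.
income_mean = df["income"].mean()
income_median = df["income"].median()
income_std = df["income"].std()   # sample standard deviation (ddof=1)
income_var = df["income"].var()   # sample variance (ddof=1)

# A bivariate summary relates two columns; here, the Pearson correlation.
income_loan_corr = df["income"].corr(df["loan_amount"])

print(income_mean, income_median, income_std, income_var)
print(income_loan_corr)
```

Correlation applies to numeric columns. For categorical columns, a table of counts is the more natural summary.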

A contingency table is one technique for exploring two or more variables. It is a tally of counts for each combination of values across two or more categorical variables.
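As a small illustration of that tally, the sketch below builds a contingency table from six made-up rows. The column names match those used later in this article, but the values ("A", "Paid", "Default", and so on) are assumptions chosen for the example, not values from the loan dataset.

```python
import pandas as pd

# Six hypothetical loans, invented for this illustration only.
toy = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "B", "C"],
    "loan_status": ["Paid", "Paid", "Paid", "Default", "Default", "Default"],
})

# Rows are grades, columns are loan statuses, cells are counts.
table = pd.crosstab(toy["grade"], toy["loan_status"])
print(table)
```

Each cell holds the number of rows that share that grade and that loan status, so the cells add up to the number of rows in the data.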

The examples below use a loan dataset, loaded from the file loan_status.csv.



Loading Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Loading Data

# Read the loan data; assumes loan_status.csv is in the working directory
data = pd.read_csv("loan_status.csv")

# Show the first 10 rows
print(data.head(10))

Output:

Describe Data

data.describe()

Output:

Data Info

data.info()

Output:

Data Types

# data types of feature/attributes 
# in the data
data.dtypes

Output:

Code #1: Contingency table of counts for grade against loan status.


data_crosstab = pd.crosstab(data['grade'],
                            data['loan_status'],
                            margins=False)
print(data_crosstab)

Output:

Code #2: Contingency table of counts for purpose against loan status.

data_crosstab = pd.crosstab(data['purpose'],
                            data['loan_status'],
                            margins=False)
print(data_crosstab)

Output:

Code #3: Contingency table of counts for each combination of grade and purpose against loan status.

data_crosstab = pd.crosstab([data.grade, data.purpose],
                            data.loan_status,
                            margins=False)
print(data_crosstab)

Output:

As the code shows, a contingency table gives the count for every combination of categories across two or more variables. This makes it easier to see how the variables relate before extracting further information from the data.
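The calls above all pass margins = False. As a hedged sketch of two other options of pd.crosstab, the example below uses margins=True to append totals and normalize="index" to turn each row into proportions. The six rows of data are made up for illustration and are not taken from the loan dataset.

```python
import pandas as pd

# Six hypothetical loans, invented for this illustration only.
toy = pd.DataFrame({
    "grade": ["A", "A", "B", "B", "B", "C"],
    "loan_status": ["Paid", "Paid", "Paid", "Default", "Default", "Default"],
})

# margins=True appends an "All" row and an "All" column holding the totals.
with_totals = pd.crosstab(toy["grade"], toy["loan_status"], margins=True)
print(with_totals)

# normalize="index" divides each row by its row total,
# so every row shows the share of each loan status within that grade.
row_share = pd.crosstab(toy["grade"], toy["loan_status"], normalize="index")
print(row_share)
```

Row proportions can be easier to compare than raw counts when the groups differ a lot in size.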


