
What is Cross-Tabulation and how does it organize data in a table?

Last Updated : 03 Jan, 2024

Cross-tabulation, also referred to as crosstab, is a statistical technique used to organize and analyze the relationship between two or more categorical variables. This article explains the technique and demonstrates how to use it in Python to organize data in a table.

What is Cross-Tabulation?

Cross-tabulation organizes data in a table that gives a clear and concise view of the relationships between categorical variables. Typically, one categorical variable defines the rows of the table and another defines the columns. Each cell, at the intersection of a row and a column, holds the frequency (count) of observations that have that combination of values. This tabular format lets Data Science and Machine Learning analysts and researchers easily identify patterns, trends, and dependencies between categorical variables.
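To make the row/column/cell idea concrete, here is a minimal sketch on a tiny made-up dataset. The column names 'color' and 'size' and their values are assumptions chosen purely for illustration; they are not part of the Titanic example used later.

```python
import pandas as pd

# Hypothetical toy data: six observations of two categorical variables
df = pd.DataFrame({
    "color": ["red", "blue", "red", "red", "blue", "green"],
    "size":  ["S", "M", "M", "S", "S", "M"],
})

# 'color' defines the rows, 'size' defines the columns;
# each cell holds the number of observations with that combination
table = pd.crosstab(df["color"], df["size"])
print(table)
```

A combination that never occurs in the data (here, a green item of size S) appears in the table with a count of 0.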

How does it organize data?

Building a cross-tabulation with Pandas involves the following key steps.

  1. First, load the dataset on which the cross-tabulation will be performed. This is easily done with the Pandas module of Python.
  2. Next, choose the two or more categorical variables to analyze. These variables define the rows and columns of the cross-tabulation table.
  3. Then call the ‘pd.crosstab()’ function to build the table from the chosen categorical variables.
  4. Finally, display the resulting cross-tabulation table and explore the relationships between the categories.
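The four steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the inline CSV text and the column names 'dept', 'shift', and 'status' are hypothetical stand-ins for a real file. It also shows that more than two variables can be used by passing a list of columns for the rows.

```python
import io
import pandas as pd

# Step 1: load the dataset (a hypothetical inline CSV stands in for a real file)
csv_text = """dept,shift,status
sales,day,active
sales,night,active
sales,day,left
ops,day,active
ops,night,left
ops,night,left
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: choose the categorical variables
# (two variables for the rows, one for the columns)
row_vars = [df["dept"], df["shift"]]
col_var = df["status"]

# Step 3: build the cross-tabulation table
table = pd.crosstab(row_vars, col_var)

# Step 4: display the table
print(table)
```

With two row variables, the resulting table has a two-level row index, one level per variable.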

Implementation

Importing module and loading dataset

For this implementation, we only need to import the Pandas module. We then load the well-known ‘Titanic’ dataset.

Python3




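import pandas as pd

# URL of the Titanic dataset
# (placeholder: the address is not given here, so supply your own CSV path or URL)
titanic_url = "<path-or-URL-to-titanic.csv>"
# Load the Titanic dataset into a pandas DataFrame
titanic_df = pd.read_csv(titanic_url)
# Display the first ten rows of the dataset to understand its structure
titanic_df.head(10)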


Cross-tabulation

In this dataset, we look at two categorical features, ‘Pclass’ and ‘Sex’, and the corresponding target feature is ‘Survived’. We therefore build two tables separately: ‘Pclass’ against ‘Survived’, and ‘Sex’ against ‘Survived’. Passing margins=True adds an ‘All’ row and column containing the totals.

Python3




# Create a cross-tabulation table for 'Pclass' and 'Survived'
cross_tab_PClass = pd.crosstab(titanic_df['Pclass'], titanic_df['Survived'], margins=True)
# Display the cross-tabulation table
cross_tab_PClass


Output:

Survived    0    1  All
Pclass
1          80  136  216
2          97   87  184
3         368  119  487
All       545  342  887
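Raw counts are easier to compare across classes once they are turned into row shares. The sketch below is an illustration, not part of the original walkthrough: it re-enters the counts from the output above by hand (without the ‘All’ margins) and divides each row by its total.

```python
import pandas as pd

# Counts copied from the Pclass-by-Survived output above (margins left out)
counts = pd.DataFrame(
    {0: [80, 97, 368], 1: [136, 87, 119]},
    index=pd.Index([1, 2, 3], name="Pclass"),
)
counts.columns.name = "Survived"

# Divide each row by its row total to get the share of each outcome per class
row_share = counts.div(counts.sum(axis=1), axis=0)
print(row_share.round(3))
```

When working from the raw DataFrame instead of a finished table, passing normalize='index' to ‘pd.crosstab()’ should give the same row shares directly.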

Python3




# Create a cross-tabulation table for 'Sex' and 'Survived'
cross_tab_Sex = pd.crosstab(titanic_df['Sex'], titanic_df['Survived'], margins=True)
# Display the cross-tabulation table
cross_tab_Sex


Output:

Survived    0    1  All
Sex
female     81  233  314
male      464  109  573
All       545  342  887

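The two tables make the patterns easy to read: 136 of 216 first-class passengers survived compared with 119 of 487 in third class, and 233 of 314 women survived compared with 109 of 573 men. We can conclude that crosstab is a useful tool for summarizing a dataset by its categorical features, which makes the dataset much easier to understand.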


