What is Cross-Tabulation and how does it organize data in a table?

Last Updated : 03 Jan, 2024

Cross-tabulation, also referred to as a crosstab, is a statistical technique used to organize and analyze the relationship between two or more categorical variables. This article explains the technique and shows how to implement it in Python to organize data in a table.

What is Cross-Tabulation?

Cross-tabulation organizes data in a table that gives a clear and concise view of the relationships between categorical variables. Typically, one categorical variable defines the rows of the table and another defines the columns. Each cell, at the intersection of a row and a column, contains the frequency (count) of observations that have that particular combination of values. This tabular format allows data science and machine learning analysts and researchers to identify patterns, trends, and dependencies between categorical variables.
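The following is a minimal sketch of the idea. It cross-tabulates a small, made-up dataset in which the column names `color` and `size` and their values are hypothetical. It then checks that each cell equals the count you get by grouping on both variables.

```python
import pandas as pd

# Tiny hypothetical dataset with two categorical variables
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "size":  ["S",   "M",    "M",   "S",     "M",    "S"],
})

# Rows come from 'color', columns from 'size'; each cell counts the
# observations that have that (color, size) combination
table = pd.crosstab(df["color"], df["size"])
print(table)

# The same counts computed by grouping on both variables and pivoting 'size'
# into columns; missing combinations are filled with 0
counts = df.groupby(["color", "size"]).size().unstack(fill_value=0)
print((counts == table).all().all())
```

For example, the cell for `red` and `S` holds 2, because two of the six rows are small red items.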

How does it organize data?

The key steps of the process are listed below.

First, load the dataset on which the cross-tabulation will be performed. This is easily done with the pandas library in Python.
Next, choose two or more categorical variables to analyze. These variables define the rows and columns of the cross-tabulation table.
Then call the ‘pd.crosstab()’ function to build the table from the chosen categorical variables.
Finally, display the resulting cross-tabulation table and explore the relationships between the categories.
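The four steps above can be sketched in a few lines. To keep the sketch self-contained, it reads a small inline CSV through `io.StringIO` instead of a file or URL. The rows are made up for illustration and are not the real dataset.

```python
import io
import pandas as pd

# Step 1: load the data. A small made-up CSV stands in for a real file or URL
csv_text = """Survived,Pclass,Sex
0,3,male
1,1,female
1,3,female
0,3,female
0,3,male
1,2,male
0,1,male
1,2,female
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 2: choose the categorical variables for the rows and the columns
row_var, col_var = "Sex", "Survived"

# Step 3: build the cross-tabulation; margins=True adds row and column
# totals labelled "All"
table = pd.crosstab(df[row_var], df[col_var], margins=True)

# Step 4: display the result
print(table)
```

Swapping the values of `row_var` and `col_var` transposes the table. Passing different column names cross-tabulates other pairs of variables.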

Implementation

Importing the module and loading the dataset

For this implementation, we only need to import the Python pandas module. Then we load the well-known Titanic dataset.

Python3

import pandas as pd
 
# URL of the Titanic dataset
titanic_url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
# Load the Titanic dataset into a pandas DataFrame
titanic_df = pd.read_csv(titanic_url)
# Display the first few rows of the dataset to understand its structure
titanic_df.head(10)

Cross-tabulation

In this dataset, there are two categorical features of interest, ‘Pclass’ (passenger class) and ‘Sex’. The corresponding target feature is ‘Survived’ (0 = did not survive, 1 = survived). We therefore cross-tabulate ‘Pclass’ with ‘Survived’ and ‘Sex’ with ‘Survived’ separately.

Python3

# Create a cross-tabulation table for 'Pclass' and 'Survived'
cross_tab_PClass = pd.crosstab(titanic_df['Pclass'], titanic_df['Survived'], margins=True)
# Display the cross-tabulation table
cross_tab_PClass

Output:

Survived    0    1  All
Pclass                 
1          80  136  216
2          97   87  184
3         368  119  487
All       545  342  887
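Raw counts are hard to compare across classes of different sizes, so proportions are often more informative. `pd.crosstab` accepts a `normalize` argument for this purpose. With `normalize="index"`, each row is divided by its total and therefore sums to 1. This sketch assumes no network access. Instead of downloading the CSV, it rebuilds one row per passenger from the counts printed in the table above. With the loaded `titanic_df`, the equivalent call would be `pd.crosstab(titanic_df['Pclass'], titanic_df['Survived'], normalize='index')`.

```python
import numpy as np
import pandas as pd

# Passenger counts per (Pclass, Survived) pair, taken from the table above
counts = {
    (1, 0): 80, (1, 1): 136,
    (2, 0): 97, (2, 1): 87,
    (3, 0): 368, (3, 1): 119,
}

# Expand the counts back into one row per passenger
pclass = np.repeat([key[0] for key in counts], list(counts.values()))
survived = np.repeat([key[1] for key in counts], list(counts.values()))
df = pd.DataFrame({"Pclass": pclass, "Survived": survived})

# normalize="index" divides each cell by its row total, giving the share of
# passengers in each class who did (1) or did not (0) survive
rates = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")
print(rates.round(3))
```

Other accepted values are `normalize="columns"`, which divides by column totals, and `normalize="all"` (or `True`), which divides by the grand total.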

Python3

# Create a cross-tabulation table for 'Sex' and 'Survived'
cross_tab_Sex = pd.crosstab(titanic_df['Sex'], titanic_df['Survived'], margins=True)
# Display the cross-tabulation table
cross_tab_Sex

Output:

Survived    0    1  All
Sex                    
female     81  233  314
male      464  109  573
All       545  342  887
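Cross-tabulation is not limited to two variables. If you pass a list of Series as the row (or column) argument, pandas nests the labels, giving one row per combination of values. The joint class-and-sex counts for the real dataset are not shown in this article. For that reason, the sketch below uses a few hypothetical records purely to illustrate the layout. With the loaded data, the equivalent call would be `pd.crosstab([titanic_df['Pclass'], titanic_df['Sex']], titanic_df['Survived'], margins=True)`.

```python
import pandas as pd

# Hypothetical toy records, NOT taken from the real Titanic data
df = pd.DataFrame({
    "Pclass":   [1, 1, 3, 3, 3, 2, 2, 1],
    "Sex":      ["female", "male", "female", "male", "male", "female", "male", "female"],
    "Survived": [1, 0, 1, 0, 1, 1, 0, 1],
})

# A list of Series as the row argument gives a two-level row index:
# each row is one (Pclass, Sex) combination
table = pd.crosstab([df["Pclass"], df["Sex"]], df["Survived"], margins=True)
print(table)
```

Combinations that never occur in the data, such as a surviving first-class male in this toy example, appear as 0.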

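The two tables show how crosstab summarizes a dataset along its categorical features. For example, 136 of the 216 first-class passengers survived (about 63%), compared with 119 of the 487 third-class passengers (about 24%). Similarly, 233 of the 314 female passengers survived (about 74%), compared with 109 of the 573 male passengers (about 19%). Crosstab is therefore a useful first step for understanding how categorical features relate to a target variable.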
