Open In App

Converting Pandas Crosstab into Stacked DataFrame

In this article, we will discuss how to convert a pandas crosstab to a stacked dataframe.

A stacked DataFrame is a multi-level index with one or more new inner levels as compared to the original DataFrame. If the columns have a single level, then the result is a series object.  



The panda’s crosstab function is a frequency table that shows the relationship between two or more variables by building a cross-tabulation table that computes the frequency among certain groups of data.

Syntax:



pandas.crosstab(index, columns, rownames=None, colnames=None)

Parameters:

Example:

In this example, we are creating 3 sample arrays namely car_brand, version, fuel_type as shown. Now, we are passing these arrays as the index, columns, and row and column names to the crosstab function as shown.

Finally, crosstab dataframe can also be visualized using the python plot.bar() function




# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz", "benz",
                      "bmw", "bmw", "benz", "benz", "benz", "benz",
                      "bmw", "bmw", "bmw", "benz", "benz", ],
                     dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one", "one",
                    "two", "one", "one", "one", "two", "two", "two",
                    "one", "two", "one"], dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel", "diesel",
                      "petrol", "diesel", "diesel", "diesel", "petrol",
                      "petrol", "diesel", "petrol", "petrol", "petrol",
                      "diesel", "diesel", ],
                     dtype=object)
 
# use pandas crosstab and pass the three arrays
# as index and columns to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type'])
 
print(cross_tab_data)
 
barplot = cross_tab_data.plot.bar()

Output:

Conversion of  crosstab to stacked dataframe: 

Here we are going to specify the number of the levels to be stacked. This will convert based on the axis levels on the particular columns of the pandas DataFrame.

Syntax:

pandas.DataFrame.stack(level, dropna)

Parameters:

Example 1:

Here, We will convert the crosstab to a stacked dataframe. The fuel_type level will be stacked as a column in the resulting dataframe.




# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz", "benz",
                      "bmw", "bmw", "benz", "benz", "benz", "benz",
                      "bmw", "bmw", "bmw", "benz", "benz", ],
                     dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one", "one",
                    "two", "one", "one", "one", "two", "two", "two",
                    "one", "two", "one"], dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel", "diesel",
                      "petrol", "diesel", "diesel", "diesel", "petrol",
                      "petrol", "diesel", "petrol", "petrol", "petrol",
                      "diesel", "diesel", ],
                     dtype=object)
 
# use pandas crosstab and pass the three
# arrays as index and columns
# to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type'])
 
barplot = cross_tab_data.plot.bar()
 
# use the created sample crosstab data
# to convert it to a stacked dataframe
stacked_data = cross_tab_data.stack(level=1)
 
print(stacked_data)

Output:

Example 2

In this example, we have shown the results for two levels 1 and 2.




# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz",
                      "benz", "bmw", "bmw", "benz", "benz",
                      "benz", "benz", "bmw", "bmw", "bmw",
                      "benz", "benz", ], dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one",
                    "one", "two", "one", "one", "one", "two",
                    "two", "two", "one", "two", "one"],
                   dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel",
                      "diesel", "petrol", "diesel", "diesel",
                      "diesel", "petrol", "petrol", "diesel",
                      "petrol", "petrol", "petrol", "diesel",
                      "diesel", ], dtype=object)
 
year_release = np.array([2000, 2005, 2000, 2007, 2000, 2005,
                         2007, 2005, 2005, 2000, 2007, 2000,
                         2007, 2005, 2005, 2007, 2000],
                        dtype=object)
 
# use pandas crosstab and pass the three arrays
# as index and columns to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type, year_release],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type', 'year_release'])
 
barplot = cross_tab_data.plot.bar()
 
# use the created sample crosstab data to
# convert it to a stacked dataframe with
# level 1
stacked_data = cross_tab_data.stack(level=1)
 
 
barplot = stacked_data.plot.bar()
 
# use the created sample crosstab data to
# convert it to a stacked dataframe with
# level 2
stacked_data = cross_tab_data.stack(level=2)
 
barplot = stacked_data.plot.bar()

Output:


Article Tags :