Converting Pandas Crosstab into Stacked DataFrame
In this article, we will discuss how to convert a pandas crosstab to a stacked dataframe.
A stacked DataFrame has a multi-level index with one or more new inner levels compared to the original DataFrame. If the columns have a single level, the result is a Series object.
The pandas crosstab function builds a cross-tabulation (frequency) table that shows the relationship between two or more variables by counting how often each combination of group values occurs in the data.
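As a minimal sketch of the single-level case, the snippet below stacks a small DataFrame whose values and labels are made up purely for illustration. Because the columns have only one level, the result is a Series with a two-level index.

```python
import pandas as pd

# Hypothetical counts, used only to illustrate the shape of the result.
df = pd.DataFrame(
    {"petrol": [1, 2], "diesel": [3, 4]},
    index=["bmw", "benz"],
)

# With single-level columns, stack() moves them into the index
# and returns a Series instead of a DataFrame.
stacked = df.stack()

print(type(stacked).__name__)   # Series
print(stacked.index.nlevels)    # 2
```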
pandas.crosstab(index, columns, rownames=None, colnames=None)
- index – array, Series, or list of array-like objects. These values are used to group the rows.
- columns – array, Series, or list of array-like objects. These values are used to group the columns.
- rownames – sequence of names for the row levels; the number of names must match the number of row arrays passed.
- colnames – sequence of names for the column levels; the number of names must match the number of column arrays passed.
In this example, we create three sample arrays: car_brand, version, and fuel_type. We then pass these arrays to the crosstab function as the index and columns, together with names for the row and column levels.
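A minimal sketch of this step might look as follows. The array values are assumed sample data, and using car_brand as the index with version and fuel_type as column levels is one plausible arrangement rather than the only one.

```python
import numpy as np
import pandas as pd

# Assumed sample data: eight cars described by brand, version, and fuel type.
car_brand = np.array(["bmw", "bmw", "bmw", "benz", "benz", "benz", "audi", "audi"])
version = np.array(["one", "one", "two", "one", "two", "two", "one", "two"])
fuel_type = np.array(["petrol", "diesel", "petrol", "petrol",
                      "diesel", "diesel", "petrol", "diesel"])

# Rows are grouped by car_brand; columns get two levels (version, fuel_type).
cross_tab = pd.crosstab(
    index=car_brand,
    columns=[version, fuel_type],
    rownames=["car_brand"],
    colnames=["version", "fuel_type"],
)

print(cross_tab)
```

Each cell counts how many cars share that brand, version, and fuel type, so the cell values add up to the number of input records.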
Finally, the crosstab DataFrame can also be visualized as a bar chart with the DataFrame.plot.bar() method.
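The following sketch draws such a bar chart. It assumes matplotlib is installed (pandas plotting depends on it), reuses assumed sample data, and selects the non-interactive Agg backend so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook

import numpy as np
import pandas as pd

# Assumed sample data, chosen only for illustration.
car_brand = np.array(["bmw", "bmw", "bmw", "benz", "benz", "benz", "audi", "audi"])
version = np.array(["one", "one", "two", "one", "two", "two", "one", "two"])
fuel_type = np.array(["petrol", "diesel", "petrol", "petrol",
                      "diesel", "diesel", "petrol", "diesel"])

cross_tab = pd.crosstab(
    index=car_brand,
    columns=[version, fuel_type],
    rownames=["car_brand"],
    colnames=["version", "fuel_type"],
)

# One group of bars per car brand, one bar per (version, fuel_type) column.
ax = cross_tab.plot.bar(rot=0)
ax.set_ylabel("count")

# ax.figure.savefig("crosstab_bar.png")  # uncomment to write the chart to disk
```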
Conversion of crosstab to stacked dataframe:
The DataFrame.stack() method moves one or more column levels into the row index. Here we specify which column levels to stack; the selected levels are removed from the columns and become the innermost levels of the index.
- level – int, str, or list; the level(s) to move from the column axis to the index axis in the resulting DataFrame (default -1, the innermost level)
- dropna – bool. Whether to drop rows with missing values from the resulting DataFrame/Series
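To make the level parameter concrete, here is a small sketch on a hand-built DataFrame with two column levels. The labels and numbers are invented for illustration. It selects a level by position, by name, and as a list of positions.

```python
import pandas as pd

# Hypothetical two-level columns: version (level 0) and fuel_type (level 1).
columns = pd.MultiIndex.from_product(
    [["one", "two"], ["petrol", "diesel"]],
    names=["version", "fuel_type"],
)
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=["bmw", "benz"],
    columns=columns,
)

by_position = df.stack(level=0)          # "version" moves into the index
by_name = df.stack(level="fuel_type")    # "fuel_type" moves into the index
both = df.stack(level=[0, 1])            # no column levels remain, so a Series

print(by_position)
print(by_name)
print(both)
```

Level positions are counted from 0, so with two column levels the valid positions are 0 and 1 (or -2 and -1 counted from the end).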
Here, we convert the crosstab to a stacked DataFrame. The fuel_type level is moved from the column axis into the row index of the resulting DataFrame.
In this example, we show the results for two levels, 1 and 2.
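A sketch of this conversion is shown below. It assumes sample data and a crosstab with version and fuel_type as its two column levels, in which case pandas numbers them 0 and 1. Which integer positions are valid depends on how many column levels the crosstab has, so the level is selected by name here.

```python
import numpy as np
import pandas as pd

# Assumed sample data, chosen only for illustration.
car_brand = np.array(["bmw", "bmw", "bmw", "benz", "benz", "benz", "audi", "audi"])
version = np.array(["one", "one", "two", "one", "two", "two", "one", "two"])
fuel_type = np.array(["petrol", "diesel", "petrol", "petrol",
                      "diesel", "diesel", "petrol", "diesel"])

cross_tab = pd.crosstab(
    index=car_brand,
    columns=[version, fuel_type],
    rownames=["car_brand"],
    colnames=["version", "fuel_type"],
)

# Move the fuel_type column level into the row index.
stacked = cross_tab.stack(level="fuel_type")
print(stacked)

# Optionally turn the index levels into ordinary columns for a flat table.
flat = stacked.reset_index()
print(flat)
```

After stacking, each row is identified by a (car_brand, fuel_type) pair and the remaining columns are the version values.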