In this article, we will see how to create crosstabs from dictionaries in Python. The pandas crosstab function builds a cross-tabulation table that shows how often each combination of groups appears in the data.
pandas.crosstab() computes a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors; if an array of values and an aggregation function are passed, it aggregates those values instead.
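To make the default behaviour concrete, here is a minimal sketch of a frequency table built from a dictionary. The `toy` data and the column names `colour` and `size` are hypothetical and are not part of the dataset used later in this article.

```python
import pandas as pd

# Hypothetical toy data, not taken from the article's dataset.
toy = {
    'colour': ['red', 'red', 'blue', 'blue', 'blue'],
    'size': ['S', 'L', 'S', 'S', 'L'],
}
toy_df = pd.DataFrame(toy)

# With no values/aggfunc, crosstab counts the rows for each
# (colour, size) combination.
counts = pd.crosstab(toy_df['colour'], toy_df['size'])
print(counts)
```

Each cell holds the number of rows with that colour and size, so the cells add up to the number of rows in the DataFrame.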
Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Arguments:
- index : array-like, Series, or list of arrays/Series. Values to group by in the rows.
- columns : array-like, Series, or list of arrays/Series. Values to group by in the columns.
- values : array-like, optional. Array of values to aggregate according to the factors. Requires `aggfunc` to be specified.
- rownames : sequence, default None. If passed, must match the number of row arrays passed.
- colnames : sequence, default None. If passed, must match the number of column arrays passed.
- aggfunc : function, optional. If specified, requires `values` to be specified as well.
- margins : bool, default False. Add row/column margins (subtotals).
- margins_name : str, default 'All'. Name of the row/column that will contain the totals when margins is True.
- dropna : bool, default True. Do not include columns whose entries are all NaN.
- normalize : bool, {'all', 'index', 'columns'}, or {0, 1}, default False. Normalize by dividing all values by the sum of values.
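The following sketch shows how a few of these arguments change the result. The `sales` data and its column names are hypothetical, chosen only to illustrate `values`/`aggfunc`, `margins`, and `normalize`; passing the string 'sum' as `aggfunc` is one option that pandas accepts alongside a callable.

```python
import pandas as pd

# Hypothetical toy data used only to illustrate the parameters.
sales = pd.DataFrame({
    'region': ['North', 'North', 'South', 'South', 'South'],
    'product': ['A', 'B', 'A', 'A', 'B'],
    'units': [10, 20, 30, 40, 50],
})

# values + aggfunc: sum 'units' per (region, product) instead of counting rows.
units_sum = pd.crosstab(sales['region'], sales['product'],
                        values=sales['units'], aggfunc='sum')

# margins=True appends an 'All' row and column holding the totals.
with_totals = pd.crosstab(sales['region'], sales['product'], margins=True)

# normalize='index' turns each row of counts into proportions that sum to 1.
row_share = pd.crosstab(sales['region'], sales['product'], normalize='index')

print(units_sum)
print(with_totals)
print(row_share)
```

Note that `values` and `aggfunc` go together: supplying one without the other is rejected by pandas.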
Stepwise implementation:
Step 1: Create a dictionary.
raw_data = {
    'Digimon': ['Kuramon', 'Pabumon', 'Punimon', 'Botamon', 'Poyomon',
                'Koromon', 'Tanemon', 'Tsunomon', 'Tsumemon', 'Tokomon'],
    'Stage': ['Baby', 'Baby', 'Baby', 'Baby', 'Baby',
              'In-Training', 'In-Training', 'In-Training',
              'In-Training', 'In-Training'],
    'Type': ['Free', 'Free', 'Free', 'Free', 'Free',
             'Free', 'Free', 'Free', 'Free', 'Free'],
    'Attribute': ['Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral',
                  'Fire', 'Plant', 'Earth', 'Dark', 'Neutral'],
    'Memory': [2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    'Equip Slots': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'Lv 50 HP': [324, 424, 5343, 52, 63, 42, 643, 526, 42, 75],
    'Lv50 SP': [86, 75, 64, 43, 86, 64, 344, 24, 24, 12],
    'Lv50 Atk': [86, 74, 6335, 421, 23, 36436, 65, 75, 86, 52],
}
print(raw_data)
Output:
{'Digimon': ['Kuramon', 'Pabumon', 'Punimon', 'Botamon', 'Poyomon', 'Koromon', 'Tanemon', 'Tsunomon', 'Tsumemon', 'Tokomon'], 'Stage': ['Baby', 'Baby', 'Baby', 'Baby', 'Baby', 'In-Training', 'In-Training', 'In-Training', 'In-Training', 'In-Training'], 'Type': ['Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free'], 'Attribute': ['Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Fire', 'Plant', 'Earth', 'Dark', 'Neutral'], 'Memory': [2, 2, 2, 2, 2, 3, 3, 3, 3, 3], 'Equip Slots': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1], 'Lv 50 HP': [324, 424, 5343, 52, 63, 42, 643, 526, 42, 75], 'Lv50 SP': [86, 75, 64, 43, 86, 64, 344, 24, 24, 12], 'Lv50 Atk': [86, 74, 6335, 421, 23, 36436, 65, 75, 86, 52]}
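Because every key in the dictionary becomes a column, all of the lists need to have the same length; pandas generally raises a ValueError when building a DataFrame from lists of unequal length. The sketch below is one optional way to check this up front. The `sample` dictionary is a shortened, hypothetical stand-in for `raw_data`.

```python
# A shortened stand-in for raw_data (three of its keys, three rows),
# used here for illustration only.
sample = {
    'Digimon': ['Kuramon', 'Koromon', 'Tanemon'],
    'Stage': ['Baby', 'In-Training', 'In-Training'],
    'Memory': [2, 3, 3],
}

# Record the length of each list; a well-formed table has exactly one
# distinct length across all keys.
lengths = {key: len(values) for key, values in sample.items()}
is_rectangular = len(set(lengths.values())) == 1
print(lengths, is_rectangular)
```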
Step 2: Create a DataFrame from the dictionary using the pandas DataFrame constructor.
import pandas as pd

# The columns argument fixes the order of the columns in the DataFrame.
raw_data_df = pd.DataFrame(raw_data,
                           columns=['Digimon', 'Stage', 'Type', 'Attribute',
                                    'Memory', 'Equip Slots', 'Lv 50 HP',
                                    'Lv50 SP', 'Lv50 Atk'])
print(raw_data_df)
Output:
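Step 3: Build a crosstab from the DataFrame with pd.crosstab().
# raw_data_df is the DataFrame created in Step 2.
# The column pair here is an assumed, illustrative choice: count the
# records for each Attribute (rows) and Stage (columns) combination.
attribute_by_stage = pd.crosstab(raw_data_df['Attribute'], raw_data_df['Stage'])
print(attribute_by_stage)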
Output:
You can add multiple indices (rows) to a crosstab as well, by passing a list of Series to the index parameter. For example, to break the Digimon down by Attribute and Memory, pass both columns as a list:
raw_data_fd = pd.crosstab(
    [raw_data_df['Attribute'], raw_data_df['Memory']],
    raw_data_df['Digimon'],
    margins=True)
raw_data_fd
Output:
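A crosstab with a list of index Series has a two-level row index, which affects how you look up cells. The sketch below illustrates this on a shortened, hypothetical stand-in for the data above (three columns, five rows), using Stage as the column factor and leaving margins off to keep the index simple.

```python
import pandas as pd

# A shortened stand-in for the article's data, for illustration only.
df = pd.DataFrame({
    'Attribute': ['Neutral', 'Neutral', 'Fire', 'Plant', 'Neutral'],
    'Memory': [2, 2, 3, 3, 3],
    'Stage': ['Baby', 'Baby', 'In-Training', 'In-Training', 'In-Training'],
})

# Two Series in the index list produce a two-level row MultiIndex.
table = pd.crosstab([df['Attribute'], df['Memory']], df['Stage'])

# Rows are addressed with (Attribute, Memory) tuples.
neutral_2_baby = table.loc[('Neutral', 2), 'Baby']

print(table)
print(list(table.index.names))
```

Only the (Attribute, Memory) combinations that occur in the data appear as rows, so this table has four rows rather than one for every possible pairing.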