
Handling Categorical Data with Bokeh – Python

As a data scientist, you will often come across datasets with categorical data. Categorical data is a type of data that can be divided into distinct categories or groups. For example, a dataset might have a column with the categories “red”, “green”, and “blue”. Handling categorical data can be challenging because it cannot be processed in the same way as numerical data.

One way to visualize and analyze categorical data is through the use of Bokeh, a powerful Python library for creating interactive visualizations. In this blog, we will explore how to handle categorical data with Bokeh and provide some examples to illustrate the concepts.



Concepts:

Before diving into the specifics of handling categorical data with Bokeh, it is important to understand a few key concepts.
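One concept the steps below depend on is how Bokeh represents a categorical axis: as a range of string factors rather than numbers. The minimal sketch below assumes a hypothetical list of color categories (borrowed from the introduction) and shows that passing a list of strings as `x_range` yields a `FactorRange`.

```python
from bokeh.models import FactorRange
from bokeh.plotting import figure

# Hypothetical categories, used only for illustration
categories = ["red", "green", "blue"]

# Passing a list of strings as x_range makes the x-axis categorical
p = figure(x_range=categories, height=250, title="Categorical axis")

# Bokeh stores the categories as the "factors" of a FactorRange
axis_is_categorical = isinstance(p.x_range, FactorRange)
factors = list(p.x_range.factors)
print(axis_is_categorical, factors)
```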

Steps:

Now that we have a basic understanding of the concepts, let’s go through the steps for handling categorical data with Bokeh.



  1. Import the necessary libraries (Bokeh and any others you might need)
  2. Create a toy dataset with your categorical data
  3. Create a figure object and set the x_range or y_range to the categories you want to plot
  4. Use the vbar() or hbar() glyph methods to plot the data, specifying the categories as the x or y coordinates and the values as the top or right coordinates
  5. Optional: customize the appearance of the plot by setting the width, adding grid lines, and setting the range start values
  6. Display the plot using the show() function
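
The steps above can be sketched as follows, this time with `hbar()` so that the categories sit on the y-axis. The fruit names and counts are made-up illustration data, and `build_fruit_chart` is a hypothetical helper name, not part of Bokeh.

```python
# Step 1: import the library
from bokeh.plotting import figure


def build_fruit_chart():
    # Step 2: a toy dataset (hypothetical categories and counts)
    fruits = ["Apples", "Pears", "Plums"]
    counts = [5, 3, 4]

    # Step 3: horizontal chart, so the categories define the y_range
    p = figure(y_range=fruits, height=250, title="Fruit counts",
               toolbar_location=None, tools="")

    # Step 4: hbar() takes the categories as y and the values as right
    p.hbar(y=fruits, right=counts, height=0.8)

    # Step 5: optional styling (hide horizontal grid lines, start bars at zero)
    p.ygrid.grid_line_color = None
    p.x_range.start = 0
    return p


chart = build_fruit_chart()
# Step 6: to display the chart, import show from bokeh.io and call show(chart)
```

Returning the figure from a function keeps the sketch free of side effects; calling `show(chart)` opens the plot in a browser.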
     

Example

Now that we have gone through the steps for handling categorical data with Bokeh, let’s look at some examples to further illustrate the concepts.

Example 1: Simple Bar chart




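# Import the necessary libraries
from bokeh.io import output_file, show
from bokeh.plotting import figure
from sklearn.datasets import load_breast_cancer

# Load the dataset
data = load_breast_cancer(as_frame=True)

# Category labels: 'malignant' and 'benign', encoded as 0 and 1 in the target
categories = list(data.target_names)

# Count the samples per class; sort_index() orders the counts by class code
# so that they line up with the category labels
counts = data['target'].value_counts().sort_index().tolist()

# HTML file to save the plot to
output_file("Breast Cancer.html")

# Colors of the bars
color = ["orange", "green"]

p = figure(x_range=categories, height=350, title="Breast cancer",
           toolbar_location=None, tools="")

p.vbar(x=categories, top=counts, color=color, width=0.9)

show(p)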

Output:

The above code generates a bar chart with one bar per diagnosis class (malignant and benign), where the height of each bar is the number of samples in that class.

Code explanations:

The output_file function specifies the name and location of the HTML file where the chart will be saved.
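
Because the bar heights in this example come from `value_counts()`, it is worth checking that they line up with the category labels. The sketch below uses a small made-up series (not the real dataset) to show that `value_counts()` orders by frequency, while adding `sort_index()` orders by the encoded label.

```python
import pandas as pd

# Hypothetical encoded labels: 0 = "malignant", 1 = "benign"
names = ["malignant", "benign"]
target = pd.Series([1, 1, 1, 0, 0])

# value_counts() sorts by frequency, so the larger class comes first
by_frequency = target.value_counts().tolist()

# sort_index() restores label order (0, then 1), which matches `names`
by_label = target.value_counts().sort_index().tolist()

print(dict(zip(names, by_label)))
```

Pairing `names` with `by_frequency` would attach the benign count to the malignant label whenever the classes are imbalanced in that direction.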

Example 2: Nested Bar chart

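In this example, we will use the iris dataset, which has three flower species (setosa, versicolor, and virginica) and four measurements per flower: sepal length, sepal width, petal length, and petal width. The chart nests the average of each measurement under its species.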




# Import the necessary modules
from bokeh.io import output_file, show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from sklearn.datasets import load_iris
  
# Load the iris dataset as a DataFrame
data = load_iris(as_frame=True)
data = data.frame

# Replace 0, 1, 2 with the respective iris species names
data['target'] = data['target'].map(dict(enumerate(load_iris().target_names.tolist())))
  
# Group the data by species and average each measurement
df = data.groupby('target').agg('mean')
  
# List of (species, measurement) pairs, e.g. ('setosa', 'sepal length (cm)')
x = [ (cls, col) for cls in data.target.unique() for col in data.columns[:-1] ]
  
# The average value for each pair in x, flattened in the same order
mean = sum(zip(df['sepal length (cm)'], 
               df['sepal width (cm)'], 
               df['petal length (cm)'], 
               df['petal width (cm)']), ())
  
  
# Create the data source from a dictionary of columns
source = ColumnDataSource(data=dict(x=x, Average=mean))
  
# HTML file to save the plot to
output_file("Bokeh Nested Bar chart.html")
  
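# Figure with a nested (two-level) categorical x-axis
p = figure(x_range=FactorRange(*x),
           height=350,
           title="Iris Flower Average length",
           )
# Colors, one per measurement
color = ['orange', '#FF0000', 'green', '#00FF00']
  
# Vertical bar chart
p.vbar(x='x',
       top='Average',
       width=0.9,
       source=source,
       line_color="white",
       # start=1, end=2 colors each bar by the second element of its (species, measurement) pair
       fill_color=factor_cmap('x', palette=color, factors=list(data.columns[:-1]), start=1, end=2)
      )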
  
  
# Orientation of the x-axis labels (angle in radians)
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)

Output:

Bokeh Nested Bar chart

Code Explanations:
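
How the nested axis and the coloring fit together can be sketched on a smaller, hypothetical dataset. Each x value is a `(group, part)` tuple, `FactorRange` turns those tuples into a two-level axis, and `factor_cmap` with `start=1, end=2` picks the color from the second element of each tuple only. The group names, part names, and values below are assumptions made up for illustration.

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Hypothetical groups, parts, and values
groups = ["A", "B"]
parts = ["length", "width"]
x = [(g, part) for g in groups for part in parts]
values = [5.0, 3.4, 5.9, 2.8]  # one value per (group, part) pair, same order as x

source = ColumnDataSource(data=dict(x=x, value=values))

# Tuples passed to FactorRange produce a nested categorical axis
p = figure(x_range=FactorRange(*x), height=250, title="Nested categories")

# start=1, end=2 selects the second tuple element, so all "length" bars share a color
p.vbar(x="x", top="value", width=0.9, source=source,
       fill_color=factor_cmap("x", palette=["orange", "green"],
                              factors=parts, start=1, end=2))

nested_factors = [tuple(f) for f in p.x_range.factors]
```

The order of `values` has to match the order of `x`, which is why both examples build the pairs and the numbers with the same nesting: outer loop over groups, inner loop over parts.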

Conclusion:

In this blog, we explored how to handle categorical data with Bokeh and provided some examples to illustrate the concepts. By using Bokeh, you can easily visualize and analyze categorical data, making it a valuable tool for data science.

