Creating Impressive Report-Ready Plots with Python Bokeh

Data science fundamentally revolves around effective communication. Enthusiasts in this field invest time mastering numerical techniques, manipulating datasets, and developing regression models to derive insights. However, all these efforts are futile without the ability to convey findings clearly.

Thus, I contend that data science is primarily about communication. The key aspect is presenting the results of your analysis in a manner that persuades others to engage with your insights. Since most individuals prefer not to delve into raw datasets or analyze statistical models, data visualization becomes an essential tool in a data scientist's toolkit.

Fortunately, Python provides several libraries specifically tailored for crafting professional-grade plots that facilitate communication with stakeholders. One of my preferred libraries is Bokeh, and this article will guide you through the process of creating polished plots using its features.

For Beginners in Python

If you're already familiar with Python, feel free to skip this section. For those new to Python, some initial setup is required. There are several ways to install and run Python, so you can pick whichever suits you. Personally, I recommend the Anaconda distribution, which includes many essential libraries and the Spyder IDE.

You can download the free individual version of Anaconda at: https://www.anaconda.com/products/individual. Once installed, you'll have everything necessary to follow this tutorial.

After installation, launch the Spyder IDE to begin.

Getting Started

The first steps in using Bokeh for plotting involve: 1) importing the necessary Bokeh functions, and 2) obtaining a dataset. Since we don't have a specific dataset, we will use NumPy to generate a basic one for exploration purposes.

We start by importing the required libraries, including Bokeh for creating plots and NumPy for dataset generation. The relevant code for this is:

from bokeh.plotting import figure, save, output_file
import numpy as np

The Bokeh functions allow us to create figures, save our plots, and define the output file location. By importing NumPy as 'np', we can conveniently access its functions.

Next, we will create two arrays to serve as our x and y data. The x values will range from 0 to 10 in increments of 1, while the y values will span from 0 to 5 in increments of 0.5. The code for this is:

x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)

Note that the final values (10 for x and 5 for y) will not appear in the output, because np.arange treats the stop value as exclusive: it generates values from the start up to, but not including, the stop. To include these values, set the stop one step beyond the last value you want.
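To make the endpoint behavior concrete, here is a minimal sketch comparing the arrays used in this article with variants that include the final values. The names x_incl, y_incl, and y_lin are my own, chosen for illustration only.

```python
import numpy as np

# np.arange(start, stop, step) stops before reaching `stop`.
x = np.arange(0, 10, 1)     # 0, 1, ..., 9   (10 values)
y = np.arange(0, 5, 0.5)    # 0.0, 0.5, ..., 4.5   (10 values)

# Pushing the stop one step further includes the endpoint.
x_incl = np.arange(0, 10 + 1, 1)       # 0, 1, ..., 10
y_incl = np.arange(0, 5 + 0.5, 0.5)    # 0.0, 0.5, ..., 5.0

# np.linspace is an alternative that includes both endpoints,
# but takes the number of points instead of the step size.
y_lin = np.linspace(0, 5, 11)          # 0.0, 0.5, ..., 5.0
```

Either approach works. If you only care about the first and last values and the number of points, np.linspace is often easier to reason about than a floating-point step.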

Crafting Your Initial Plot

To create a plot, we need to instantiate a plot object using Bokeh's figure function. While it's possible to create a plot without specifying parameters, including them ensures your plot is ready for reporting. Key parameters to consider are:

  • width: The width of the plot in pixels. A common starting point is 800 pixels, but adjustments may be needed based on the complexity of your data.
  • height: The height of the plot, typically starting at 400 pixels.
  • x_axis_label: A detailed label for the x-axis, enabling readers to understand what the data represents.
  • y_axis_label: Similar to the x-axis label, but for the y-axis.

Here’s how you can set up the plot object:

p1 = figure(width=800, height=400, x_axis_label='Time Since Experiment Start (Minutes)', y_axis_label='Distance Driven by Test Car (Kilometers)')

With the plot object created, we can now add our data. To represent the x and y data as circles, we will use the figure object's circle method. The necessary inputs include:

  • x data: The array for x values.
  • y data: The array for y values.
  • legend_label: A label for the data series, providing context. (Older Bokeh releases called this keyword legend; current releases expect legend_label.)
  • color: The marker color for the data points.

The code to add a new data series is:

p1.circle(x, y, legend_label='Honda', color='red')
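As a hedged alternative, recent Bokeh releases steer users toward the generic scatter method, with the marker shape passed as a parameter. The sketch below assumes a current Bokeh release, where the legend text is passed as legend_label. The size value of 8 is my own choice, not something from the steps above.

```python
import numpy as np
from bokeh.plotting import figure

x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)

p1 = figure(width=800, height=400,
            x_axis_label='Time Since Experiment Start (Minutes)',
            y_axis_label='Distance Driven by Test Car (Kilometers)')

# scatter() draws one marker per (x, y) pair; marker='circle' selects the shape.
# size is in screen pixels (an assumed value, adjust to taste).
renderer = p1.scatter(x, y, marker='circle', size=8,
                      legend_label='Honda', color='red')
```

Passing the marker as a parameter also makes it easy to switch shapes later without changing which method you call.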

Next, we need to specify where to save the plot. The output_file function allows us to set both the file path and name:

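output_file(r'C:\Users\JSmith\Desktop\FirstPlot.html', title='First Plot')

Note the r prefix: it makes this a raw string, so the backslashes in the Windows path are taken literally. Without it, Python 3 treats \U as the start of a Unicode escape and raises a SyntaxError.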

After setting the output location, we save the plot with:

save(p1)

Running this code will generate an HTML file named 'FirstPlot.html' on your desktop. Opening it in a web browser will display your first plot.

The first plot
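If you would rather not hard-code a Windows path, one option is to build the output location with pathlib. The following is a minimal sketch of the whole first plot under that approach. The plots folder under the current working directory is a hypothetical location of my choosing, and the scatter call with size=8 is an assumption rather than the exact call used earlier.

```python
from pathlib import Path

import numpy as np
from bokeh.plotting import figure, output_file, save

x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)

p1 = figure(width=800, height=400,
            x_axis_label='Time Since Experiment Start (Minutes)',
            y_axis_label='Distance Driven by Test Car (Kilometers)')
p1.scatter(x, y, marker='circle', size=8, legend_label='Honda', color='red')

# Hypothetical output folder: a "plots" directory next to where the script runs.
output_dir = Path.cwd() / 'plots'
output_dir.mkdir(exist_ok=True)
out_path = output_dir / 'FirstPlot.html'

# output_file sets the destination; save writes the HTML and returns its path.
output_file(str(out_path), title='First Plot')
saved_to = save(p1)
print(saved_to)
```

Because pathlib assembles the path with the correct separator for the operating system, the same script should run unchanged on Windows, macOS, and Linux.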

On the right side of the plot, you'll find several useful tools, including:

  • Pan: Allows you to move the plot around.
  • Box Zoom: Enables zooming in on a selected area by clicking and dragging.
  • Wheel Zoom: Lets you zoom in/out using your mouse wheel.
  • Save: Allows you to save the plot as an image file for reports.
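The toolbar can also be chosen explicitly when the figure is created. The sketch below, assuming a current Bokeh release, requests the four tools listed above plus a reset tool (my addition), and asks for wheel zoom to be active when the plot loads.

```python
from bokeh.plotting import figure

# tools takes a comma-separated string of tool names;
# toolbar_location controls which side of the plot the toolbar sits on.
p = figure(width=800, height=400,
           tools='pan,box_zoom,wheel_zoom,save,reset',
           toolbar_location='right',
           active_scroll='wheel_zoom')

# List the tool classes that ended up on the toolbar.
tool_names = [type(tool).__name__ for tool in p.toolbar.tools]
print(tool_names)
```

For a static report you might trim the list down to just save, since readers of a PDF cannot pan or zoom anyway.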

Enhancing the Plot

You might have noticed several issues with the initial plot:

  • The legend obscures a data point.
  • The axis labels are small and hard to read.
  • Tick labels and legend fonts also lack clarity.

Fortunately, Bokeh provides solutions to address these issues.

To reposition the legend, we can set its location to the bottom right corner:

p1.legend.location = "bottom_right"

We can also modify the font sizes for axis labels and tick marks by adding the following lines:

p1.xaxis.axis_label_text_font_size = "16pt"
p1.yaxis.axis_label_text_font_size = "16pt"
p1.xaxis.major_label_text_font_size = "14pt"
p1.yaxis.major_label_text_font_size = "14pt"
p1.legend.label_text_font_size = "16pt"

With these enhancements, running the code will yield an improved plot.

The improved plot

As you can see, the readability has significantly improved. The legend's new position reveals the previously hidden data point, making the plot much more informative.
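If you produce many plots, it can be convenient to collect these styling lines in one place. Below is a minimal sketch of such a helper. The function name style_for_report and its default arguments are hypothetical, and it assumes a current Bokeh release and a figure that already has at least one glyph with a legend label.

```python
import numpy as np
from bokeh.plotting import figure


def style_for_report(plot, label_size='16pt', tick_size='14pt',
                     legend_location='bottom_right'):
    """Hypothetical helper: apply report-friendly fonts and legend placement."""
    # plot.axis covers both the x and y axes of the figure.
    for axis in plot.axis:
        axis.axis_label_text_font_size = label_size
        axis.major_label_text_font_size = tick_size
    # Legend properties can only be set once a glyph with a legend label exists.
    plot.legend.location = legend_location
    plot.legend.label_text_font_size = label_size
    return plot


x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)

p1 = figure(width=800, height=400,
            x_axis_label='Time Since Experiment Start (Minutes)',
            y_axis_label='Distance Driven by Test Car (Kilometers)')
p1.scatter(x, y, marker='circle', size=8, legend_label='Honda', color='red')

style_for_report(p1)
```

Keeping the sizes as parameters means a slide deck and a printed report can share the same plotting code with different font settings.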

Adding More Data Series

A plot with just one data series can be limiting. To provide more context or comparisons, we can easily add additional data series with Bokeh.

First, we’ll create a new y dataset:

y2 = np.arange(0, 20, 2)

Next, we can incorporate this new data into our existing plot with a simple line of code:

p1.diamond(x, y2, legend_label='Ferrari', color='blue')

This time, we’ve altered the y data, updated the legend, and changed the marker style to a diamond, which is beneficial for clarity, especially in black and white prints.

To avoid overlapping, we can relocate the legend again:

p1.legend.location = "top_left"

The final plot will look like this:

The final plot with multiple data series
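When there are more than a couple of series, adding them in a loop keeps the code short. Here is a minimal sketch that rebuilds the two-series plot this way, assuming a current Bokeh release. The series list and the marker size of 10 are my own illustration choices.

```python
import numpy as np
from bokeh.plotting import figure

x = np.arange(0, 10, 1)
y = np.arange(0, 5, 0.5)
y2 = np.arange(0, 20, 2)

p1 = figure(width=800, height=400,
            x_axis_label='Time Since Experiment Start (Minutes)',
            y_axis_label='Distance Driven by Test Car (Kilometers)')

# Each entry: (legend label, y values, marker shape, color).
# Distinct marker shapes keep the series distinguishable in black and white.
series = [
    ('Honda', y, 'circle', 'red'),
    ('Ferrari', y2, 'diamond', 'blue'),
]

for label, values, marker, color in series:
    p1.scatter(x, values, marker=marker, size=10,
               legend_label=label, color=color)

# Move the legend away from the data, which rises toward the top right.
p1.legend.location = 'top_left'
```

Adding a third car is then a matter of appending one more tuple to the list.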

Notice how the larger fonts, distinct markers, and repositioned legend improve both the readability and the visual appeal of the plot. This is a report-ready plot.

Shameless self-promotion: If you're keen on enhancing your Python skills, consider reading my book, 1000x Faster: How to Automate Laboratory Data Analysis with Python.