Bokeh: Python interactive visualization

Outline

  1. What is Bokeh?
  2. Installation
  3. Easy examples (with explanations)
  4. Cool stuff
  5. rbokeh

What is Bokeh?

A Python interactive visualization library that targets modern web browsers for presentation. Bokeh renders plots using HTML canvas and provides many mechanisms for interactivity.

Bokeh exposes different interface levels to the users:

  • a low-level (and more flexible) glyph interface
  • an intermediate-level interface called plotting
  • a high-level interface that can be used to build complex plots in a simple way

What can it do?

Installation

Dependencies:

  • NumPy
  • Pandas
  • Flask
  • Redis
  • Redis-py
  • Six
  • Requests
  • Tornado >= 4.0
  • Werkzeug
  • Greenlet
  • Gevent
  • Gevent-websocket
  • PyZMQ
  • PyYaml
  • DateUtil

$ pip install bokeh
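
Before running the examples, it can help to confirm which of the dependencies listed above are importable. The sketch below uses only the standard library. The import names in the mapping (for example `zmq` for PyZMQ and `yaml` for PyYaml) are assumptions about how each package is exposed, and the helper names `is_installed` and `missing_dependencies` are made up for illustration.

```python
import importlib.util

# Assumed mapping from the package names listed above to their import names.
DEPENDENCIES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Flask": "flask",
    "Redis-py": "redis",
    "Six": "six",
    "Requests": "requests",
    "Tornado": "tornado",
    "Werkzeug": "werkzeug",
    "Greenlet": "greenlet",
    "Gevent": "gevent",
    "Gevent-websocket": "geventwebsocket",
    "PyZMQ": "zmq",
    "PyYaml": "yaml",
    "DateUtil": "dateutil",
}


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_dependencies(deps=DEPENDENCIES):
    """Return the sorted package names whose import name cannot be located."""
    return sorted(name for name, module in deps.items() if not is_installed(module))


# Report anything that still needs to be installed.
print("Missing:", missing_dependencies())
```

This only checks that a module can be found; it does not check versions such as the Tornado >= 4.0 requirement.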

Easy examples

In [1]:
from bokeh.plotting import figure, output_notebook, show

# output to notebook
output_notebook() 

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# Create a figure with a title and axis labels, then add a `line` renderer
# with a legend value and line thickness.
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')
p.line(x, y, legend="Temp.", line_width=2)

show(p)
BokehJS successfully loaded.
In [3]:
from bokeh.plotting import figure, output_notebook, show

# prepare some data
x0 = [1, 2, 3, 4, 5]
y1 = [x**2 for x in x0]
y2 = [10**x for x in x0]
y3 = [10**(x**2) for x in x0]

# output to notebook
output_notebook()

# create a new figure
p = figure(
    tools="pan,box_zoom,reset,save",
    y_axis_type="log", y_range=[0.001, 10**22], title="log axis example",
    x_axis_label='sections', y_axis_label='particles'
)

# create plots!
p.line(x0, x0, legend="y=x")
p.circle(x0, x0, legend="y=x")
p.line(x0, y1, legend="y=x**2")
p.circle(x0, y1, fill_color=None, line_color="green", legend="y=x**2")
p.line(x0, y2, line_color="red", line_width=2, legend="y=10^x")
p.line(x0, y3, line_color="orange", line_width=2, legend="y=10^(x^2)")

show(p)
In [2]:
import numpy as np
from bokeh.plotting import figure, output_notebook, show

# output to notebook
output_notebook() 

# prepare data
mu, sigma = 0, 0.5
measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)
x = np.linspace(-2, 2, 1000)

p = figure(title="Histogram", background_fill="#E8DDCB")
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="#036564", line_color="#033649")

# customize axes
xa, ya = p.axis
xa.axis_label = 'x'
ya.axis_label = 'Pr(x)'

show(p)
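
The `quad` call above relies on how `np.histogram` lays out its results: with `bins=50` it returns 50 bar heights and 51 bin edges, so `edges[:-1]` and `edges[1:]` give the left and right side of each bar. With `density=True`, the bar areas sum to 1. A minimal sketch that checks both properties (the fixed seed and the variable names are illustrative only):

```python
import numpy as np

# Fixed seed, used here only so the check is repeatable.
rng = np.random.RandomState(0)
measured = rng.normal(0, 0.5, 1000)

hist, edges = np.histogram(measured, density=True, bins=50)

# One more edge than bars: consecutive edges bound each bar.
left, right = edges[:-1], edges[1:]
widths = right - left

# With density=True the histogram integrates to 1 over its range.
total_area = float(np.sum(hist * widths))

print(len(hist), len(edges))
print(round(total_area, 6))
```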
In [4]:
import numpy as np
import scipy.special
from bokeh.plotting import figure, output_notebook, show

# prepare data
mu, sigma = 0, 0.5
measured = np.random.normal(mu, sigma, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)
x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2

# output to notebook
output_notebook()

# prepare the histogram
p = figure(title="Normal Distribution (μ=0, σ=0.5)", tools="previewsave",
           background_fill="#E8DDCB")
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
       fill_color="#036564", line_color="#033649")

# Use `line` renderers to display the PDF and CDF
p.line(x, pdf, line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
p.line(x, cdf, line_color="white", line_width=2, alpha=0.7, legend="CDF")

# customize legend position and axes
p.legend.orientation = "top_left"
xa, ya = p.axis
xa.axis_label = 'x'
ya.axis_label = 'Pr(x)'

show(p)
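
The `pdf` and `cdf` expressions in the cell above are the closed forms for a normal distribution. As a sanity check, the sketch below re-implements them with the standard library's `math.erf` (in place of `scipy.special.erf`) and confirms numerically that integrating the PDF reproduces the CDF. The function names and the trapezoid helper are illustrative and not part of Bokeh.

```python
import math


def normal_pdf(x, mu=0.0, sigma=0.5):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=0.5):
    """Cumulative distribution, written with the error function."""
    return (1 + math.erf((x - mu) / math.sqrt(2 * sigma ** 2))) / 2


def integrate_pdf(lower, upper, steps=20000, mu=0.0, sigma=0.5):
    """Trapezoid-rule integral of the PDF over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h, mu, sigma)
    return total * h


# Starting ten standard deviations below the mean makes the left tail negligible.
approx = integrate_pdf(-5.0, 0.3)
exact = normal_cdf(0.3)
print(abs(approx - exact) < 1e-6)
```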

Cool stuff

In [5]:
from bokeh.plotting import ColumnDataSource, figure, gridplot, output_notebook, show
from bokeh.sampledata.autompg import autompg

output_notebook()

# Load some Automobile data into a data source. Interesting columns are:
# "yr" - Year manufactured
# "mpg" - miles per gallon
# "displ" - engine displacement
# "hp" - engine horsepower
# "cyl" - number of cylinders
source = ColumnDataSource(autompg.to_dict("list"))
source.add(autompg["yr"], name="yr")

# define some tools to add
TOOLS = "pan,wheel_zoom,box_zoom,box_select,lasso_select"

# Let's set up some plot options in a dict that we can re-use on multiple plots
plot_config = dict(plot_width=300, plot_height=300, tools=TOOLS)

# First let's plot the "yr" vs "mpg" using the plot config above
# Note that we are supplying our own data source to the renderer explicitly
p1 = figure(title="MPG by Year", **plot_config)
p1.circle("yr", "mpg", color="blue", source=source)

# another figure p2 with circle renderer, for "hp" vs "displ" with
# color "green". This renderer uses the same data source as the renderer
# above, which is what causes the plots' selections to be linked
p2 = figure(title="HP vs. Displacement", **plot_config)
p2.circle("hp", "displ", color="green", source=source)

# another figure p3 with circle renderer for "mpg" vs "displ",
# with the size proportional to "cyl". Set the line color to be "red"
# with no fill, and use the same data source again to link selections
p3 = figure(title="MPG vs. Displacement", **plot_config)
p3.circle("mpg", "displ", size="cyl", line_color="red", fill_color=None, source=source)

# gridplot(...) accepts nested lists of plot objects
p = gridplot([[p1, p2, p3]])

show(p)
In [1]:
from __future__ import division

import itertools

import numpy as np

from bokeh.plotting import ColumnDataSource, figure, output_notebook, show
from bokeh.models import HoverTool

# Create a set of tools to use
TOOLS="pan,wheel_zoom,box_zoom,reset,hover"

xx, yy = np.meshgrid(np.arange(0, 101, 4), np.arange(0, 101, 4))
x = xx.flatten()
y = yy.flatten()
N = len(x)
inds = [str(i) for i in np.arange(N)]
radii = np.random.random(size=N)*0.4 + 1.7
# cast to int: the %x format requires integers on recent Python 3 versions
colors = [
    "#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(50+2*x, 30+2*y)
]

# create a new data field for the hover tool to interrogate. 
foo = list(itertools.permutations("abcdef"))[:N]
bar = np.random.normal(size=N)

# We need to put these data into a ColumnDataSource
source = ColumnDataSource(
    data=dict(
        x=x,
        y=y,
        radius=radii,
        colors=colors,
        bar=bar,
        foo=foo,
    )
)

# output to notebook
output_notebook()

p = figure(title="Hoverful Scatter", tools=TOOLS)

# This is identical to the scatter plot, but adds the 'source' parameter
p.circle(x, y, radius=radii, source=source,
         fill_color=colors, fill_alpha=0.6, line_color=None)

# add a `text` renderer to display the index of each circle
# inside the circle
p.text(x, y, text=inds, alpha=0.5, text_font_size="5pt",
       text_baseline="middle", text_align="center")

# We want to add some fields for the hover tool to interrogate, but first we
# have to get ahold of the tool. We can use the 'select' method to do that.
hover = p.select(dict(type=HoverTool))

# add some new tooltip (name, value) pairs. Variables from the
# data source are available with a "@" prefix, e.g., "@x" will display the
# x value under the cursor. There are also some special known values that
# start with "$" symbol:
#   - $index     index of selected point in the data source
#   - $x, $y     "data" coordinates under cursor
#   - $sx, $sy   canvas coordinates under cursor
#   - $color     color data from data source, syntax: $color[options]:field_name
# NOTE: tooltips will show up in the order they are in the list
hover.tooltips = [
    # add to this
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("radius", "@radius"),
    ("fill color", "$color[hex, swatch]:fill_color"),
    ("foo", "@foo"),
    ("bar", "@bar"),
]

show(p)
BokehJS successfully loaded.
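
The `colors` list in the cell above packs red, green, and blue channel values into `#rrggbb` hex strings. A small sketch of that formatting in isolation (the helper name `rgb_to_hex` is made up for this illustration); the cast to `int` is there because the `%x` format rejects floats on recent Python 3 versions:

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as a '#rrggbb' string."""
    channels = [int(c) for c in (r, g, b)]
    if any(c < 0 or c > 255 for c in channels):
        raise ValueError("channel values must be in 0..255")
    return "#%02x%02x%02x" % tuple(channels)


# Corner points of the grid used above: x and y run from 0 to 100,
# so red spans 50..250 and green spans 30..230, with blue fixed at 150.
print(rgb_to_hex(50 + 2 * 0, 30 + 2 * 0, 150))      # '#321e96'
print(rgb_to_hex(50 + 2 * 100, 30 + 2 * 100, 150))  # '#fae696'
```

Because both channels stay within 0..255 over the whole grid, every entry is a valid six-digit hex color.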

rbokeh

R package interface for Bokeh

Installation

devtools::install_github("ramnathv/htmlwidgets")
devtools::install_github("bokeh/rbokeh")

Use

Plots are constructed by initializing a figure() and then adding layers on top.

Examples

In [13]:
library(rbokeh)
suppressMessages(library(dplyr))

p <- figure() %>%
  ly_points(Sepal.Length, Sepal.Width, data = iris,
    color = Species, glyph = Species,
    hover = list(Sepal.Length, Sepal.Width))

htmlwidgets:::toHTML(print(p))
xlim not specified explicitly... calculating...
ylim not specified explicitly... calculating...
Out[13]:

In [12]:
h <- figure(width = 600, height = 400) %>%
  ly_hist(eruptions, data = faithful, breaks = 40, freq = FALSE) %>%
  ly_density(eruptions, data = faithful)

htmlwidgets:::toHTML(print(h))
xlim not specified explicitly... calculating...
ylim not specified explicitly... calculating...
Out[12]:

In [14]:
library(maps)
data(world.cities)
caps <- subset(world.cities, capital == 1)
caps$population <- prettyNum(caps$pop, big.mark = ",")

map <- figure(width = 800, padding_factor = 0) %>%
  ly_map("world", col = "gray") %>%
  ly_points(long, lat, data = caps, size = 5,
    hover = c(name, country.etc, population))

htmlwidgets:::toHTML(print(map))
xlim not specified explicitly... calculating...
ylim not specified explicitly... calculating...
Out[14]:

Your turn...

Using either Bokeh or rbokeh, make a series of plots that combine the map above with a dot plot showing the populations.

Bonus: Add linked hovering between the plots!

R Tip: Construct grids of plots using grid_plot