[Feature Request] visual diagnostic tools #1490
@horizon-blue & @feynmanliang I'm going to use this issue as the spot where we begin integrating the diagnostics tools. It collects all the closed issues and PRs for historical context, and I think we should use it for integrating the tools and for any discussions that may occur.
@feynmanliang here is a bit more context. As a broad overview, the goal is to use composition to create an accessor namespace, outlined here for xarray, which is also how pandas creates accessors. From there, we create Bokeh apps that use the accessor.

As an example, below is a gif showing 1D marginals for random variables in a Bean Machine model. Here is a gif of the tool in Jupyter; note how we are using the accessor in the notebook. And below is the tool in Docusaurus, where the page was created in a slimmed-down version of the Bean Machine site. The component is reading data saved from the Jupyter tool above.

Why would we repeat the tool in React/JavaScript if we already have it in Python? Simply for tutorial writing, and to highlight the fact that Bean Machine has a great user experience when it comes to the standard visual model diagnostics used for Bayesian modeling. The help tab is also a great place to highlight the research done to generate the plots, and it serves as a spot for modelers to remind themselves of what the difference between rank plots and trace plots is. It can also serve as a spot for derivations of the output, and this feature is mirrored in both Jupyter and Docusaurus.
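The accessor-namespace pattern described above can be sketched in plain Python. This is a minimal, hypothetical illustration of the descriptor trick pandas and xarray use for their `register_*_accessor` helpers; the class and attribute names here are invented for the example and are not Bean Machine's actual API.

```python
class CachedAccessor:
    """Descriptor that lazily attaches a namespace object to an instance.

    Mirrors the pattern behind pandas/xarray accessor registration.
    """

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, cls):
        if obj is None:
            # Accessed on the class itself, e.g. for introspection.
            return self._accessor_cls
        accessor = self._accessor_cls(obj)
        # Cache on the instance so repeated access reuses the same namespace.
        object.__setattr__(obj, self._name, accessor)
        return accessor


class DiagnosticsAccessor:
    """Hypothetical namespace exposing diagnostics on a samples object."""

    def __init__(self, samples):
        self._samples = samples

    def summary(self):
        # Uses only the public attribute; never touches internal API.
        return {"num_values": len(self._samples.values)}


class MonteCarloSamples:
    """Stand-in for the Bean Machine samples object (illustrative only)."""

    diagnostics = CachedAccessor("diagnostics", DiagnosticsAccessor)

    def __init__(self, values):
        self.values = values


samples = MonteCarloSamples([0.1, 0.2, 0.3])
summary = samples.diagnostics.summary()
```

The key property, echoed later in this thread, is that the accessor composes with the samples object without polluting its namespace beyond the single attribute, and without touching its internals.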
Maybe a dumb question, but how does this compare to Jupyter widgets (which can also be embedded into other contexts) and to Bokeh's notebook compatibility and embedding functionality? Also, would other projects that use Bokeh + Jupyter + Docusaurus find this useful, and if so, what would be the trade-offs/gains of developing this in a package external to beanmachine?
I have never tried embedding Jupyter widgets like that before, and I appreciate you pointing it out. We are definitely using widgets for the tools, and they are from the Bokeh library: things like the drop-down menu, the tabs, sliders, and divs showing bandwidth values. Everything else is a different Bokeh object, like the plots, the titles for plots, the layout, etc. Docusaurus also requires no SSR (server-side rendering) since it is a static site generator, which means that creating the Bokeh components required the Docusaurus
The major hurdle against a straightforward Bokeh embed for these tools was that there are a lot of combinations a user could choose when selecting bandwidths, high-density intervals, etc., and we would have had to precompute all those combinations. If we had a server, then we would not have to do what we did in JavaScript. Finally, yes, this would be useful in other projects, as the tools were written to ensure that we never polluted the namespace of the model object or touched its internal API. Below are the trade-offs I've thought about, and I certainly welcome others to add their own. Obviously there is considerable overlap between the two headings below.

Internal package
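The precomputation cost described above can be made concrete with a small sketch. This is a hypothetical, naive example (not the project's actual code): a static embed must bake in one curve per combination of widget settings, so the payload grows multiplicatively with each widget's option count, whereas a live server would compute each combination on demand.

```python
import json
import math


def gaussian_kde(data, grid, bandwidth):
    """Naive Gaussian kernel density estimate on a fixed grid (illustrative)."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in grid
    ]


def precompute_embed_payload(data, grid, bandwidths, hdi_probs):
    """Precompute every (bandwidth, HDI) combination for a static embed.

    One entry per combination: with 3 bandwidths and 2 HDI probabilities,
    the payload already holds 6 curves, and real sliders have far more stops.
    """
    payload = {}
    for bw in bandwidths:
        for hdi in hdi_probs:
            payload[f"bw={bw}|hdi={hdi}"] = gaussian_kde(data, grid, bw)
    return json.dumps(payload)


payload = precompute_embed_payload(
    data=[0.0, 0.5, 1.0],
    grid=[0.0, 0.5, 1.0],
    bandwidths=[0.1, 0.3, 0.5],
    hdi_probs=[0.89, 0.94],
)
```

With a real slider offering, say, 100 bandwidth stops, the static payload holds 200 curves per random variable, which is the combinatorial blow-up that motivated reimplementing the computation in JavaScript.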
External package
This is an excellent design question that deserves further conversation, but my personal viewpoint would be to have it in Bean Machine and spin it out when it no longer makes sense to maintain it here. When I was making these tools, I was thinking about when I was first learning about Bayesian modeling, and how visual diagnostics helped guide model updates and my own understanding of building and updating models. My opinion would be to leverage these tools so we can engage the broader community (since these are also great teaching visuals), and then upstream them to ArviZ, Bokeh, Jupyter, Docusaurus, or their own repos after we have had the chance to showcase them in Bean Machine, so people get exposed to Bean Machine and the visual tools together.
Below is a high-level outline of how the Bokeh apps are created for a Jupyter session. This is purely for reference.

```python
from typing import Dict

import arviz as az
from bokeh.layouts import column
from bokeh.models import Annotation, CustomJS, Div, Panel, Select, Slider, Tabs
from bokeh.plotting import Figure, figure, show

# `utils` is a local helper module (palette selection, figure styling, etc.).


class GeneralDiagnosticToolOutline:
    def __init__(self, idata: az.InferenceData) -> None:
        """We convert the Bean Machine MonteCarloSamples object to an ArviZ
        InferenceData object when we add accessor capabilities to MCS objects.
        This is why the __init__ takes an InferenceData object and not a
        MonteCarloSamples object.

        We also compute some standard values from the model, like the number
        of chains and draws. We also set the HDI probability here for some
        tools, which is defined by the rcParams in ArviZ.

        The widgets used in all the tools require string representations of
        the RVIdentifiers, which is why there is typically a list of rv_names
        and rv_identifiers in each tool. We use these to determine how to
        select data from the idata object, since the idata object uses
        RVIdentifiers as keys, not their string representations.
        """
        self.idata = idata
        self.rv_identifiers = list(self.idata["posterior"].data_vars)
        self.rv_names = [str(rv_identifier) for rv_identifier in self.rv_identifiers]
        self.num_chains = self.idata["posterior"].dims["chain"]
        self.num_draws_single_chain = self.idata["posterior"].dims["draw"]
        self.num_draws_all_chains = self.num_chains * self.num_draws_single_chain
        self.hdi_prob = az.rcParams["stats.hdi_prob"]
        ...

    def compute(self, rv_name: str, **kwargs) -> Dict:
        """Compute data for the tool using ArviZ. The compute method returns
        some form of a dictionary that is used by other methods. Different
        tools will have different dictionary objects, but all will return a
        JSON-serializable object.

        The tool the compute method belongs to determines which other keyword
        arguments are passed to the method. All methods take a single random
        variable string representation, except for the 2D marginal tool, which
        requires two RV names.
        """
        rv_identifier = self.rv_identifiers[self.rv_names.index(rv_name)]
        data = self.idata["posterior"][rv_identifier].values
        ...

    def create_sources(self, rv_name: str, **kwargs) -> Dict:
        """Create Bokeh ColumnDataSource objects. Here we again ask for the
        Bean Machine RVIdentifier string representation so that we can compute
        the data associated with the RV in the tool. No caching of data is
        done: every time a user changes the RV shown in the tool, the data is
        recomputed. This method creates the objects that Bokeh consumes for
        displaying data in the tool. The output is again a dictionary, where
        the keys are sometimes chain labels and sometimes different glyphs to
        be drawn in the figures.
        """
        data = self.compute(rv_name=rv_name, **kwargs)
        ...

    def create_figures(self, rv_name: str) -> Dict | Figure:
        """Create Bokeh figures. Here we create empty figures that have no
        data in them. All figures are styled in the same way, and the method
        returns either a dictionary of figure objects or a single figure,
        depending on the tool being displayed.
        """
        fig = figure(...)
        utils.style_figure(fig)
        ...

    def create_glyphs(self) -> Dict:
        """Create Bokeh Glyph objects that will be bound to figures when data
        is bound to the glyph.

        It may seem odd to create empty glyphs with no data attached to them,
        and this may change with newer versions of Bokeh, but it was done for
        debugging purposes when creating the tools, as there is a well-defined
        differentiation between what a glyph is and what the other components
        of a figure are (see the create_annotations method). If this layer of
        abstraction is found to be too complex, then we can change it so that
        glyphs are drawn directly on figures with data attached to them.

        All tools use the same palette for colors, which is the colorblind
        palette found in Bokeh. This was chosen for accessibility reasons. We
        can update it so that a user can choose a palette if they wish, but I
        think defaulting to the most accessible color palette available is a
        good first option.
        """
        palette = utils.choose_palette(self.num_chains)
        ...

    def create_annotations(
        self, figs: Dict[str, Figure], rv_name: str
    ) -> Dict | Annotation:
        """Create Bokeh Annotation objects.

        Annotations are things like spans, shaded regions, legends, or text on
        a figure. Most create_annotations methods in the various tools return
        a dictionary where the key is the figure name and the value is the
        annotation drawn on the figure. Glyphs and annotations need to be
        redrawn as a user interacts with the tool, so these are updated with
        different callbacks defined below.

        Again, depending on the tool, the parameters will either be a
        dictionary of figures or a single figure.
        """
        ...

    def add_tools(self, figs: Dict[str, Figure]) -> None:
        """Add Bokeh HoverTool objects to the figures.

        This method directly manipulates a figure by adding hover tools to it.
        In order to add hover tools, data has to be bound to a glyph or
        annotation, and the glyph or annotation must be bound to a figure. The
        values displayed by the hover tools are defined in the Bokeh
        ColumnDataSource objects defined above.
        """
        ...

    def update_figure(
        self, rv_name: str, old_sources: Dict, figs: Dict, **kwargs
    ) -> None:
        """Update the figures in the tool.

        This is the main worker for the callbacks. Here we update the Bokeh
        ColumnDataSource objects with new data, which is computed using the
        compute method. Each tool updates itself differently, but the general
        rule of thumb is that each widget and figure is updated on user
        interaction.
        """
        ...

    def help_page(self) -> Div:
        """Here is where we create the prose for the help tab found in all the
        tools. Most of the prose has been culled from tutorials that have
        repeated descriptions about diagnostics.
        """
        text = """"""
        div = Div(text=text, disable_math=False)
        return div

    def modify_doc(self, doc) -> None:
        """This is the essential piece of the Bokeh app: it is where we create
        Bokeh sources, figures, glyphs, and annotations, attach tools to the
        figures, and create the widgets and their callbacks.

        Most steps below occur for each of the tools. Differences arise when
        we have to handle callbacks differently, or we need to recall a
        previous bandwidth value.
        """
        # Set the initial view.
        rv_name = self.rv_names[0]
        # Create data source(s) for the figure(s).
        sources = self.create_sources(rv_name)
        # Create the figure(s).
        figs = self.create_figures(rv_name)
        # Create glyphs and add them to the figure(s).
        glyphs = self.create_glyphs()
        # Create annotations and add them to their appropriate figure(s).
        annotations = self.create_annotations(figs=figs, rv_name=rv_name)
        # Create tooltips for the figure(s).
        self.add_tools(figs=figs)
        # Create the widget(s).
        rv_select = Select(title="Query", value=rv_name, options=self.rv_names)
        bw_slider = Slider(...)

        # Callbacks for the widget(s).
        def update_rv(attr, old, new):
            """All tools have this widget, as each tool displays visual
            diagnostics for the chosen random variable.
            """
            self.update_figure(rv_name=new, old_sources=sources, figs=figs)
            # Update other widgets if they exist in the tool.

        def update_bw(attr, old, new):
            # Callback for a different widget, as an example of multiple
            # widgets in the tool.
            ...

        # NOTE: Here is where we listen for changes to the widgets. When a
        #       user changes the value of a widget, the change triggers the
        #       callback, and in some cases triggers the update_figure()
        #       method defined above. It depends on the widget and the tool.
        rv_select.on_change("value", update_rv)
        bw_slider.on_change("value", update_bw)

        # NOTE: We are using Bokeh's CustomJS model in order to reset the
        #       ranges of the figures. This is an odd quirk of a Bokeh app: if
        #       a user changes the viewport of a figure and then changes the
        #       RV being displayed, Bokeh sometimes will not reset the figure
        #       ranges to accommodate the new data. This work-around fixes the
        #       issue, and is present in all the tools that need their figures
        #       reset due to user interactions.
        rv_select.js_on_change(
            "value",
            CustomJS(args={"fig": figs}, code="fig.reset.emit();"),
        )

        # Add widgets and figures to tabs, and then add them to a tool layout.
        # Here we define the data to be shown in the tool's tabs, and the
        # layout of the tool itself. Once the layout has been created, we add
        # it to the doc that the Bokeh app is in.
        tab = Panel(child=figs, title="tool tab")
        layout = column(...)
        tool_panel = Panel(child=layout, title="tool")
        help_panel = Panel(child=self.help_page(), title="Help")
        tabs = Tabs(tabs=[tool_panel, help_panel])
        doc.add_root(tabs)

    def show_widget(self) -> None:
        """Display the widget. This is just syntactic sugar; a more
        appropriate name could be display_tool. Its main use was for
        debugging, and to have all Bokeh apps share the same API output.
        """
        show(self.modify_doc)
```
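The outline above states that every compute method returns a JSON-serializable dictionary, which is what lets the same payload feed both the Bokeh app and the Docusaurus component. A minimal sketch of that contract, using only the standard library (the function name and the summary statistics are hypothetical, chosen just to show the round-trip):

```python
import json
import statistics


def compute_trace_summary(draws_by_chain):
    """Hypothetical compute()-style output: per-chain stats as plain dicts."""
    out = {}
    for chain, draws in draws_by_chain.items():
        out[chain] = {
            "mean": statistics.fmean(draws),
            "stdev": statistics.stdev(draws),
        }
    # Enforce the contract from the outline: the result must round-trip
    # through JSON so it can also be saved for the static Docusaurus tool.
    return json.loads(json.dumps(out))


summary = compute_trace_summary(
    {"chain_0": [0.0, 1.0, 2.0], "chain_1": [1.0, 2.0, 3.0]}
)
```

Keeping the compute layer free of Bokeh objects is what makes the same data usable in Jupyter and in the React/JavaScript reimplementation.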
This commit adds the feature to run diagnostic tools using ArviZ and Bokeh in a Jupyter environment.

- Added a `tools` sub-package in the `diagnostics` package. This new sub-package adds the following files, each for a specific tool that runs ArviZ model diagnostics:
  - autocorrelation
  - effective_sample_size
  - marginal1d
  - marginal2d
  - trace
- Each tool listed above also has two corresponding files, one for types and the other for methods used in the tool.

Resolves #1490
Issue Description
Integrate visual diagnostics tools into Bean Machine.
Previous issues & PRs
readme.md
#1337 README.md for tutorials (closed)