Integrate VegaFusion into JupyterChart #3281

jonmmease · 2023-12-12T23:36:24Z

Overview

This PR updates JupyterChart to improve the integration with VegaFusion so that interactive data transformations can be performed in the Python kernel rather than the browser. This brings the capabilities of the dedicated VegaFusion Widget Renderer to JupyterChart.

Benefits

Let's start with an example of interactive crossfiltering on a 2 million row flights dataset.

import altair as alt
import pandas as pd
from vega_datasets import data

alt.data_transformers.enable("vegafusion")

# Load data
source = pd.concat([data.flights_2k()] * 1000, axis=0)
# source = pd.concat([data.flights_2k()] * 10, axis=0)
print(f"{len(source)} rows")

# Build crossfiltered chart
brush = alt.selection_interval(encodings=['x'])

# Define the base chart, with the common parts of the
# background and highlights
base = alt.Chart(width=160, height=130).mark_bar().encode(
    x=alt.X(alt.repeat('column')).bin(maxbins=20),
    y='count()'
)

# gray background with selection
background = base.encode(
    color=alt.value('#ddd')
).add_params(brush)

# blue highlights on the transformed data
highlight = base.transform_filter(brush)

# layer the two charts & repeat
chart = alt.layer(
    background,
    highlight,
    data=source
).transform_calculate(
    "time",
    "hours(datum.date)"
).repeat(column=["distance", "delay", "time"])

alt.jupyter.JupyterChart(chart)

[Video attachment: 2m.mov]

With this PR, the full dataset is never sent to the browser. Each time a selection is changed, a signal is sent from the widget to the Python kernel and the filtering and aggregation are performed in Python by VegaFusion, and the result is pushed back to the browser.
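The sketch below models what happens on each brush change and shows why so little data travels to the browser. It uses only the standard library and is not VegaFusion's implementation. The function name `crossfilter_counts`, the row format and the fixed-width binning are all assumptions made for illustration. The kernel filters the full table by the brushed range and returns only per-bin counts. The payload sent to the browser is therefore at most `maxbins` numbers, however many rows the source has.

```python
from typing import Dict, List, Optional, Tuple


def crossfilter_counts(
    rows: List[Dict[str, float]],
    brush_field: str,
    brush_range: Optional[Tuple[float, float]],
    bin_field: str,
    bin_start: float,
    bin_step: float,
    maxbins: int,
) -> List[int]:
    """Hypothetical kernel-side step: filter by the brush, then bin and count.

    Only the returned list of counts (length ``maxbins``) would need to be
    sent to the browser, not the rows themselves.
    """
    counts = [0] * maxbins
    for row in rows:
        # An empty brush (None) means "no selection": keep every row.
        if brush_range is not None:
            lo, hi = brush_range
            if not lo <= row[brush_field] <= hi:
                continue
        # Fixed-width binning; values outside the binned range are dropped.
        idx = int((row[bin_field] - bin_start) // bin_step)
        if 0 <= idx < maxbins:
            counts[idx] += 1
    return counts
```

Whether the table has 2,000 or 2,000,000 rows, the browser only ever receives the short `counts` list for each view.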

How it's enabled

As before, VegaFusion is enabled in Altair globally using alt.data_transformers.enable("vegafusion"). When enabled, JupyterChart will automatically take advantage of VegaFusion. When not enabled, JupyterChart will continue to function as before. VegaFusion is still an optional dependency.

How it works

This PR takes advantage of the ChartState construct added to VegaFusion 1.5.0 (See vega/vegafusion#426). The ChartState performs the initial spec transformation and provides a watch plan which specifies the signals and datasets that must be sent from the Vega renderer back to the ChartState in order to preserve chart interactivity.
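One way to picture the watch plan is as a set of (namespace, scope, name) entries. Only renderer-side changes that match an entry need to cross the widget boundary to the kernel. The sketch below is a hypothetical model, not VegaFusion's actual data structure. The entry keys mirror the `namespace`/`scope`/`name` fields of the update messages shown later in this thread, and the signal names in the usage are made up.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (namespace, scope, name): namespace is "signal" or "data"; scope is assumed
# to identify nested group marks, with an empty scope meaning top level.
WatchKey = Tuple[str, Tuple[int, ...], str]


def build_watch_set(watch_plan: Iterable[Dict]) -> Set[WatchKey]:
    """Turn a list of watch entries into a hashable lookup set."""
    return {(w["namespace"], tuple(w["scope"]), w["name"]) for w in watch_plan}


def updates_to_forward(
    renderer_updates: Iterable[Dict], watch_set: Set[WatchKey]
) -> List[Dict]:
    """Keep only renderer updates that the kernel has asked to watch."""
    return [
        u
        for u in renderer_updates
        if (u["namespace"], tuple(u["scope"]), u["name"]) in watch_set
    ]
```

With this model, a plan that watches only a hypothetical `brush_x` signal means a change to an unwatched signal such as `width` stays in the browser.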

The ChartState is also responsible for holding references to inline datasets. So unlike VegaFusion's VegaFusionWidget approach, source dataframes do not need to be written to disk 🎉

Note on local_tz

One subtlety is that the ChartState requires the browser's local timezone in order to perform its data transformations. Because of this, I added a local_tz traitlet that is set by the widget to the browser's local timezone. The Python side adds a callback on this traitlet and builds the ChartState once it is available in Python.
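The deferred construction described here can be modelled with a plain observer. In the real widget `local_tz` is a traitlet observed through a callback. The stand-in below uses a Python property instead so that it runs with the standard library only. `build_chart_state` is a hypothetical placeholder for whatever creates the ChartState. The model builds the state once and does not rebuild on later changes, which is an assumption and not something stated in this PR.

```python
from typing import Callable, Optional


class DeferredChartState:
    """Hypothetical model: build chart state only once local_tz is known."""

    def __init__(self, build_chart_state: Callable[[str], object]) -> None:
        self._build = build_chart_state
        self._local_tz: Optional[str] = None
        self.chart_state: Optional[object] = None

    @property
    def local_tz(self) -> Optional[str]:
        return self._local_tz

    @local_tz.setter
    def local_tz(self, tz: str) -> None:
        # Plays the role of a traitlet observer: runs when the browser sets the value.
        self._local_tz = tz
        if self.chart_state is None:
            # First time the timezone arrives, so the state can now be built.
            self.chart_state = self._build(tz)
```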

Selection / Param access

When VegaFusion is enabled, it's still possible to access the Chart's selections and value parameters.

Future of VegaFusionWidget

If these updates are accepted into JupyterChart, I plan on deprecating VegaFusionWidget (and the entire vegafusion-jupyter Python package).

I'll update the Altair docs in a follow-on PR to remove mention of VegaFusionWidget and explain the functionality of JupyterChart when VegaFusion is enabled.

jonmmease · 2023-12-12T23:41:40Z

@binste, when I run hatch run mypy altair tests locally I'm not seeing any mypy issues. Do you have any ideas on what I might need to do to reproduce the failures in lint / ruff-mypy locally?

jonmmease · 2023-12-13T15:20:04Z

cc @domoritz, as this is something we've talked about at various times over the past couple of years

domoritz · 2023-12-13T15:23:55Z

Very cool. Super exciting to have out-of-the-box scalability with Altair through this feature.

mattijn · 2023-12-13T17:53:50Z

As always, thanks @jonmmease! To have all of this integrated smoothly within altair is really something! Foundational work!
Is it possible to enter a kind of debug mode? Logging or printing what is being queried within vegafusion upon interacting with a visualisation?

I tried the example Interactive Chart with Aggregation but it is not working with vegafusion enabled:

import altair as alt
from vega_datasets import data
alt.data_transformers.enable("vegafusion")

source = data.movies.url

slider = alt.binding_range(min=0, max=10, step=0.1, name="threshold")
threshold = alt.param(value=5, bind=slider)

chart = alt.layer(
    alt.Chart(source).mark_circle().encode(
        x=alt.X("IMDB_Rating:Q").title("IMDB Rating"),
        y=alt.Y("Rotten_Tomatoes_Rating:Q").title("Rotten Tomatoes Rating")
    ).transform_filter(
        alt.datum["IMDB_Rating"] >= threshold
    ),

    alt.Chart(source).mark_circle().encode(
        x=alt.X("IMDB_Rating:Q").bin(maxbins=10),
        y=alt.Y("Rotten_Tomatoes_Rating:Q").bin(maxbins=10),
        size=alt.Size("count():Q").scale(domain=[0,160])
    ).transform_filter(
        alt.datum["IMDB_Rating"] < threshold
    ),

    alt.Chart().mark_rule(color="gray").encode(
        strokeWidth=alt.StrokeWidth(value=6),
        x=alt.X(datum=alt.expr(threshold.name), type="quantitative")
    )
).add_params(threshold)

alt.JupyterChart(chart)

It initiates correctly but upon using the slider it errors with:

TraitError: The 'threshold' trait of a Params instance expected an int, not the float 5.3.

The Interval Selection on a Map is also not working with vegafusion enabled, but I don't think that is actually related to this PR.

jonmmease · 2023-12-13T18:49:31Z

Thanks for the kind words and for trying out the PR @mattijn. I'll take a look!

jonmmease · 2023-12-13T19:41:17Z

I fixed the param value error in 8689751 and opened vega/vegafusion#434 to track the error you hit in the interval selection on map example.
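The commit message for 8689751 summarizes the fix: "Use float if initial param value is int". A slider with `step=0.1` can emit 5.3 even when the initial value is the int 5, so an int-typed trait rejects later values. The sketch below shows that type-selection rule. The function name `trait_type_for` is hypothetical, the special case for `bool` is an assumption of this sketch, and none of it is the code from the commit.

```python
from typing import Any


def trait_type_for(initial_value: Any) -> type:
    """Pick the Python type a param's trait should accept.

    An int initial value is widened to float, because bound widgets such as
    range sliders may later send non-integer values (e.g. 5.3 for step=0.1).
    """
    # bool is a subclass of int in Python, so check it first; this sketch
    # assumes checkbox-style params should stay boolean.
    if isinstance(initial_value, bool):
        return bool
    if isinstance(initial_value, (int, float)):
        return float
    return type(initial_value)
```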

Is it possible to enter a kind of debug mode? Logging or printing what is being queried within vegafusion upon interacting with a visualisation?

This is a good idea. I'll add a verbose flag that logs out the variable values sent between the Python kernel and the widget.

(these will end up in the JupyterLab console)

jonmmease · 2023-12-13T23:04:20Z

In 893bc59 I added a debug flag to JupyterChart. When it's True, the VegaFusion messages are printed, and in JupyterLab they end up in the log pane.

In the future we could log more things as well. See if that makes sense @mattijn
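The `debug` flag gates the printing of messages exchanged between the kernel and the widget. The sketch below shows that gating pattern using only the standard library. `MessageChannel` and `send` are hypothetical names, not JupyterChart internals.

```python
import json
from typing import Any, Dict, List


class MessageChannel:
    """Hypothetical kernel-side channel that optionally echoes messages."""

    def __init__(self, debug: bool = False) -> None:
        self.debug = debug
        self.sent: List[Dict[str, Any]] = []

    def send(self, message: Dict[str, Any]) -> None:
        if self.debug:
            # In JupyterLab, output printed from widget callbacks may show up
            # in the log console rather than in the cell output.
            print("VegaFusion message:", json.dumps(message))
        self.sent.append(message)
```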

joelostblom · 2023-12-13T23:20:58Z

Wow, so exciting to have this functionality directly in JupyterChart! I'm in favor of this direction as I think this would make it more convenient to work with large data, and also simplify by having one less renderer and python package.

If I remember correctly, in the past we briefly talked about having a global option for enabling JupyterChart as the default for all Altair charts, similar to how we have the data_transformers right now. Would this still be viable after this PR, or would it become difficult now that there is an interaction with the global vegafusion option? In either case, I think the functionality in this PR is more helpful than the potential global option.

jonmmease · 2023-12-13T23:34:18Z

If I remember correctly, in the past we briefly talked about having a global option for enabling JupyterChart as the default for all Altair charts

When we talked about this previously, I pictured adding a new renderer that would display charts using JupyterChart. Something like

alt.renderers.enable("jupyter")

This would be orthogonal to the VegaFusion data transformer used in this PR, so hypothetically you could do this to enable both:

alt.renderers.enable("jupyter")
alt.data_transformers.enable("vegafusion")

Does that make sense @joelostblom?

joelostblom · 2023-12-14T17:45:41Z

Yup that makes sense; great that this works smoothly, I will check out the other PR you put up.

mattijn · 2023-12-16T13:23:41Z

I'm not sure what I'm missing, but if I checkout the latest changes and double check that I'm on the right branch including latest commits:

!git log --oneline -5

ee2aed45 (HEAD -> jonmmease/vegafusion-widget, origin/jonmmease/vegafusion-widget) mypy fixes
893bc597 Add debug property and use this to enable printing VegaFusion messages (these will end up in the JupyterLab console)
8689751a Use float if initial param value is int
11a90569 Fix JupyterChart tests
ae373a9e bump vegafusion in pyproject.toml

I no longer get a TraitError with the spec from #3281 (comment), but the chart is not updating correctly and I don't see the debug logging in the log console.

[Video attachment: Screen.Recording.2023-12-16.at.14.19.40.mov]


jonmmease · 2023-12-16T13:34:54Z

I'm not sure what I'm missing

You're not missing anything; I see what you mean now. I'll dig into why this isn't updating.

mattijn · 2023-12-16T13:38:44Z

By the way, does this mean that this PR makes it possible to set selections and update/stream data? I've the feeling this PR unlocks more possibilities than just integrating VegaFusion into the Altair JupyterChart. Or is this just wishful thinking?

jonmmease · 2023-12-16T14:16:53Z

By the way, does this mean that this PR makes it possible to set selections and update/stream data? I've the feeling this PR unlocks more possibilities than just integrating VegaFusion into the Altair JupyterChart. Or is this just wishful thinking?

This PR makes it possible to update and listen to arbitrary signals and datasets in the Vega spec. So there are a lot more things that could be done with this. One example would be to update (though not stream) datasets in place. Streaming data (as in appending to the data that's already displayed) would require some additional work.

One caveat is that it wouldn't (currently) work to combine the VegaFusion integration with other arbitrary updates to the widget's signals and datasets.

jonmmease · 2023-12-16T15:57:00Z

I updated the Large Datasets documentation to describe using JupyterChart and remove mention of VegaFusionWidget.


jonmmease · 2023-12-19T14:36:41Z

@mattijn, this is ready for another look. I just released VegaFusion 1.5.1 which fixes the two chart errors you ran into.

mattijn · 2023-12-22T11:55:20Z

After updating vegafusion, I was getting this error:

AttributeError: 'builtins.PyVegaFusionRuntime' object has no attribute 'new_chart_state'

Checking versions of vegafusion:

(stable) D:\Software\altair-viz\altair>conda list vegafusion
# packages in environment at D:\Software\Miniconda3\envs\stable:
#
# Name                    Version                   Build  Channel
vegafusion                1.5.1              pyhd8ed1ab_0    conda-forge
vegafusion-python-embed   1.4.3           py310he2c049f_0    conda-forge

After updating vegafusion-python-embed as well, all works. Maybe we should introduce a check for this that these packages are in sync?

Thanks again @jonmmease!

jonmmease · 2023-12-22T12:04:00Z

Maybe we should introduce a check for this that these packages are in sync?

Yeah, that's a good idea. I opened #3296 to track this.

I think I'll merge this on Monday if there isn't any more feedback. Thanks all!
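The check tracked in #3296 is meant to catch the mismatch above, where vegafusion 1.5.1 was installed alongside vegafusion-python-embed 1.4.3. The sketch below implements such a check with the standard library's `importlib.metadata`. The function name, the exact-equality rule and the error message are assumptions. This is not necessarily how #3296 was resolved.

```python
from importlib.metadata import version as installed_version
from typing import Callable


def check_vegafusion_versions(
    get_version: Callable[[str], str] = installed_version,
) -> None:
    """Raise if the two VegaFusion distributions report different versions.

    ``get_version`` defaults to importlib.metadata.version; it is injectable
    so the check can be exercised without the packages installed.
    """
    vf = get_version("vegafusion")
    embed = get_version("vegafusion-python-embed")
    if vf != embed:
        raise ImportError(
            f"vegafusion ({vf}) and vegafusion-python-embed ({embed}) "
            "versions differ; install matching versions of both packages."
        )
```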

binste

The culmination of a lot of great work!! Thank you @jonmmease. This, together with the "jupyter" renderer makes it super easy for users to work with large datasets and explore them interactively. 🥳

Sorry that it took me a while to have a look; it wasn't because I'm not excited about this PR ;) I should have some more time over the holidays, so just ping me when I can help out somewhere.

mattijn · 2024-01-22T21:33:00Z

Quick question @jonmmease. You made this statement before:

This PR makes it possible to update and listen to arbitrary signals and datasets in the Vega spec. So there are a lot more things that could be done with this. One example would be to update (though not stream) datasets in place.

I'd like to update a dataset in place. How does this work in practice? For example in this spec:

import altair as alt
import pandas as pd

list_start = ['C4', 'D4', 'E4', 'E4']
list_update = ['C4', 'D4', 'E4', 'E4', 'D#4', 'C4', 'G4', 'D#5', 'C5', 'G5', 'C4']

df_start = pd.DataFrame({'notes': list_start})
df_update = pd.DataFrame({'notes': list_update})

# Create a chart using Altair
bar_chart = alt.Chart(df_start).mark_bar().encode(
    x='notes:N',
    y='count():O'
)
jchart = alt.JupyterChart(bar_chart)
jchart

How can I update/replace the dataset here in place using the updated list list_update or dataframe df_update?

jonmmease · 2024-01-22T22:09:20Z

Hi Mattijn, updating a dataset doesn't have a nice API yet, but using the primitives added in this PR you can do something like this:

jchart._py_to_js_updates = [{
    "namespace": "data",
    "scope": [],
    "name": "data-a0e7a86c692327a18bbeb2464725124c",
    "value": df_update.to_dict("records")
}]

Here "data-a0e7a86c692327a18bbeb2464725124c" is the auto-generated dataset name that I found by viewing the spec in the Vega editor.

A "namespace" of "data" targets a dataset; the alternative, "signal", can be used to update signals.

A "scope" of [] means the dataset is defined at the top level of the compiled Vega spec (not nested inside a group mark).
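You can also look up the auto-generated name in the compiled Vega spec instead of copying it from the Vega editor. The sketch below assumes the spec is available as a Python dict, for example JSON exported from the Vega editor. It also assumes the target dataset sits in the top-level `data` array, which corresponds to a scope of []. Datasets nested inside group marks would need a different scope and are not handled here. The `data-` prefix matches the name above, and `find_top_level_datasets` is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_top_level_datasets(
    vega_spec: Dict[str, Any], prefix: str = "data-"
) -> List[str]:
    """Return names of top-level Vega datasets whose name starts with prefix."""
    return [
        d["name"]
        for d in vega_spec.get("data", [])
        if d.get("name", "").startswith(prefix)
    ]
```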

We could probably clean this up pretty well and add a jchart.update_data() method, at least for non-compound charts. We'd need to decide how to handle compound charts as well.
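For the top-level case, a helper could build the same message list as the `_py_to_js_updates` snippet above. The sketch below shows one possible shape. `make_data_update` and `make_signal_update` are hypothetical names, since there is no `update_data()` method yet, and treating scope entries as ints is an assumption. The records are plain dicts rather than a DataFrame so the example runs without pandas; with pandas, `df_update.to_dict("records")` produces this shape.

```python
from typing import Any, Dict, List, Sequence


def make_data_update(
    name: str, records: List[Dict[str, Any]], scope: Sequence[int] = ()
) -> Dict[str, Any]:
    """Build one "data" update entry in the _py_to_js_updates format."""
    return {"namespace": "data", "scope": list(scope), "name": name, "value": records}


def make_signal_update(
    name: str, value: Any, scope: Sequence[int] = ()
) -> Dict[str, Any]:
    """Build one "signal" update entry in the same format."""
    return {"namespace": "signal", "scope": list(scope), "name": name, "value": value}


# Hypothetical usage with the chart from the question above:
# jchart._py_to_js_updates = [
#     make_data_update("data-a0e7a86c692327a18bbeb2464725124c",
#                      df_update.to_dict("records"))
# ]
```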

mattijn · 2024-01-22T22:25:21Z

Wow! This will become great!
I connected a piano to a jupyter notebook and now I have the chart updating while playing some notes:

[Video attachment: WIOD1ayTs3.mp4]

For fun: my used code: https://gist.github.com/mattijn/5d44cd9e261b90c4c2b92bb0d19bc171

jonmmease added 8 commits December 9, 2023 06:41

Enhance JupyterChart to handle timezone and None charts

6964727

Update JupyterChart to support VegaFusion ChartState

6230680

Bump minimum vegafusion version to 1.5.0 for ChartState support

cda4eb7

Add max_wait option to JupyterChart

9147cbb

Improve type hints

00f3701

Help mypy

583b6aa

bump vegafusion in pyproject.toml

ae373a9

Fix JupyterChart tests

11a9056

Use float if initial param value is int

8689751

Add debug property and use this to enable printing VegaFusion messages (these will end up in the JupyterLab console)

893bc59

mypy fixes

ee2aed4

jonmmease mentioned this pull request Dec 14, 2023

Add "jupyter" renderer based on JupyterChart #3283

Merged

mattijn reviewed Dec 16, 2023

altair/jupyter/js/index.js

jonmmease mentioned this pull request Dec 16, 2023

Support threshold aggregation example vega/vegafusion#436

Merged

Update Large Dataset documentation with JupyterChart usage

41baec0

jonmmease added 2 commits December 16, 2023 13:49

Use built-in structuredClone to avoid deepClone dependency

5a4bdd1

Rename imports to match vl-convert's bundling convention (for future offline support)

059c6a5

jonmmease mentioned this pull request Dec 17, 2023

not working with VegaFusionWidget: Error: [object ArrayBuffer] is not serializable bokeh/ipywidgets_bokeh#46

Open

jonmmease mentioned this pull request Dec 22, 2023

Verify versions of both VegaFusion packages #3296

Closed

binste approved these changes Dec 23, 2023


jonmmease merged commit ebf9da5 into main Dec 26, 2023
20 checks passed

mattijn mentioned this pull request Jan 24, 2024

public facing method to update a dataset in place #3318

Open

joelostblom mentioned this pull request Oct 4, 2024

Discussion: Suggestions on best practices to include in PR chapter rostools/git4cats#48

Open
