Integrate VegaFusion into JupyterChart #3281
@binste, when I run |
cc @domoritz, as this is something we've talked about at various times over the past couple of years |
Very cool. Super exciting to have out-of-the-box scalability with Altair through this feature. |
As always, thanks @jonmmease! To have all of this integrated smoothly within altair is really something! Foundational work! I tried the example Interactive Chart with Aggregation but it is not working with vegafusion enabled:
import altair as alt
from vega_datasets import data
alt.data_transformers.enable("vegafusion")
source = data.movies.url
slider = alt.binding_range(min=0, max=10, step=0.1, name="threshold")
threshold = alt.param(value=5, bind=slider)
chart = alt.layer(
    alt.Chart(source).mark_circle().encode(
        x=alt.X("IMDB_Rating:Q").title("IMDB Rating"),
        y=alt.Y("Rotten_Tomatoes_Rating:Q").title("Rotten Tomatoes Rating")
    ).transform_filter(
        alt.datum["IMDB_Rating"] >= threshold
    ),
    alt.Chart(source).mark_circle().encode(
        x=alt.X("IMDB_Rating:Q").bin(maxbins=10),
        y=alt.Y("Rotten_Tomatoes_Rating:Q").bin(maxbins=10),
        size=alt.Size("count():Q").scale(domain=[0,160])
    ).transform_filter(
        alt.datum["IMDB_Rating"] < threshold
    ),
    alt.Chart().mark_rule(color="gray").encode(
        strokeWidth=alt.StrokeWidth(value=6),
        x=alt.X(datum=alt.expr(threshold.name), type="quantitative")
    )
).add_params(threshold)
alt.JupyterChart(chart)
It initiates correctly but upon using the slider it errors with:
TraitError: The 'threshold' trait of a Params instance expected an int, not the float 5.3.
The Interval Selection on a Map is also not working with vegafusion enabled, but I don't think that is actually related to this PR. |
Thanks for the kind words and for trying out the PR @mattijn. I'll take a look! |
I fixed the param value error in 8689751 and opened vega/vegafusion#434 to track the error you hit in the interval selection on map example.
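To illustrate the kind of fix described by the commit message "Use float if initial param value is int", here is a minimal standard-library sketch. It is not the code from the commit: `trait_kind_for` and `accepts` are hypothetical helpers that only model the idea that a param whose initial value is the int 5 should still accept 5.3 later.

```python
def trait_kind_for(initial_value):
    """Pick a trait kind for a param from its initial value (hypothetical helper)."""
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(initial_value, bool):
        return "bool"
    # Treat ints as floats: a slider that starts at 5 may later emit 5.3.
    if isinstance(initial_value, (int, float)):
        return "float"
    if isinstance(initial_value, str):
        return "str"
    return "any"


def accepts(kind, value):
    """Return True if a trait of the given kind would accept the value."""
    if kind == "bool":
        return isinstance(value, bool)
    if kind == "float":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "str":
        return isinstance(value, str)
    return True


kind = trait_kind_for(5)
print(kind, accepts(kind, 5.3))
```

Under this sketch, inferring an int-only trait from the initial value 5 is what would reject the slider value 5.3; widening to a float kind avoids that.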
This is a good idea. I'll add a |
(these will end up in the JupyterLab console)
Wow, so exciting to have this functionality directly in JupyterCharts! I'm in favor of this direction as I think this would make it more convenient to work with large data, and also simplify things by having one less renderer and Python package. If I remember correctly, in the past we briefly talked about having a global option for enabling JupyterChart as the default for all Altair charts, similar to how we have the data_transformers right now. Would this still be viable after this PR, or would it become difficult now that there is an interaction with the global vegafusion option? In either case, I think the functionality in this PR is more helpful than the potential global option. |
When we talked about this previously, I pictured adding a new renderer that would display charts using JupyterChart. Something like
alt.renderers.enable("jupyter")
This would be orthogonal to the VegaFusion data transformer used in this PR, so hypothetically you could do this to enable both:
alt.renderers.enable("jupyter")
alt.data_transformers.enable("vegafusion")
Does that make sense @joelostblom? |
Yup that makes sense; great that this works smoothly, I will check out the other PR you put up. |
I'm not sure what I'm missing, but if I check out the latest changes and double check that I'm on the right branch including latest commits:
!git log --oneline -5
ee2aed45 (HEAD -> jonmmease/vegafusion-widget, origin/jonmmease/vegafusion-widget) mypy fixes
893bc597 Add debug property and use this to enable printing VegaFusion messages (these will end up in the JupyterLab console)
8689751a Use float if initial param value is int
11a90569 Fix JupyterChart tests
ae373a9e bump vegafusion in pyproject.toml
I don't get a
[Video attachment: Screen.Recording.2023-12-16.at.14.19.40.mov] |
You're not missing anything, I see what you mean now. I'll dig into why this isn't updating |
By the way, does this mean that this PR makes it possible to set selections and update/stream data? I've the feeling this PR unlocks more possibilities than just integrating VegaFusion into the Altair JupyterChart. Or is this just wishful thinking? |
This PR makes it possible to update and listen to arbitrary signals and datasets in the Vega spec. So there are a lot more things that could be done with this. One example would be to update (though not stream) datasets in place. Streaming data (as in appending to the data that's already displayed) would require some additional work. One caveat is that it wouldn't (currently) work to combine the VegaFusion integration with other arbitrary updates to the widget's signals and datasets. |
I updated the Large Datasets documentation to describe using JupyterChart and remove mention of VegaFusionWidget. |
@mattijn, this is ready for another look. I just released VegaFusion 1.5.1 which fixes the two chart errors you ran into. |
After updating vegafusion, I was getting this error:
AttributeError: 'builtins.PyVegaFusionRuntime' object has no attribute 'new_chart_state'
Checking versions of vegafusion:
(stable) D:\Software\altair-viz\altair>conda list vegafusion
# packages in environment at D:\Software\Miniconda3\envs\stable:
#
# Name                      Version    Build              Channel
vegafusion                  1.5.1      pyhd8ed1ab_0       conda-forge
vegafusion-python-embed     1.4.3      py310he2c049f_0    conda-forge
After updating Thanks again @jonmmease! |
Yeah, that's a good idea. I opened #3296 to track this. I think I'll merge this on Monday if there isn't any more feedback. Thanks all! |
The culmination of a lot of great work!! Thank you @jonmmease. This, together with the "jupyter" renderer makes it super easy for users to work with large datasets and explore them interactively. 🥳
Sorry that it took me a while to have a look, wasn't because I am not excited about this PR ;) I should have some more time over the holidays, just ping me when I can help out somewhere.
Quick question @jonmmease. You made this statement before:
I like to update a dataset in place. How does this work in practice? For example in this spec:
import altair as alt
import pandas as pd
list_start = ['C4', 'D4', 'E4', 'E4']
list_update = ['C4', 'D4', 'E4', 'E4', 'D#4', 'C4', 'G4', 'D#5', 'C5', 'G5', 'C4']
df_start = pd.DataFrame({'notes': list_start})
df_update = pd.DataFrame({'notes': list_update})
# Create a chart using Altair
bar_chart = alt.Chart(df_start).mark_bar().encode(
    x='notes:N',
    y='count():O'
)
jchart = alt.JupyterChart(bar_chart)
jchart
How can I update/replace the dataset here in place using the updated list |
Hi Mattijn, updating a dataset doesn't have a nice API yet, but using the primitives added in this PR you can do something like this:
jchart._py_to_js_updates = [{
    "namespace": "data",
    "scope": [],
    "name": "data-a0e7a86c692327a18bbeb2464725124c",
    "value": df_update.to_dict("records")
}]
Here a namespace of "data" is in contrast to "signal", which can be used to update signals. A "scope" of [] means the dataset is defined at the top level of the compiled Vega spec (not nested inside a group mark). We could probably clean this up pretty well and add a jchart.update_data() method, at least for non-compound charts. We'd need to decide how to handle compound charts as well. |
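As a rough sketch of how the update message above could be wrapped, the helper below builds a dictionary with the same four keys shown in the comment. `make_update` is a hypothetical name, not part of Altair, and the dataset name is simply copied from the comment above; the real name depends on the chart's data.

```python
def make_update(namespace, name, value, scope=()):
    """Build one update message with the keys used in the comment above.

    Hypothetical helper: only the key names come from the thread.
    """
    if namespace not in ("data", "signal"):
        raise ValueError(f"namespace must be 'data' or 'signal', got {namespace!r}")
    return {
        "namespace": namespace,
        # An empty scope means the top level of the compiled Vega spec.
        "scope": list(scope),
        "name": name,
        "value": value,
    }


# Plain records stand in for df_update.to_dict("records").
records = [{"notes": note} for note in ["C4", "D4", "E4"]]
update = make_update("data", "data-a0e7a86c692327a18bbeb2464725124c", records)
print(update["namespace"], update["scope"], len(update["value"]))

# In a notebook with a live widget, the assignment from the comment would be:
# jchart._py_to_js_updates = [update]
```

Because `_py_to_js_updates` is a private trait, a wrapper like this would mostly serve as a stopgap until a public method such as the proposed jchart.update_data() exists.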
Wow! This will become great!
[Video attachment: WIOD1ayTs3.mp4]
For fun, the code I used: https://gist.github.com/mattijn/5d44cd9e261b90c4c2b92bb0d19bc171 |
Overview
This PR updates JupyterChart to improve the integration with VegaFusion so that interactive data transformations can be performed in the Python kernel rather than the browser. This brings the capabilities of the dedicated VegaFusion Widget Renderer to JupyterChart.
Benefits
Let's start with an example of interactive crossfiltering on a 2 million row flights dataset.
[Video attachment: 2m.mov]
With this PR, the full dataset is never sent to the browser. Each time a selection is changed, a signal is sent from the widget to the Python kernel and the filtering and aggregation are performed in Python by VegaFusion, and the result is pushed back to the browser.
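The round trip described above can be pictured with a toy, standard-library model. This is not how VegaFusion implements it; `filter_and_bin`, the field names, and the sample rows are assumptions made for illustration. The point is that the kernel keeps every row and only the small aggregated result goes back to the browser.

```python
from collections import Counter


def filter_and_bin(rows, brush_field, brush, bin_field, bin_step):
    """Filter rows by a brush range, then count the survivors per bin."""
    low, high = brush
    counts = Counter()
    for row in rows:
        if low <= row[brush_field] <= high:
            # Floor the value to the start of its bin.
            bin_start = (row[bin_field] // bin_step) * bin_step
            counts[bin_start] += 1
    # Only this aggregated result would be sent to the browser.
    return [{"bin_start": start, "count": n} for start, n in sorted(counts.items())]


# Tiny stand-in for the flights dataset, which stays in the kernel.
flights = [
    {"delay": 5, "distance": 120},
    {"delay": 42, "distance": 480},
    {"delay": 17, "distance": 450},
    {"delay": 63, "distance": 900},
]

# The browser reports a brush selection on "delay"; the kernel responds.
result = filter_and_bin(flights, "delay", (10, 50), "distance", 100)
print(result)
```

With millions of rows the returned list would still contain only one entry per non-empty bin, which is why the payload stays small.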
How it's enabled
As before, VegaFusion is enabled in Altair globally using alt.data_transformers.enable("vegafusion"). When enabled, JupyterChart will automatically take advantage of VegaFusion. When not enabled, JupyterChart will continue to function as before. VegaFusion is still an optional dependency.
How it works
This PR takes advantage of the ChartState construct added to VegaFusion 1.5.0 (see vega/vegafusion#426). The ChartState performs the initial spec transformation and provides a watch plan which specifies the signals and datasets that must be sent from the Vega renderer back to the ChartState in order to preserve chart interactivity.
The ChartState is also responsible for holding references to inline datasets. So unlike VegaFusion's VegaFusionWidget approach, source dataframes do not need to be written to disk 🎉
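The watch-plan idea can be illustrated with a small toy model. `ToyChartState`, the layout of its `watch_plan`, and its `update` method are hypothetical and are not VegaFusion's API; the sketch only shows the principle that updates for watched signals are applied while everything else is ignored.

```python
class ToyChartState:
    """Toy stand-in for a chart state (hypothetical; not VegaFusion's ChartState)."""

    def __init__(self, watched_signals):
        # Watch plan: the top-level signals the renderer must report back.
        self.watch_plan = [
            {"namespace": "signal", "name": name, "scope": []}
            for name in watched_signals
        ]
        self.signals = {}

    def _is_watched(self, update):
        return any(
            entry["namespace"] == update["namespace"]
            and entry["name"] == update["name"]
            and entry["scope"] == update["scope"]
            for entry in self.watch_plan
        )

    def update(self, client_updates):
        """Apply watched updates and return the names that were applied."""
        applied = []
        for update in client_updates:
            if self._is_watched(update):
                self.signals[update["name"]] = update["value"]
                applied.append(update["name"])
        return applied


state = ToyChartState(["brush"])
applied = state.update([
    {"namespace": "signal", "name": "brush", "scope": [], "value": [10, 50]},
    {"namespace": "signal", "name": "hover", "scope": [], "value": 3},
])
print(applied)  # only the watched signal is applied
```

In practice the renderer would not send unwatched values at all; the filter here just makes the role of the plan visible in a few lines.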
Note on local_tz
One subtlety is that the ChartState requires the browser's local timezone in order to perform its data transformations. Because of this, I added a local_tz traitlet that is set by the widget to the browser's local timezone. The Python side adds a callback on this traitlet and builds the ChartState once it is available in Python.
Selection / Param access
When VegaFusion is enabled, it's still possible to access the Chart's selections and value parameters.
Future of VegaFusionWidget
If these updates are accepted into JupyterChart, I plan on deprecating VegaFusionWidget (and the entire vegafusion-jupyter Python package).
I'll update the Altair docs in a follow-on PR to remove mention of VegaFusionWidget and explain the functionality of JupyterChart when VegaFusion is enabled.