feat(python!): Use Altair in DataFrame.plot #17995

Merged (50 commits) on Aug 27, 2024

MarcoGorelli
Collaborator

@MarcoGorelli MarcoGorelli commented Aug 1, 2024

Some context behind this: since vega/altair#3452, Altair supports Polars natively, without any extra heavy dependencies (no pandas, no NumPy, no PyArrow). Altair is a very popular and widely used library, with excellent docs and static typing - hence, I think it'd be best suited as Polars' default plotting backend

DataFrame.plot was marked as "unstable", so this change can technically be made in Polars 1.5.0 (originally 1.4.0). What I've implemented here is a very thin layer on top of Altair, so it should be both convenient for users and easy to maintain

For existing users wishing to preserve HvPlot plots, all they need to do is apply the diff

+ import hvplot.polars
- df.plot.line
+ df.hvplot.line

So, the impact on users should be fairly small

HvPlot maintainers have been extra-friendly and helpful (especially with answering user questions in Discord). I think it'd be good to still mention them in the docstring (also to help users for whom this represents an API change), and to recommend their library in the "visualisation" section of the user guide

Demo

DataFrame (here source is a polars.DataFrame):

image

image

Series plots work too:

image

Screenshot 2024-08-18 184932

Screenshot 2024-08-18 184917

Tab-complete works well, making this well-suited to EDA:

image

TODO

  • add more definitions than just line and point, so users get good tab completion
  • figure out static typing (done)

F.A.Q.: what about other plotting backends?

Maybe, in the future, the plotting backend could be configurable in pl.Config. But I think that's an orthogonal issue and can be done/discussed separately. Plotting will stay "unstable" for the time being

Comment on lines 21 to 30
def line(
self,
x: str | Any | None = None,
y: str | Any | None = None,
color: str | Any | None = None,
order: str | Any | None = None,
tooltip: str | Any | None = None,
*args: Any,
**kwargs: Any,
) -> alt.Chart:
Collaborator Author

@MarcoGorelli MarcoGorelli Aug 1, 2024

Hi @dangotbanned - may I ask for your input here please?

  1. which do you think are the most common types of plots which are worth explicitly making functions for? Functionality would be unaffected, they would just work better with tab completion
  2. how would you suggest typing the various arguments? Does Altair have public type hints?
  3. Any Altair maintainers you'd suggest looping into the discussion?

Thanks 🙏

Contributor

Thanks for the ping, happy to help where I can @MarcoGorelli

Couple of resources up top that I think could be useful:

Will respond to each question in another comment 👍

Contributor

@dangotbanned dangotbanned Aug 1, 2024

1

  1. which do you think are the most common types of plots which are worth explicitly making functions for? Functionality would be unaffected, they would just work better with tab completion

Can't speak for everyone, but for a reduced selection:

Looking at hvPlot, there are a few methods/chart types I'd need to do some digging to work out the equivalent in altair (if there is one).

However, my suggestion would be using the names defined there, both for compatibility when switching backends and to reduce the number of methods.

Examples

Haven't covered everything here, but it's a start:

hvPlotTabular -> altair.Chart

  • (bar|barh) -> mark_bar
  • box -> mark_boxplot
  • scatter -> mark_(circle|point|square|image|text)
    • labels -> mark_text
    • points -> mark_point
  • line -> mark_(line|trail)
  • (polygons|paths) -> mark_geoshape
  • (area|heatmap) -> mark_area
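
The tentative mapping above can be written down as a lookup table. This is only a sketch of the list in the comment (which itself says it is incomplete), not something Polars or Altair ships; the names `HVPLOT_TO_ALTAIR_MARKS` and `candidate_marks` are made up for illustration.

```python
from __future__ import annotations

# hvPlot method name -> candidate Altair mark methods, copied from the
# tentative list in the comment above (not an official mapping).
HVPLOT_TO_ALTAIR_MARKS: dict[str, tuple[str, ...]] = {
    "bar": ("mark_bar",),
    "barh": ("mark_bar",),
    "box": ("mark_boxplot",),
    "scatter": ("mark_circle", "mark_point", "mark_square", "mark_image", "mark_text"),
    "labels": ("mark_text",),
    "points": ("mark_point",),
    "line": ("mark_line", "mark_trail"),
    "polygons": ("mark_geoshape",),
    "paths": ("mark_geoshape",),
    "area": ("mark_area",),
    "heatmap": ("mark_area",),
}


def candidate_marks(hvplot_name: str) -> tuple[str, ...]:
    """Return the candidate mark methods for an hvPlot chart type (empty if unlisted)."""
    return HVPLOT_TO_ALTAIR_MARKS.get(hvplot_name, ())
```

Several hvPlot names map to more than one mark, so a table like this can only narrow the choice; picking a single default per name would still be a design decision.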

Contributor

@dangotbanned dangotbanned Aug 1, 2024

2

  2. how would you suggest typing the various arguments? Does Altair have public type hints?

I might update this later after thinking on it some more.

Yeah they've been there since 5.2.0 but will be improved for altair>=5.4.0 with https://github.com/vega/altair/blob/main/altair/vegalite/v5/schema/_typing.py

For altair the model is quite different to matplotlib-style functions, but .encode() would be where to start.

Something like:

# Annotation from `.encode()`
# y: Optional[str | Y | Map | YDatum | YValue] = Undefined

# Don't name it this pls
TypeForY = str | Mapping[str, Any] | Any

I wouldn't worry about any altair-specific types here.
Spelling them out won't have an impact on attribute access of the result

Contributor

3

  3. Any Altair maintainers you'd suggest looping into the discussion?

For typing @binste but really anyone from vega/altair#3452 I think would be interested (time-permitting)

@mattijn, @joelostblom, @jonmmease

@dangotbanned
Contributor

dangotbanned commented Aug 1, 2024

2

  2. how would you suggest typing the various arguments? Does Altair have public type hints?

I might update this later after thinking on it some more.

Back again after thinking @MarcoGorelli

Feel free to rename things, but I came up with this for the typing

Super long code block
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Mapping, Union

from typing_extensions import TypeAlias, TypedDict, Unpack

if TYPE_CHECKING:
    import altair as alt
    import narwhals.stable.v1 as nw

ChannelType: TypeAlias = Union[str, Mapping[str, Any], Any]

class EncodeKwds(TypedDict, total=False):
    angle: ChannelType
    color: ChannelType
    column: ChannelType
    description: ChannelType
    detail: ChannelType | list[Any]
    facet: ChannelType
    fill: ChannelType
    fillOpacity: ChannelType
    href: ChannelType
    key: ChannelType
    latitude: ChannelType
    latitude2: ChannelType
    longitude: ChannelType
    longitude2: ChannelType
    opacity: ChannelType
    order: ChannelType | list[Any]
    radius: ChannelType
    radius2: ChannelType
    row: ChannelType
    shape: ChannelType
    size: ChannelType
    stroke: ChannelType
    strokeDash: ChannelType
    strokeOpacity: ChannelType
    strokeWidth: ChannelType
    text: ChannelType
    theta: ChannelType
    theta2: ChannelType
    tooltip: ChannelType | list[Any]
    url: ChannelType
    x: ChannelType
    x2: ChannelType
    xError: ChannelType
    xError2: ChannelType
    xOffset: ChannelType
    y: ChannelType
    y2: ChannelType
    yError: ChannelType
    yError2: ChannelType
    yOffset: ChannelType


class Plot:
    chart: alt.Chart

    def __init__(self, df: nw.DataFrame) -> None:
        import altair as alt

        self.chart = alt.Chart(df)

    def line(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        order: ChannelType | list[Any] | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: Unpack[EncodeKwds],
    ) -> alt.Chart: ...

Which checks out below.

You can use x, y, color, order, tooltip as positional-only or keyword-only, but not both:

def test_plot_typing() -> None:
    from typing import cast
    from typing_extensions import reveal_type

    plot = cast(Plot, "test")
    reveal_type(plot) # Type of "plot" is "Plot"

    example_1 = plot.line(x="col 1")
    reveal_type(example_1) # Type of "example_1" is "Chart"

    example_2 = plot.line("col 1", "col 2")
    reveal_type(example_2) # Type of "example_2" is "Chart"

    example_err = plot.line("col 1", "col 2", x="col 3")
    reveal_type(example_err) # Type of "example_err" is "Any"

At least for VSCode, you get the expanded docs on hover:

image


You could then repeat the /, **kwargs: Unpack[EncodeKwds] for the other methods - maybe changing the positional-only ones if needed
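
One runtime detail of the `/, **kwargs` signature is worth spelling out: because `x` and `y` are positional-only, a call such as `line("col 1", x="col 3")` is not rejected by Python itself; the keyword `x` simply lands in `kwargs`. The sketch below shows how an implementation might merge the two sources and reject duplicates. `merge_encodings` is a hypothetical helper, not part of Polars or Altair.

```python
from __future__ import annotations

from typing import Any


def merge_encodings(
    x: Any | None = None,
    y: Any | None = None,
    /,
    **kwargs: Any,
) -> dict[str, Any]:
    """Combine positional-only channels with keyword channels (hypothetical helper)."""
    encodings = dict(kwargs)
    for name, value in (("x", x), ("y", y)):
        if value is None:
            continue
        if name in encodings:
            # The same channel arrived both positionally and by keyword.
            msg = f"channel {name!r} given both positionally and by keyword"
            raise TypeError(msg)
        encodings[name] = value
    return encodings
```

With this helper, `merge_encodings("col 1", "col 2")` and `merge_encodings(x="col 1", y="col 2")` produce the same dictionary, while mixing both forms for one channel raises `TypeError` at runtime.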


codecov bot commented Aug 1, 2024

Codecov Report

Attention: Patch coverage is 46.39175% with 52 lines in your changes missing coverage. Please review.

Project coverage is 79.79%. Comparing base (6f5851d) to head (40a0e31).
Report is 5 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| py-polars/polars/dataframe/plotting.py | 38.00% | 16 Missing and 15 partials ⚠️ |
| py-polars/polars/series/plotting.py | 51.42% | 11 Missing and 6 partials ⚠️ |
| py-polars/polars/dataframe/frame.py | 60.00% | 1 Missing and 1 partial ⚠️ |
| py-polars/polars/series/series.py | 60.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17995      +/-   ##
==========================================
- Coverage   79.80%   79.79%   -0.01%     
==========================================
  Files        1497     1499       +2     
  Lines      200379   200464      +85     
  Branches     2841     2864      +23     
==========================================
+ Hits       159913   159966      +53     
- Misses      39941    39952      +11     
- Partials      525      546      +21     


@binste binste left a comment

Thanks for the ping! I'm of course already a big fan of this PR ;) Let me know if I can help!

I find @dangotbanned's suggestion regarding typing a reasonable compromise so that there are some useful type hints but using the Any escape hatch instead of typing out all of them explicitly. However, if you fully want to mirror the type hints in altair with all altair-specific classes, I think we could expose those. Maybe something like altair.typing.XChannelType, ...

@MarcoGorelli
Collaborator Author

MarcoGorelli commented Aug 2, 2024

Thanks both for comments!

One slight hesitation I have about adding such a large class as EncodeKwds is its maintainability - how would Polars ensure it stays up-to-date? Or would it be in-scope for Altair to expose such a class, that Polars could just import?

@binste

binste commented Aug 2, 2024

So far we did not explicitly expose the types in Altair, even keeping many of them in private modules, as we first wanted to use them for a while before others rely on it. But I think now, also thanks to the recent improvements done by @dangotbanned, we could expose the most relevant ones. This could include a TypedDict such as EncodeKwds with each channel being typed the same as .encode(). This part of the code is autogenerated anyway based on the Vega-Lite jsonschema and so it would be no maintenance effort for us and for you :)

Any thoughts on this @dangotbanned? I think I could spend some time this weekend to think through which types we'd want to expose and how but I'd also very much appreciate your input and help if you want to. Maybe we can even get it into Altair 5.4 and release that in the next 1-2 weeks and so Polars 1.4 could have that as a minimum dependency and use the types.
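
To make the "autogenerated, so no maintenance effort" point concrete, here is a minimal sketch of rendering a `TypedDict` definition from a list of channel names. It is not Altair's actual generator: `render_typed_dict` and the three-channel list are assumptions for illustration, and a real generator would read the channel names from the Vega-Lite JSON schema.

```python
from __future__ import annotations

from typing import Iterable


def render_typed_dict(name: str, channels: Iterable[str]) -> str:
    """Render Python source for a `total=False` TypedDict with one key per channel."""
    # Fall back to `pass` so that an empty channel list still yields valid source.
    body = [f"    {channel}: ChannelType" for channel in channels] or ["    pass"]
    return "\n".join([f"class {name}(TypedDict, total=False):", *body]) + "\n"


# Hypothetical subset of channels, used only to show the shape of the output.
source = render_typed_dict("EncodeKwds", ["x", "y", "color"])
print(source)
```

Emitting source text, rather than building the class dynamically, keeps the result visible to static type checkers, which is the property the discussion cares about.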

@dangotbanned
Contributor

dangotbanned commented Aug 2, 2024

So far we did not explicitly expose the types in Altair, even keeping many of them in private modules, as we first wanted to use them for a while before others rely on it. But I think now, also thanks to the recent improvements done by @dangotbanned, we could expose the most relevant ones. This could include a TypedDict such as EncodeKwds with each channel being typed the same as .encode(). This part of the code is autogenerated anyway based on the Vega-Lite jsonschema and so it would be no maintenance effort for us and for you :)

Any thoughts on this @dangotbanned?

Fully agree on the autogenerated TypedDict @binste, you beat me to the suggestion 😉

I think I could spend some time this weekend to think through which types we'd want to expose and how but I'd also very much appreciate your input and help if you want to. Maybe we can even get it into Altair 5.4 and release that in the next 1-2 weeks and so Polars 1.4 could have that as a minimum dependency and use the types.

Happy to discuss in an altair issue and work with you on a PR


One slight hesitation I have about adding such a large class as EncodeKwds is its maintainability - how would Polars ensure it stays up-to-date? Or would it be in-scope for Altair to expose such a class, that Polars could just import?

@MarcoGorelli
I think this approach would work better if Plot (or a version of) were a Protocol, that altair and any other library could handle the implementation of.

Maybe with fewer positional-args, but this was what I had in mind back in vega/altair#3452 (comment)

Code block
from __future__ import annotations

import sys
from typing import Any, Generic, TypeVar

import narwhals.stable.v1 as nw

if sys.version_info >= (3, 12):
    from typing import Protocol, runtime_checkable
else:
    from typing_extensions import Protocol, runtime_checkable

T_Plot = TypeVar("T_Plot")

@runtime_checkable
class SupportsPlot(Generic[T_Plot], Protocol):
    chart: T_Plot

    def __init__(self, df: nw.DataFrame) -> None: ...

    def bar(
        self,
        x: Any | None = None,
        y: Any | None = None,
        color: Any | None = None,
        tooltip: Any | None = None,
        /,
        **kwargs: Any,
    ) -> T_Plot: ...
    def line(
        self,
        x: Any | None = None,
        y: Any | None = None,
        color: Any | None = None,
        order: Any | None = None,
        tooltip: Any | None = None,
        /,
        **kwargs: Any,
    ) -> T_Plot: ...

So on the polars-side, you can focus more on switching between backends and less on maintaining how the plots are made.

It would also allow decisions like #17995 (comment) to be made in altair, since there may be non-trivial ways to produce some of the hvPlot charts - that some of the maintainers are aware of

@MarcoGorelli
Collaborator Author

Ooh i like where things are going 😍

I think this approach would work better if Plot (or a version of) were a Protocol

This sounds good, we just need to be careful to learn lessons from the pandas plotting backends and why altair-pandas was abandoned. I think it was probably because:

  • the API there was too tied to pandas' existing plotting API, which is a bit at odds with how most dataframe libraries handle plots
  • it was never clear what the boundaries were - if you switched backend, how could you know what would be supported and what wouldn't?

that altair and any other library could handle the implementation of

How would that work? In this PR we're essentially deferring the whole implementation to Altair - if you have time/interest, do you fancy opening a separate PR to show how it would work? If you'd like to talk things over (which may be a good idea if we're coordinating changes across projects, which is never easy), feel free to book some time at https://calendly.com/marcogorelli

@mattijn

mattijn commented Aug 2, 2024

Implementation-wise, I cannot contribute much. While not directly involved, I have been following the historical developments in pandas and Vega-Altair from the sidelines:

  • There have been requests to introduce methods in Altair that could be used in other packages but had no usage within Altair itself. This approach should be avoided here as well. This issue is summarized in this comment: pandas-dev/pandas#27488, and also discussed in pandas-dev/pandas#26747.
  • If we can update Altair to define methods that are both useful for Altair and beneficial for other packages, we should consider introducing them in Altair.
  • Entrypoints could be utilized by polars to introduce a method to support various plotting backends. Altair supports this functionality. But honestly, I don't know the details. See also https://github.com/altair-viz/altair_pandas.
  • Regarding the types of plots, there are experiments and research in creating common plots and interactivity approaches for exploratory data analysis for Altair. @joelostblom is/was working on this within Altair Ally, see https://github.com/vega/altair_ally, and @dwootton is/was exploring this with Altair Express.
  • From my observations, it is still challenging to define a default setting that would be sensible for most users.
  • How customizable are the results? For example, there have been questions within pdvega about how to add horizontal or vertical rules to the resulting plot (similar to ax.hlines and ax.vlines in Matplotlib); see "How to add vertical and horizontal lines to figures?" (altair-viz/pdvega#21 (comment)). Is that possible with the resulting charts here? Additionally, can the interactivity be turned off?

@dangotbanned
Contributor

dangotbanned commented Aug 2, 2024

@MarcoGorelli @mattijn Really appreciate your thorough responses, I'll do my homework reading up on all of your links and follow up

For now, I can say I'd want to go in with the most minimal + simple definition for a Protocol - nowhere close to #17995 (comment)

  • No default implementation
  • No requirement of inheritance/ABCs
  • Purely defining a structural type
    • with some methods that all return the library equivalent of Chart
    • E.g. plotly.Figure
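
As a rough illustration of "purely defining a structural type", the sketch below uses a `typing.Protocol` with a single method and a toy backend that satisfies it without inheriting from anything. `SupportsLinePlot` and `DictBackend` are invented names, and the dict return value merely stands in for a library's chart object.

```python
from __future__ import annotations

from typing import Any, Protocol, TypeVar, runtime_checkable

T_co = TypeVar("T_co", covariant=True)


@runtime_checkable
class SupportsLinePlot(Protocol[T_co]):
    """Structural type: anything with a `line` method returning some chart type."""

    def line(self, *args: Any, **kwargs: Any) -> T_co: ...


class DictBackend:
    """Toy backend that returns a plain dict instead of a real chart object."""

    def __init__(self, data: list[dict[str, Any]]) -> None:
        self._data = data

    def line(self, *args: Any, **kwargs: Any) -> dict[str, Any]:
        return {"mark": "line", "encoding": kwargs, "rows": len(self._data)}


backend = DictBackend([{"date": "2024-08-01", "price": 1.0}])
# No inheritance involved: the check passes because a `line` method exists.
assert isinstance(backend, SupportsLinePlot)
```

Note that `runtime_checkable` only verifies that the method exists, not its signature or return type, so the precise chart type is still only known to a static type checker.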

For altair, all I'd want is a quick way to get from df -> chart and then customize from there:

import altair as alt
import polars as pl

df = pl.DataFrame()
...
chart = alt.Chart(df).mark_line().encode(...) # <----

@dangotbanned
Contributor

dangotbanned commented Aug 2, 2024

TLDR: Simple idea got complex 👎

Edit: Leaving this here for future reference, but no longer pushing for this path.
Skip to #17995 (comment)

Ooh i like where things are going 😍

I think this approach would work better if Plot (or a version of) were a Protocol

This sounds good, we just need to be careful to learn lessons from the pandas plotting backends and why altair-pandas was abandoned. I think it was probably because:

  • the API there was too tied to pandas' existing plotting API, which is a bit at odds with how most dataframe libraries handle plots

  • it was never clear what the boundaries were - if you switched backend, how could you know what would be supported and what wouldn't?

that altair and any other library could handle the implementation of

How would that work? In this PR we're essentially deferring the whole implementation to Altair - if you have time/interest, do you fancy opening a separate PR to show how it would work?

@MarcoGorelli I'm starting to think I've bitten off more than I can chew with this one 😞

I guess I'll run through some stuff, maybe it sparks an idea for someone else.


importlib.metadata.entry_points seems to be the route to achieve multiple backends.
pandas is using the stdlib equivalent of what was suggested in pandas-dev/pandas#27488 (comment) (thanks @mattijn) - that was a third-party library at the time (2019).

Entrypoints could be utilized by polars to introduce a method to support various plotting backends.
Altair supports this functionality. But honestly, I don't know the details. See also altair-viz/altair_pandas.
@mattijn

We've got lots of examples of this in altair derived from https://github.com/vega/altair/blob/6c4c7856a5b134103d3db1205035d08a83fc3aa6/altair/utils/plugin_registry.py

However I don't think this is sufficient for the task, given that each backend would be returning vastly different objects.
It would be a pretty bad UX returning an imprecise type - breaking any autocomplete, etc.

It's probably a fair assumption that a user would be calling pl.DataFrame.plot in an interactive environment, much as hvPlot seems to prioritize an IPython environment.
Personally think that shouldn't come at the cost of experience in an IDE (e.g. vega/altair#3466) - but it is an option.


Digging through ibis I found some examples of combining entry_points and runtime typing.
Pretty atypical use of the typing system, was interesting to read through though:

AFAIK this would still rely on lots of library-specific code and some IR, which I was hoping to avoid.


Something I hadn't seen, but thought could be explored is using library-specific stubs.

Did some experimenting with altair and seaborn (since there are stubs https://github.com/python/typeshed/tree/main/stubs/seaborn).
Maybe there is something to this?

Code block
# hypothetical `.pyi`, located external to `polars`

# ruff: noqa: F401
import sys
import typing as t
import typing_extensions as te
from typing import Any, Generic, TypeVar

import narwhals.stable.v1 as nw
import polars as pl
import seaborn as sns
from matplotlib.axes import Axes

import altair as alt

if sys.version_info >= (3, 12):
    from typing import Protocol, runtime_checkable
else:
    from typing_extensions import Protocol, runtime_checkable

if t.TYPE_CHECKING:
    import matplotlib as mpl
    import seaborn.categorical as sns_c
    from matplotlib.axes import Axes

    ChannelType: te.TypeAlias = str | t.Mapping[str, Any] | Any

    class EncodeKwds(te.TypedDict, total=False):
        angle: ChannelType
        color: ChannelType
        column: ChannelType
        description: ChannelType
        detail: ChannelType | list[Any]
        facet: ChannelType
        fill: ChannelType
        fillOpacity: ChannelType
        href: ChannelType
        key: ChannelType
        latitude: ChannelType
        latitude2: ChannelType
        longitude: ChannelType
        longitude2: ChannelType
        opacity: ChannelType
        order: ChannelType | list[Any]
        radius: ChannelType
        radius2: ChannelType
        row: ChannelType
        shape: ChannelType
        size: ChannelType
        stroke: ChannelType
        strokeDash: ChannelType
        strokeOpacity: ChannelType
        strokeWidth: ChannelType
        text: ChannelType
        theta: ChannelType
        theta2: ChannelType
        tooltip: ChannelType | list[Any]
        url: ChannelType
        x: ChannelType
        x2: ChannelType
        xError: ChannelType
        xError2: ChannelType
        xOffset: ChannelType
        y: ChannelType
        y2: ChannelType
        yError: ChannelType
        yError2: ChannelType
        yOffset: ChannelType

T = TypeVar("T")

@runtime_checkable
class SupportsPlot(Generic[T], Protocol):
    backend: t.ClassVar[te.LiteralString]
    chart: T

    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(self, *args: Any, **kwargs: Any) -> T: ...
    def bar(self, *args: Any, **kwargs: Any) -> T: ...
    def line(self, *args: Any, **kwargs: Any) -> T: ...
    def scatter(self, *args: Any, **kwargs: Any) -> T: ...

@runtime_checkable
class AltairPlot(SupportsPlot[alt.ChartType]):
    backend: t.ClassVar[te.LiteralString] = "altair"
    chart: alt.ChartType

    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def bar(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def line(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        order: ChannelType | list[Any] | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...
    def scatter(
        self,
        x: ChannelType | None = None,
        y: ChannelType | None = None,
        color: ChannelType | None = None,
        size: ChannelType | None = None,
        tooltip: ChannelType | list[Any] | None = None,
        /,
        **kwargs: te.Unpack[EncodeKwds],
    ) -> alt.ChartType: ...

@runtime_checkable
class SeabornPlot(SupportsPlot[Axes]):
    backend: t.ClassVar[te.LiteralString] = "seaborn"
    chart: Axes

    def __init__(self, df: nw.DataFrame, /) -> None: ...
    def area(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def bar(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def line(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...
    def scatter(
        self,
        *,
        x: sns_c.ColumnName | sns_c._Vector | None = None,
        y: sns_c.ColumnName | sns_c._Vector | None = None,
        hue: sns_c.ColumnName | sns_c._Vector | None = None,
        **kwargs: Any,
    ) -> Axes: ...

Not sure how you'd convince a type checker of which SupportsPlot.backend to use, if this came from pl.Config (and not from the user directly)?


Final idea is to call in @max-muoto 👋 for thoughts on the soundness of any of the above.
Having seen you on other polars issues and in typeshed, maybe you have a fresh take?

@MarcoGorelli
Collaborator Author

Thanks all for comments 🙏! I do like how issues such as this one bring different projects together

💯 Totally agree on not adding code to Altair which isn't directly useful to Altair itself. The only request I'd have is public types as mentioned in #17995 (comment), but even then, it's hardly essential

Regarding customisability of results 🔧 : I'd say that if anyone wants fully customisable results, they should use Altair (or their favourite plotting lib) directly. The advantage of DataFrame.plot being a really thin layer is that moving between the two should be easy, e.g.:

  • user wants to quickly visualise their data, and they call df.plot.line(x='date', y='price', color='symbol')
  • they realise they need further customisation, or turn off interactivity, or whatever else, and so they swap out df.plot.line with alt.Chart(df).mark_line().encode and go from there
  • the fact that df.plot.foo always just maps to alt.Chart(df).mark_foo().encode would make the transition predictable and free from surprises, whilst making the most common interactive case easy to find
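
The "df.plot.foo maps to mark_foo().encode" idea can be sketched with plain attribute delegation. To keep the example runnable without Altair installed, `RecordingChart` is a stand-in that only records calls; `ThinPlot` is likewise a made-up name and not the code merged in this PR.

```python
from __future__ import annotations

from typing import Any, Callable, Optional


class RecordingChart:
    """Stand-in for a chart object; it only records the calls made on it."""

    def __init__(self, data: Any) -> None:
        self.data = data
        self.mark: Optional[str] = None
        self.encoding: dict[str, Any] = {}

    def mark_line(self) -> RecordingChart:
        self.mark = "line"
        return self

    def mark_point(self) -> RecordingChart:
        self.mark = "point"
        return self

    def encode(self, **channels: Any) -> RecordingChart:
        self.encoding.update(channels)
        return self


class ThinPlot:
    """Forward `plot.foo(**channels)` to `chart.mark_foo().encode(**channels)`."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def __getattr__(self, name: str) -> Callable[..., RecordingChart]:
        # Only reached for names that are not regular attributes of ThinPlot.
        mark = getattr(RecordingChart(self._data), f"mark_{name}", None)
        if mark is None:
            raise AttributeError(f"no mark named {name!r}")

        def plot(**channels: Any) -> RecordingChart:
            return mark().encode(**channels)

        return plot


chart = ThinPlot([{"date": "2024-08-01", "price": 1.0}]).line(x="date", y="price")
```

Because the layer adds nothing beyond the name translation, a user who outgrows it can write the `mark_line().encode(...)` chain by hand and get the same object.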

Furthermore, having some built-in df.plot method signals to users that the default plotting backend is known to work well with Polars 🤝

I think the fully-customisable backends part is becoming too complex too quickly. No other plotting library is close to (as far as I can tell) supporting Polars natively without extra heavy dependencies. I'd suggest to:

  1. start with Altair
  2. keep plotting marked as unstable
  3. if/when other plotting libraries like Seaborn / PlotNine / etc reach this level, we discuss a more pluggable solution or some "dataframe plotting standard" - but not now, it feels too soon

@dangotbanned
Contributor

I think the fully-customisable backends part is becoming too complex too quickly.

Absolutely agree @MarcoGorelli, will do my best to support you with this in altair 😄

@dangotbanned
Contributor

Regarding customisability of results 🔧 : I'd say that if anyone wants fully customisable results, they should use Altair (or their favourite plotting lib) directly. The advantage of DataFrame.plot being a really thin layer is that moving between the two should be easy

You could provide a link to https://altair-viz.github.io/user_guide/customization.html#chart-themes in the docs, for users who simply want different (but consistent) defaults

@binste

binste commented Aug 3, 2024

Very interesting reading through all the comments and links 😄 +1 on Marco's summary: expose some types publicly in Altair, wait with standardisation until other plotting libraries are being considered as well. I'll work on the public types soon.

Regarding customisability of results 🔧 : I'd say that if anyone wants fully customisable results, they should use Altair (or their favourite plotting lib) directly. The advantage of DataFrame.plot being a really thin layer is that moving between the two should be easy

You could provide a link to https://altair-viz.github.io/user_guide/customization.html#chart-themes in the docs, for users who simply want different (but consistent) defaults

I think this is something useful to consider early on! The default theme of Altair/Vega-Lite feels a bit dated for my taste but changing it in Altair should be well thought through and be part of a major release. In Polars, we'd have the opportunity to spruce it up a bit from the beginning. Personally, I use something close to https://gist.github.com/binste/b4042fa76a89d72d45cbbb9355ec6906 which only requires minimal modifications. Streamlit have their own theme as well enabled by default

@joelostblom
Contributor

joelostblom commented Aug 3, 2024

Cool to see this being implemented in Polars and an interesting discussion to follow! I would be inclined to agree with what @MarcoGorelli said regarding fully-customisable backends becoming too complex too quickly, and I think it is a good idea to outsource any type of customization as much as possible.

In addition to switching from df.plot... to alt.Chart..., also note the configure_* methods which can be used on any Altair chart. So users could do something like df.plot.line(x='date', y='price', color='symbol').configure_axis(grid=False) to turn off gridlines. I thought of leveraging these in altair_ally, but one of the issues is that you can't set everything you would like to set (e.g. I don't think it is possible to set the actual axis title via configure_, then you have to use .title() on e.g. alt.X, alt.Y, etc), so it might still be better to just point to alt.Chart() for all configuration needs to keep it simple.

@binste

binste commented Aug 11, 2024

FYI, Altair 5.4.0 is out now including the removal of the dependencies on numpy, pandas, and toolz + with a new altair.typing module 🥳

@MarcoGorelli MarcoGorelli marked this pull request as draft August 18, 2024 12:45
Comment on lines -7 to -9
# Calling `plot` the first time is slow
# https://github.com/pola-rs/polars/issues/13500
pytestmark = pytest.mark.slow
Collaborator Author

importing Altair is about 70 times faster* than importing hvplot, so I think we can remove this slow marker

*timed by performing time python -c 'import altair' and time python -c 'import hvplot' 7 times each, and finding the ratio of the smallest "real time" results for both
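The timing method in the footnote can be approximated with the standard library alone. This is a rough sketch, not the script used for the PR: min_import_time is a hypothetical helper, and like time python -c '...' it measures interpreter startup together with the import.

```python
import subprocess
import sys
import time


def min_import_time(module: str, repeats: int = 7) -> float:
    """Return the smallest wall-clock time in seconds of `python -c 'import <module>'`."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # A fresh interpreter per run, so earlier imports are never cached.
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

With both libraries installed, min_import_time("hvplot") / min_import_time("altair") gives a ratio comparable to the one quoted above. The exact figure will vary by machine and library version.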

@MarcoGorelli MarcoGorelli marked this pull request as ready for review August 19, 2024 08:43
Member

@ritchie46 ritchie46 left a comment

Thanks a lot @MarcoGorelli and @altair team. Really great effort and great addition. 🙌

@ritchie46 ritchie46 merged commit 0f1edda into pola-rs:main Aug 27, 2024
19 checks passed
@v1gnesh

v1gnesh commented Aug 29, 2024

One advantage of hvplot / bokeh is the free interactivity.
Altair supports interactivity too, but I think only a subset of it.
Please consider having a config for interactivity.

@MarcoGorelli
Collaborator Author

thanks @v1gnesh - coming soon 😉 vega/altair#3394

In the meantime, if you'd like to keep using hvplot, you can just add import hvplot.polars at the top of your script/notebook and change .plot to .hvplot

@dwootton

@v1gnesh what type of interactivity is important to you? For example tooltips, panning/zooming, brushes, etc

@v1gnesh

v1gnesh commented Aug 29, 2024

@dwootton The collection of tools in the sidebar, e.g. zoom, reset zoom, selection tool, hand tool.

@joelostblom
Contributor

@v1gnesh Just a heads up that you can already achieve zoom, reset zoom, and panning ("hand tool" in hvplot) in altair via the .interactive() method, see e.g. https://altair-viz.github.io/gallery/scatter_tooltips.html (they work without being selected in a side bar). The addition of box zoom ("selection tool" in hvplot) is being tracked in this issue vega/vega-lite#4742

When you say "free interactivity", do you mean that you would like this to be the default behavior without having to type .interactive() yourself?

@v1gnesh

v1gnesh commented Aug 30, 2024

@joelostblom Thanks for the links.

Yup, whenever it makes sense for it to be the default, I at least would prefer interactive becoming the default.
It may not be acceptable for everyone, so whenever .interactive() is as mature as you'd like, bringing this up as a poll/discussion in altair's repo will help understand what users want.

@mjmdavis

The new default plots from Altair have reduced the interactivity of plots. I've gone back to using hvplot because I struggled to get useful images out of Altair.

Was this a premature move considering Altair is missing essential features:

  • zoom to selection
  • resize plot
  • hover to show datapoint

Was the main intent to deliver a basic plotting experience without adding many dependencies?

@MarcoGorelli
Collaborator Author

that's totally fair - it's easy to go back to hvplot if that works for you

  • import hvplot.polars
  • use df.hvplot instead of df.plot

having said that - stay tuned, more developments may be on their way 👀 😉

@joelostblom
Contributor

@mjmdavis Thanks for the feedback! While box zoom is not available (vega/vega-lite#4742), you should be able to hover a data point to show additional info in a tooltip as per #18625. If that's not what you mean, could you elaborate on what you expect to happen when hovering?

I'm also curious what exactly you are referring to with "resize plot". Do you mean something like dragging the corner of the plot to resize it? You are currently able to resize plots with e.g. .properties(width=400) as per the polars documentation.

@mjmdavis

So, my use case is mostly data exploration in jupyter notebooks. For this, I've become quite fond of ipympl.

My biggest problem there is that it can be tricky to get it to work with different kernels, and there's frequently a 10-minute dance to get things working.

The default plot however has the basic zoom to selection functionality that is very useful when dealing with complex signals. And it's convenient to not have to re-run code to change the size of the plot as you resize your screen.

Vega-Lite and hvplot definitely benefit from being able to show plots in saved notebooks!

There are a lot of considerations here so it's hard to please everyone. My 2c is that it's nice to be able to do some quick GUI based exploration when plotting with default settings.

Labels
  • breaking python: Change that breaks backwards compatibility for the Python package
  • enhancement: New feature or an improvement of an existing feature
  • python: Related to Python Polars

10 participants