Support for pola.rs DataFrames #2868

Closed
3 tasks
akdienes opened this issue Feb 3, 2023 · 18 comments

Comments

@akdienes

akdienes commented Feb 3, 2023

Right now it looks like only Pandas dataframes are supported. It would be nice to be able to use Polars as well.

Please follow these steps to make it more efficient to respond to your feature request.

  • Since Altair is a Python wrapper around the Vega-Lite visualization grammar, most feature requests should be reported directly to Vega-Lite. You can click the Action Button of your Altair chart and "Open in Vega Editor" to create a reproducible Vega-Lite example.
  • Search for duplicate issues.
  • Describe the feature's goal, motivating use cases, and its expected behavior.
@binste
Contributor

binste commented Feb 3, 2023

I haven't used pola.rs before, but it seems it can quite easily convert a dataframe to a Pandas dataframe with the to_pandas method. Does this solve the issue for you, or is there a use case that is not covered by this?

On the Altair side, quite a bit of code is necessary to get the data type handling (and other things) right with Pandas DataFrames. Extending this to polars and similar libraries might not be trivial.

@akdienes
Author

akdienes commented Feb 3, 2023

I see. There is some small overhead associated with converting to pandas first, but if supporting the type would be difficult to implement on the Altair side, then it is really not a big deal.

@joelostblom
Contributor

I'm a bit hesitant about this since I believe it would require adding pola.rs as a dependency to Altair. As there are quite a few alternative dataframe libraries, it would be hard to draw a line on which to support and why, and we might end up with a lot of different dataframe dependencies in Altair. Relying on pandas as a common denominator for these libraries is convenient, as its popularity means that many dataframe libraries have a way to convert to pandas easily. Having said that, I do also understand that it is a small inconvenience to convert to pandas each time. Do you know how other Python viz libraries handle this?

@ChristopherDavisUCI
Contributor

I kind of like the idea of providing basic support for Polars DataFrames (probably without any automatic type inference) if there is a way to do that without any negative side effects (like requiring more dependencies).

@joelostblom Do you think it might be hypothetically possible to put another clause at the end of the _prepare_data definition https://github.com/altair-viz/altair/blob/39f65731a5aa52bf929be35d6ee55dab2f66e939/altair/vegalite/v5/api.py#L72 that handles Polars DataFrames? Could we get around the dependency issue you mentioned by using something like except ImportError? I don't know if that's plausible or good style or not.
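
The `except ImportError` idea above could look roughly like the following sketch. It is not the code in Altair's `_prepare_data`; the helper names `is_polars_dataframe` and `describe_data` are hypothetical and only illustrate how an optional import keeps polars from becoming a hard dependency.

```python
try:
    import polars as pl  # optional dependency; may not be installed
except ImportError:
    pl = None  # remember that polars is unavailable


def is_polars_dataframe(data):
    """Return True only if polars is installed and `data` is a polars DataFrame."""
    return pl is not None and isinstance(data, pl.DataFrame)


def describe_data(data):
    """Hypothetical stand-in for an extra branch at the end of `_prepare_data`."""
    if is_polars_dataframe(data):
        return "polars"
    # Anything else falls through to whatever handling already exists.
    return "other"
```

When polars is not installed, every input simply takes the fallback branch, so existing pandas users would see no change in behavior.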

@mattijn
Contributor

mattijn commented Feb 5, 2023

I'm also in favor of direct support and I think it is possible without additional hard dependencies.

Probably not anytime soon, but maybe reworking this serialization can eventually also make pandas a soft dependency.

@joelostblom
Contributor

Good points, as long as we don't add dependencies or a lot of overhead I don't have an issue with it. Maybe the easiest way would be to check if the polars library exists, then check if the data is a polars dataframe, and if so convert it to a pandas dataframe? This would need to come before the pandas logic on line 89, so that we can rely on that after.

I think a try/except could work like you mentioned @ChristopherDavisUCI , or something like this (as per this SO answer) (not tested):

from importlib.util import find_spec


if find_spec('polars'):
    import polars as pl
    if isinstance(data, pl.DataFrame):
        data = data.to_pandas()

@mattijn
Contributor

mattijn commented Feb 5, 2023

Eventually you'll need JSON, so maybe there is no need to go to a pandas dataframe first. E.g., for geo-data I check if the instance has a __geo_interface__, so any package that supports this protocol is supported in Altair. Maybe there also exists a dataframe protocol?
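
A protocol check of this kind can be sketched without importing any dataframe library at all. The snippet below is an illustration rather than Altair's implementation: `detect_data_protocol`, `FakeGeo`, and `FakeFrame` are made-up names, and `__dataframe__` is the method defined by the Python dataframe interchange protocol, which is assumed here to be the kind of dataframe protocol the comment asks about.

```python
def detect_data_protocol(data):
    """Classify `data` by the protocol it exposes, without importing its library."""
    if hasattr(data, "__geo_interface__"):
        return "geo"
    if hasattr(data, "__dataframe__"):
        return "dataframe"
    return None


class FakeGeo:
    """Toy object exposing the geo protocol (illustration only)."""

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": [0.0, 0.0]}


class FakeFrame:
    """Toy object exposing the interchange entry point (illustration only)."""

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        raise NotImplementedError("illustration only")
```

Because the check relies on duck typing, any library that implements the protocol would be accepted without Altair having to know about it by name.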

@ChristopherDavisUCI
Contributor

I'm happy to spend some time this week thinking about this. Polars looks very cool (and I haven't used it before).

@joelostblom
Contributor

One advantage of converting to pandas is that we could rely on our already existing type inference without having to introduce new logic for polars type inference (instead it would be up to the polars library to correctly convert their types to the corresponding pandas type which I would prefer).

https://github.com/altair-viz/altair/blob/97ff1ebc3a485be1c00a34e5d19801eca5287e6f/altair/utils/core.py#L184-L223

@ChristopherDavisUCI
Contributor

ChristopherDavisUCI commented Feb 5, 2023

The only reason I'm a little hesitant about type inference for Polars via pandas, @joelostblom, is I was wondering if in the future we would maybe want to treat Polars DataFrames differently (for example, if there was a way to benefit from their advantages over pandas DataFrames). I wouldn't want Polars users to get used to not having to provide a data type, and then have that suddenly break in the future. So my first thought was to provide less support initially (it seems easier to add more support later).

But I haven't thought about it too closely...

(Honestly as you mentioned once, I think there could be a strong argument for always making the user specify the data type, but I think that ship has probably already sailed.)
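
To make the "user specifies the data type" option concrete, here is a minimal sketch of a Vega-Lite specification in which every encoding carries an explicit `type`, so nothing has to be inferred from the dataframe. The records, field names, and the `make_spec` helper are invented for illustration; in Altair this corresponds to encoding shorthand such as `'year:O'` and `'value:Q'`.

```python
import json

# Invented example records, e.g. what a dataframe might export row by row.
records = [
    {"year": 2021, "value": 3.5},
    {"year": 2022, "value": 4.1},
]


def make_spec(values):
    """Build a Vega-Lite bar chart spec with explicitly typed encodings."""
    return {
        "data": {"values": values},
        "mark": "bar",
        "encoding": {
            "x": {"field": "year", "type": "ordinal"},
            "y": {"field": "value", "type": "quantitative"},
        },
    }


spec = make_spec(records)
spec_json = json.dumps(spec)  # serializable without any dataframe library
```

With the types written out like this, the chart does not depend on how a given dataframe library labels its columns.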

@joelostblom
Contributor

Yeah, that is a good point too. I guess it requires some investigation into which of these features is more important. Personally, I think this is low priority for now, especially if it will be more laborious than something simple like the above, but I am not against having specific polars support.

@mattijn
Contributor

mattijn commented Feb 5, 2023

I'm also in favor of starting to provide limited support for other dataframe-like objects and distilling the existing pandas serialization logic to become more dataframe-agnostic.

@NOD507

NOD507 commented Feb 11, 2023

Related to this, since vega and vega-lite already support arrow format it would be nice to have a chart reading directly from an arrow file. https://github.com/vega/vega-loader-arrow
https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_ipc.html

@mattijn
Contributor

mattijn commented Feb 12, 2023

The vega-loader-arrow package is an extension for Vega; it is not natively included in Vega-Lite and is not currently loaded by Altair. Direct Arrow support would also require including an Arrow JS library. See also the related #2471 and the similar PR vega/ipyvega#346 that just got merged in ipyvega.

@ChristopherDavisUCI
Contributor

@mattijn @joelostblom This thread fell off my radar, but I should have a chance to look at it this week!

@ChristopherDavisUCI
Contributor

I wrote a very short draft of a possible PR for supporting Polars DataFrames in #2888

As I wrote there and above, I didn't do any type inference at this stage (for example, by converting to a pandas DataFrame), because I figured it would be easier to add that support later rather than possibly taking it away.

I've never used Polars DataFrames before (nor really any method of specifying data in Altair aside from pandas or URLs), so please let me know if this is working for you!

@akdienes
Author

I will try it out! Thanks

@joelostblom
Contributor

Merged in #2888

Projects
Status: Ecosystem integration
Development

No branches or pull requests

6 participants