Support for pola.rs DataFrames #2868
I haven't used pola.rs before, but it seems that it can easily convert a DataFrame to a pandas DataFrame with the to_pandas method. Does this solve the issue for you, or is there a use case that this doesn't cover? On the Altair side, quite a lot of code is needed to get data type handling (and other things) right for pandas DataFrames. Extending this to Polars and similar libraries might not be trivial.
I see. There is some small overhead associated with converting to pandas first, but if supporting the type would be difficult to implement on the Altair side, it's really not a big deal.
I'm a bit hesitant about this, since I believe it would require adding pola.rs as a dependency of Altair. As there are quite a few alternative dataframe libraries, it would be hard to draw a line on which to support and why, and we might end up with a lot of different dataframe dependencies in Altair. Relying on pandas as a common denominator for these libraries is convenient, since its popularity means that many dataframe libraries have an easy way to convert to pandas. That said, I do understand that it is a small inconvenience to convert to pandas each time. Do you know how other Python viz libraries handle this?
I kind of like the idea of providing basic support for Polars DataFrames (probably without any automatic type inference) if there is a way to do that without negative side effects (like requiring more dependencies). @joelostblom Do you think it might be hypothetically possible to put another clause at the end of the …
I'm also in favor of direct support, and I think it is possible without additional hard dependencies. Probably not anytime soon, but reworking this serialization could eventually also make pandas a soft dependency.
Good points. As long as we don't add dependencies or a lot of overhead, I don't have an issue with it. Maybe the easiest way would be to check whether the polars library is installed, then check whether the data is a Polars DataFrame, and if so convert it to a pandas DataFrame. This would need to come before the pandas logic on line 89, so that we can rely on that logic afterwards. I think a try/except could work, as you mentioned @ChristopherDavisUCI, or something like this (as per this SO answer) (not tested):

```python
from importlib.util import find_spec

# Only import polars if it is installed, so it stays an optional dependency
if find_spec("polars"):
    import polars as pl

    # Hand Polars data over to the existing pandas code path
    if isinstance(data, pl.DataFrame):
        data = data.to_pandas()
```
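A possible variant of the snippet above avoids importing Polars at all: if the user's code has not already imported polars, the object cannot be a Polars DataFrame, so looking it up in sys.modules is enough. This is a minimal sketch; the helper name maybe_polars_to_pandas is hypothetical and not part of Altair, and Polars' own to_pandas may additionally need pyarrow depending on the Polars version.

```python
import sys
from typing import Any


def maybe_polars_to_pandas(data: Any) -> Any:
    """Convert a Polars DataFrame to pandas; return anything else unchanged.

    Hypothetical helper: it only consults polars if the user has already
    imported it, so Altair would not import polars itself.
    """
    pl = sys.modules.get("polars")
    if pl is not None and isinstance(data, pl.DataFrame):
        # Let Polars handle the dtype conversion to pandas.
        return data.to_pandas()
    return data
```

Compared with find_spec, this skips the import cost of polars for users who never use it, at the price of relying on the user having imported polars under its normal module name.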
Eventually you'll need JSON, so maybe there is no need to go through a pandas DataFrame first. E.g., for geo-data I check whether the instance has a …
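Going straight to JSON could look roughly like the sketch below. It duck-types on a to_dicts() method (Polars DataFrames provide one, returning one dict per row) and wraps the rows in Vega-Lite's inline {"values": [...]} data form, so no polars import is needed. The function name rows_to_inline_data is hypothetical; since this path skips pandas, values such as datetimes would still need their own handling.

```python
import json
from typing import Any, Dict, List


def rows_to_inline_data(data: Any) -> Dict[str, List[Dict[str, Any]]]:
    """Build Vega-Lite inline data from any object exposing to_dicts().

    Hypothetical helper: it checks for the method instead of the type,
    so the dataframe library never has to be imported here.
    """
    if not hasattr(data, "to_dicts"):
        raise TypeError(f"{type(data).__name__} has no to_dicts() method")
    rows = data.to_dicts()
    # Serialize once to fail early on values that are not JSON-serializable
    # (json.dumps raises TypeError for e.g. datetime objects).
    json.dumps(rows)
    return {"values": rows}
```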
I'm happy to spend some time this week thinking about this. Polars looks very cool (and I haven't used it before).
One advantage of converting to pandas is that we could rely on our existing type inference without introducing new logic for Polars type inference (it would instead be up to the Polars library to correctly convert its types to the corresponding pandas types, which I would prefer).
The only reason I'm a little hesitant about type inference for Polars via pandas, @joelostblom, is that I was wondering whether in the future we might want to treat Polars DataFrames differently (for example, if there were a way to benefit from their advantages over pandas DataFrames). I wouldn't want Polars users to get used to not having to provide a data type, and then have that suddenly break in the future. So my first thought was to provide less support initially (it seems easier to add more support later). But I haven't thought about it too closely... (Honestly, as you once mentioned, I think there could be a strong argument for always making the user specify the data type, but I think that ship has probably already sailed.)
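If Polars data were accepted without type inference, users would have to state each field's type themselves. A minimal sketch of such a check, assuming Altair's "field:TYPE" encoding shorthand (e.g. price:Q) and covering only the four common type codes; has_explicit_type and require_types are hypothetical helpers, not Altair's own shorthand parser.

```python
import re
from typing import List

# Hypothetical validation: a shorthand must end in ":<type code or name>".
_TYPE_SUFFIX = re.compile(r":(Q|O|N|T|quantitative|ordinal|nominal|temporal)$")


def has_explicit_type(shorthand: str) -> bool:
    """Return True if a shorthand like 'price:Q' names an encoding type."""
    return bool(_TYPE_SUFFIX.search(shorthand))


def require_types(shorthands: List[str]) -> None:
    """Raise ValueError listing every shorthand that lacks a type."""
    missing = [s for s in shorthands if not has_explicit_type(s)]
    if missing:
        raise ValueError(
            "Specify an encoding type (e.g. ':Q') for: " + ", ".join(missing)
        )
```

Starting strict like this keeps the door open: relaxing the check later when inference is added would not break existing charts, whereas removing inference later would.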
Yeah, that is a good point too. I guess it requires some investigation into which of these features is more important. Personally, I think this is low priority for now, especially if it would be more laborious than something simple like the above, but I am not against having specific Polars support.
I'm also in favor of starting with limited support for other lookalike dataframes and distilling the existing pandas serialization logic to become more dataframe-agnostic.
Related to this: since Vega and Vega-Lite already support the Arrow format, it would be nice to have a chart read directly from an Arrow file. https://github.com/vega/vega-loader-arrow
The vega-loader-arrow package is an extension for Vega; it is not natively included in Vega-Lite and is currently not loaded by Altair. Direct Arrow support would also require including an Arrow JS library. See also the related #2471 and the similar PR vega/ipyvega#346, which just got merged in ipyvega.
@mattijn @joelostblom This thread fell off my radar, but I should have a chance to look at it this week!
I wrote a very short draft of a possible PR for supporting Polars DataFrames in #2888. As I wrote there and above, I didn't do any type inference at this stage (for example, by converting to a pandas DataFrame), because I figured it would be easier to add that support later than to possibly take it away. I've never used Polars DataFrames before (nor really any method of specifying data in Altair aside from pandas or URLs), so please let me know if this works for you!
I will try it out! Thanks!
Merged in #2888.
Right now it looks like only pandas DataFrames are supported. It would be nice to be able to use Polars as well.