ENH: Replace pandas dependence/use with narwhals #7462
Comments
Last time I checked, we don't really depend on pandas; it's ArviZ that does.
Does that mean that we can use polars DataFrames when using PyMC?
As data? Not sure; there's some special logic for handling pandas. But PyMC does not depend on pandas, so maybe you are requesting a new feature rather than a change of dependency.
Btw, the pandas-specific logic is:
I don't think we can replace that with narwhals, since dispatch works on types at runtime. We would need to dispatch on polars types as well.
2.1 It may actually already work, because IIRC it's all based on duck typing.
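To make the dispatch-vs-duck-typing distinction concrete, here is a minimal, stdlib-only sketch. It is not PyMC's actual code: `PandasLikeFrame` and `PolarsLikeFrame` are hypothetical stand-ins for the real classes, and `convert_by_type` / `convert_by_duck_typing` are illustrative names. Both real pandas and polars DataFrames provide a `to_numpy()` method; the stand-ins return nested lists instead of ndarrays so the example needs no third-party packages. A `functools.singledispatch` function registered for one concrete class does not fire for an unrelated class with the same interface, whereas a `hasattr(data, "to_numpy")` check accepts both.

```python
from functools import singledispatch


# Hypothetical stand-ins for pandas.DataFrame and polars.DataFrame.
# Both expose to_numpy(), but they are unrelated types.
class PandasLikeFrame:
    def __init__(self, rows):
        self.rows = rows

    def to_numpy(self):
        # The real method returns an ndarray; a list of lists keeps this stdlib-only.
        return [list(r) for r in self.rows]


class PolarsLikeFrame:
    def __init__(self, rows):
        self.rows = rows

    def to_numpy(self):
        return [list(r) for r in self.rows]


@singledispatch
def convert_by_type(data):
    # Fallback: nothing is registered for this runtime type.
    raise TypeError(f"no conversion registered for {type(data).__name__}")


@convert_by_type.register
def _(data: PandasLikeFrame):
    # Only fires when type(data) is PandasLikeFrame (or a subclass).
    return data.to_numpy()


def convert_by_duck_typing(data):
    # Accept anything that exposes to_numpy(), whatever its class;
    # otherwise return the input unchanged.
    if hasattr(data, "to_numpy"):
        return data.to_numpy()
    return data


if __name__ == "__main__":
    rows = [(1, 2), (3, 4)]
    print(convert_by_type(PandasLikeFrame(rows)))
    try:
        convert_by_type(PolarsLikeFrame(rows))
    except TypeError as exc:
        print(exc)
    print(convert_by_duck_typing(PolarsLikeFrame(rows)))
```

Under these assumptions, type-based dispatch would need an explicit registration for each new dataframe type, while the duck-typed path already covers any object with a compatible method.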
Yeah, I was taking a peek at this today. If we were using pandas-specific functionality (merging, etc.), then it would make sense to use narwhals. For the most part, we are taking DataFrames (and Series) and turning them into ndarrays. The only exception may be deriving dims from indexes, since polars does not use indexes.
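A minimal sketch of the index caveat, using hypothetical stand-in classes (`IndexedFrame`, `IndexlessFrame`) rather than the real libraries. `infer_coords` is an illustrative helper, not a PyMC function: it reads row labels from `.index` when the object has one, as a pandas DataFrame does, and otherwise falls back to positional labels, since a polars DataFrame has no row index. The returned dict mimics the `{dim_name: labels}` shape that PyMC's `coords` argument uses; the dim names `obs` and `var` are assumptions.

```python
# Hypothetical stand-ins: a pandas-like frame carries row labels in .index,
# a polars-like frame has no index attribute at all.
class IndexedFrame:
    def __init__(self, index, columns):
        self.index = list(index)
        self.columns = list(columns)

    def __len__(self):
        return len(self.index)


class IndexlessFrame:
    def __init__(self, n_rows, columns):
        self.n_rows = n_rows
        self.columns = list(columns)

    def __len__(self):
        return self.n_rows


def infer_coords(data, row_dim="obs", col_dim="var"):
    """Build a {dim_name: labels} dict from row and column labels (illustrative only)."""
    index = getattr(data, "index", None)
    if index is not None:
        # pandas-style: use the existing row labels.
        rows = list(index)
    else:
        # No index (e.g. polars): fall back to positional labels 0..n-1.
        rows = list(range(len(data)))
    return {row_dim: rows, col_dim: list(data.columns)}


if __name__ == "__main__":
    print(infer_coords(IndexedFrame(["a", "b"], ["x", "y"])))
    print(infer_coords(IndexlessFrame(3, ["x"])))
```

Whether positional labels are an acceptable fallback, or users should instead name a column to serve as coordinates, is a design decision the thread leaves open.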
Hey, thanks for looking into Narwhals 🙏 First, thanks for opening #7463; it's great to see Polars support come along. Facilitating that was one of my goals with Narwhals, and if it can happen even without it, even better 💪 I think #7463 is already a net positive; I just wanted to leave some comments, in case they're of interest:
No hard feelings, of course, if you keep the current approach; I just wanted to point out the possibilities ♾️ All the best, and it was really fun meeting some of you at PyData London! EDIT: upon further inspection, I was wrong about the PyTensor part: Narwhals wouldn't help there (unless you used it in PyTensor too). I think it would only potentially help in
Context for the issue:
With the rise in popularity of packages such as polars, arrow, and others, PyMC's dependence on pandas looks less and less universally useful. Narwhals was developed for authors of Python libraries that consume dataframes and wish to make their libraries completely dataframe-agnostic. So maybe we should consider using it instead of pandas itself? Narwhals has no dependencies and negligible overhead, so it seems relatively lightweight.
It will require some refactoring, as Narwhals relies on (a subset of) the polars API.