Replacing pandas with polars #1926
Comments
I ❤️ this! 🤣
I 💯 agree. Maybe change the issue title to “benchmark polars vs. pandas”? It could also be broadened to include xarray and numpy structs to be unbiased. One thing I'm not seeing is how polars behaves with vectorization. I see SIMD, but does that require or utilize MKL or a specific CPU architecture? Polars seems great at SQL-style operations, but we don't use those much in pvlib, e.g. sorting, selecting, etc. Also, it's important to note where and how the pvlib API would change because of its extensive use of pandas.
@langestefan can you say more about what "replace pandas with polars" would mean in this context? Not supporting pandas inputs, or changing our docs to use polars instead of pandas, or what? Off the top of my head, there aren't a whole lot of places that pvlib actually uses pandas directly ( If polars support is valuable, what seems like a better end result to me is to try to support both polars and pandas rather than dropping one for the other. I'm not eager to proliferate the |
Like @kevinsa5, I'd say it's way too early to support polars instead of pandas. The API is still unstable (version 0.x). I also really don't see any improvement from using polars here, apart from using the new shiny library. An effort is being made to use numpy arrays as inputs in functions where pandas was used before; see #1455. Even 20 years of weather data for a single location is nowhere near several million rows. Moving forward, using numpy types instead of DataFrame or Series will make integrating with polars or any other DataFrame library easy. We can probably already use polars, if that's what you want, with a good part of the pvlib API.
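To illustrate the last point, here is a minimal sketch of a function that only relies on numpy internally. The function name, the coefficient `k`, and the linear model are made up for illustration and are not part of the pvlib API.

```python
import numpy as np


def cell_temperature_sketch(poa_global, temp_air, k=0.03):
    """Hypothetical linear model (NOT a pvlib function): T_cell = T_air + k * POA."""
    # np.asarray accepts lists and numpy arrays; array-like containers such as
    # pandas or polars Series can typically be converted the same way.
    poa = np.asarray(poa_global, dtype=float)
    air = np.asarray(temp_air, dtype=float)
    return air + k * poa


# Inputs of different container types give the same plain numpy result.
poa = [0.0, 500.0, 1000.0]
temp_air = np.array([10.0, 20.0, 30.0])
result = cell_temperature_sketch(poa, temp_air)
print(result)
```

With this pattern the caller decides which DataFrame library to use and converts at the boundary, so the numerical code itself does not need to know about pandas or polars.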
I think it would be more appropriate to simply make pvlib "agnostic" to if one is using |
Narwhals would be the way to go.
Oh cool. I need to check this out further!
Is your feature request related to a problem? Please describe.
Pandas is used extensively throughout pvlib. However, after using it for many years, I think it is now outdated and needs to be replaced. It is greedy, slow, and inconsistent, and its API is awkward and difficult to use.
Describe the solution you'd like
I would like to propose replacing pandas with polars. Polars is a DataFrame interface built in Rust with many performance benefits, too many to list here, so I suggest visiting their page to learn more.
I know this is a very big change, but I believe it will lead to some serious performance and usability benefits. There is no real plan yet; I would just like to start the discussion and see what others think. A good starting point would be to benchmark some performance-critical pvlib code in pandas and polars so we can quantify what the performance benefits for pvlib would be.
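As a rough idea of what such a benchmark could look like, here is a small sketch using the standard-library `timeit` module. The workload (scale an array and sum it) is a stand-in chosen for illustration, not actual pvlib code, and the helper `best_time` is a hypothetical name. pandas and polars are only timed if they are installed.

```python
import timeit

import numpy as np


def best_time(fn, repeat=3, number=5):
    """Return the best per-call wall time in seconds over several repeats."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


# Synthetic data standing in for a weather time series (an assumption).
rng = np.random.default_rng(0)
values = rng.random(100_000)

# Each candidate runs the same toy workload on its own container type.
candidates = {"numpy": lambda: (values * 2.0).sum()}

try:
    import pandas as pd

    pd_series = pd.Series(values)
    candidates["pandas"] = lambda: (pd_series * 2.0).sum()
except ImportError:
    pass  # pandas not installed, skip it

try:
    import polars as pl

    pl_series = pl.Series("values", values)
    candidates["polars"] = lambda: (pl_series * 2.0).sum()
except ImportError:
    pass  # polars not installed, skip it

timings = {name: best_time(fn) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: {seconds * 1e6:.1f} µs per call")
```

A toy workload like this says little about real pvlib performance; a meaningful comparison would swap in the performance-critical pvlib code paths and realistic input sizes.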