[parquet]: feature gate functionality in parquet. #4764
Comments
Really cool to see this happening. That being said, my experience with feature flags has been that the combinatorial explosion becomes very hard to test. It also tends to result in poor test iteration times, as different crates set different sets of features, forcing recompilation. I have instead found splitting up into sub-crates to be a better approach. However, the current granularity is already pretty small. Is there a particular crate that is showing up as a bottleneck during compilation?
Gotta start somewhere. :)
This should not hurt your test times, as I will ensure the default features remain the same. This will only make it possible for end users such as polars to cherry-pick some stuff. For testing we use
The compile times in polars are already way longer than we want. That's why I'd like to cherry-pick functionality. For the
The IPC readers are necessary to read the embedded arrow schema data. I'd also be surprised if they were a meaningful bottleneck in compile times. Yes, arrow thrift encodes a base64-encoded flatbuffer 🤯
The parquet crate doesn't make use of the take kernel directly. It does make use of the cast kernels, though, which indirectly use the take kernel. This would be non-trivial to remove, especially for decimals.
Yes until someone uses disabled default features 😅 I dunno, I have found features to not really justify their downsides.
We still need to ensure compilation with non-default sets of feature flags? We currently already have a fairly long list of combinations in CI...
Perhaps we might do this empirically? If there are kernels that have an outsize impact on compile times, we can then work out mitigation strategies for them. I strongly suspect that there are a couple of particularly problematic sub-kernels (likely dictionaries) that have an outsize impact, and which we might be able to address without having to resort to feature flags. I'm more than happy to help out with this. FWIW I have some ideas on how to make the take kernel less expensive. (Edit: nvm, I already implemented this as #4705, I just forgot)
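To make the "combinatorial explosion" concern concrete: with n independent, optional features there are 2^n possible build configurations. The sketch below just enumerates them. The function name is made up for illustration, and nothing here reflects how the actual CI matrix is generated.

```rust
/// Enumerate every subset of a list of optional features, i.e. every
/// distinct build configuration that could in principle be tested.
/// (Illustrative helper, not part of any crate.)
fn feature_combinations<'a>(features: &[&'a str]) -> Vec<Vec<&'a str>> {
    let n = features.len();
    // Each bit of `mask` says whether the feature at that index is enabled.
    (0..(1usize << n))
        .map(|mask| {
            features
                .iter()
                .enumerate()
                .filter(|&(i, _)| (mask & (1usize << i)) != 0)
                .map(|(_, f)| *f)
                .collect()
        })
        .collect()
}
```

Calling this with three placeholder feature names yields 8 combinations, and each additional independent feature doubles the count, which is why CI usually covers only a hand-picked subset.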
I did some
There are some significant savings with some trivial features. Indeed as you said I understand your hesitation about feature gating, but I'd argue not all are the same. Some feature gates very naturally fit the code order and can gate entire modules. It can save a lot of cumulative compile times for downstream users. :) For the planet! 😉
So long as we're being data driven and the returns are significant, I can probably grin and bear it
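As a rough illustration of a gate that covers an entire module, here is a minimal sketch using `#[cfg(feature = "...")]`. The feature name `arrow_io` and the module names are hypothetical and are not taken from the parquet crate. In a real crate the feature would also need to be declared under `[features]` in `Cargo.toml`.

```rust
/// Always compiled, regardless of which features are enabled.
pub mod core_reader {
    pub fn name() -> &'static str {
        "core"
    }
}

/// Compiled only when the (hypothetical) `arrow_io` feature is enabled.
/// With the feature off, none of this module is type-checked or codegen'd.
#[cfg(feature = "arrow_io")]
pub mod arrow_io {
    pub fn name() -> &'static str {
        "arrow_io"
    }
}

/// Report which components made it into this build.
#[allow(unused_mut)]
pub fn enabled_components() -> Vec<&'static str> {
    let mut out = vec![core_reader::name()];
    // The block is removed entirely when the feature is disabled.
    #[cfg(feature = "arrow_io")]
    {
        out.push(arrow_io::name());
    }
    out
}
```

Because the gate sits at the module boundary, a downstream user who disables the feature skips compiling the whole module, while a build with the feature enabled behaves as before.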
I want to try partially moving/doubling IO to arrow-rs and start migration incrementally.

Describe the solution you'd like

Start feature gating some functionality in parquet that reduces compile time.