Large Memory Spike on Merge #2802
Comments
Are you able to test with 0.19.0? That release contains a number of performance and memory improvements, which will also benefit merge operations.
Also, if you still see issues with 0.19+, can you compile this branch: https://github.com/ion-elgreco/delta-rs/tree/debug/merge_explain and share the output that gets printed to stdout? I would like to see the plan with the execution stats.
@rob-harrison can you check the memory performance with this branch: https://github.com/ion-elgreco/delta-rs/tree/fix/set_greedy_mem_pool
@ion-elgreco
Thanks for the detailed analysis @rob-harrison. Do you have an idea of what the working data set in memory for the merge might be, i.e., how many rows are being merged? I have seen some cases where the source/target data was simply too large for a merge to happen in memory with Python/Rust, and we had to drop out to Spark to do the job, since it has the facilities to spread that load across machines.
@rtyler please see typical merge metrics below:
We're talking between 1k-5k max source rows.
It doesn't feel to me like we should be anywhere near the limits requiring a move to Spark.
Going over the attached plan from the merge above and reading it backwards (correct?), the following seems apparent:
If I'm reading the above correctly, the issue seems to stem from the partition predicate not being pushed down to the initial Delta scan.
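To make "pushing down the partition predicate" concrete, here is a toy model of partition pruning, not delta-rs code: the `DataFile` type and the partition values are hypothetical. When the predicate on the partition column reaches the scan, only files whose partition value can match are read; when it does not, every file in the table is scanned and loaded, which is consistent with the memory spike described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    # Hypothetical stand-in for one file entry in a Delta table's log
    path: str
    partition_value: str  # value of a hypothetical partition column


def prune_files(files, wanted_values):
    """Keep only files whose partition value could satisfy the predicate."""
    wanted = set(wanted_values)
    return [f for f in files if f.partition_value in wanted]


files = [
    DataFile("part=2024-01-01/a.parquet", "2024-01-01"),
    DataFile("part=2024-01-02/b.parquet", "2024-01-02"),
    DataFile("part=2024-01-03/c.parquet", "2024-01-03"),
]

# With pushdown: only the matching partition's file is scanned.
scanned = prune_files(files, ["2024-01-02"])
print([f.path for f in scanned])

# Without pushdown: the scan reads all files and filters rows afterwards.
print(len(files))
```

In a real table the pruning decision is made from partition values recorded in the Delta log, so skipped files are never opened at all.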
@rob-harrison thanks for sharing the explain output; indeed, files are not being pruned. This is due to the … I will take a look at this!
@ion-elgreco just tried changing the partition filter from an IN list to a series of OR conditions, and can confirm pushdown works!
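For anyone stuck on an affected version, the workaround above can be sketched as a small helper that expands a list of partition values into OR-ed equality checks instead of an `IN (...)` list. The function name `partition_or_predicate` and the column `t.date` are hypothetical; only the idea of swapping IN for OR comes from this thread.

```python
def partition_or_predicate(column, values):
    """Build "(col = 'v1' OR col = 'v2' ...)" instead of "col IN ('v1', 'v2', ...)"."""
    if not values:
        raise ValueError("need at least one partition value")
    terms = []
    for v in values:
        # Double any single quotes so each value is a valid SQL string literal
        escaped = str(v).replace("'", "''")
        terms.append(f"{column} = '{escaped}'")
    # Parenthesize so the OR group combines safely with other AND-ed conditions
    return "(" + " OR ".join(terms) + ")"


pred = partition_or_predicate("t.date", ["2024-01-01", "2024-01-02"])
print(pred)
```

The resulting string would then be AND-ed with the join condition, e.g. `f"s.id = t.id AND {pred}"`, and passed as the `predicate` argument of `DeltaTable.merge(...)` with `source_alias="s"` and `target_alias="t"` (the aliases and the `id` key are again assumptions). Once the fix lands, an `IN` list should prune just as well.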
@rob-harrison yeah
@rob-harrison I've pushed a PR; it should land in 0.19.2 soonish.
Many thanks @ion-elgreco 🙏
Environment
Delta-rs version: 0.18.2
Binding: Python
Environment:
Bug
What happened:
What you expected to happen:
Memory to remain within reasonable limits
How to reproduce it:
More details: