Excessive memory usage of Delta writers #1225
Very interesting! Thanks for the detailed bug report @gruuya.

I'd definitely be in favor of writing incrementally with …

Once the optimistic commit PR has been merged, I planned to focus some attention on the write paths as well. There are some changes on the way, or already integrated, with DataFusion around how writes are handled. Essentially, so far there was no concept of a sink within the logical / physical plans, and that has either changed or is about to. My hope was to integrate with that... The above analysis is a great help in knowing what to look out for...
Fwiw, I've revised the existing Seafowl Parquet writing logic (which was already optimized) to perform Delta writes instead, so here's a relevant data point for this issue. Trying it out on the above example (which is actually just a step in our tutorial) shows a 10x decrease in peak memory consumption (from above 1 GB to always below 100 MB). To recap, the optimizations that permit this are:
In fact, the main logic is found in …
Certainly! Will do some profiling and post the results here later on.
Unfortunately, that didn't make much difference. To streamline the memory profiling I've created a new repository that allows reproducing the profiling in three lines:
@roeap @wjones127, if it doesn't interfere with your plans, I could try to re-purpose Seafowl's writing logic and open a … This would basically mean re-factoring delta-rs …
I think we'll have new optimizations soon once some upstream changes are made in the object-store and parquet crates. In object-store, the next release will add support for multipart uploads to Cloudflare R2, which is the only thing blocking us from switching over. In parquet, the next release will have a method for getting the size of the currently buffered row group, which will let us control the size of row groups (and in turn the memory overhead). That also won't require any temporary files.
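For context on the multipart side: a multipart upload splits an object into parts that are uploaded independently (S3-compatible stores require every part except the last to be at least 5 MiB). The part-splitting arithmetic can be illustrated with a small sketch; `part_ranges` is a hypothetical helper for illustration, not delta-rs or object-store code:

```rust
/// Split `total` bytes into consecutive (start, end) part ranges of at most
/// `part_size` bytes each; the final part carries the remainder.
/// (Hypothetical helper for illustration, not delta-rs or object-store code.)
fn part_ranges(total: u64, part_size: u64) -> Vec<(u64, u64)> {
    assert!(part_size > 0);
    let mut ranges = Vec::new();
    let mut start = 0u64;
    while start < total {
        // Each part ends at the next boundary, or at the end of the object.
        let end = (start + part_size).min(total);
        ranges.push((start, end));
        start = end;
    }
    ranges
}
```

With 8 MiB parts, a file like the ~146 MB one from this issue would be uploaded as 19 parts, with only one part buffered in memory at a time.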
That's not unexpected. Those changes should make it faster, not necessarily use less total memory. We'll need to make the changes described above to use less memory. Are there optimizations beyond the ones described that you think you can add?
Oh I didn't realize that, thanks!
Interesting, I'm quite curious how that would work; does this mean:
Apart from the above point, not really.
You can flush a row group to storage, but the writer keeps the metadata in memory. At the end, the metadata is written when closing the file. So Parquet files can be written out in a streaming fashion like this; they just can't be appended to once they are closed.
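A sketch of that streaming pattern, assuming a parquet crate version whose `ArrowWriter` exposes `flush()` (close out the currently buffered row group) and an `in_progress_size()` accessor for the buffered-row-group size mentioned above; the 32 MiB threshold and function shape are illustrative, not delta-rs code:

```rust
// Streaming Parquet writes: bound memory by flushing row groups as we go.
// Assumes `ArrowWriter::flush` and `ArrowWriter::in_progress_size` are
// available in the parquet crate version in use; threshold is illustrative.
use std::{fs::File, sync::Arc};

use arrow::{datatypes::Schema, record_batch::RecordBatch};
use parquet::{arrow::ArrowWriter, errors::Result};

fn write_streaming(
    schema: Arc<Schema>,
    batches: impl IntoIterator<Item = RecordBatch>,
    sink: File,
) -> Result<()> {
    let mut writer = ArrowWriter::try_new(sink, schema, None)?;
    const ROW_GROUP_LIMIT: usize = 32 * 1024 * 1024; // ~32 MiB per row group

    for batch in batches {
        writer.write(&batch)?;
        // Once the buffered row group grows past the limit, flush it to the
        // sink; only the (small) per-row-group metadata stays in memory.
        if writer.in_progress_size() > ROW_GROUP_LIMIT {
            writer.flush()?;
        }
    }
    // `close` writes the accumulated metadata into the file footer; after
    // this the file cannot be appended to.
    writer.close()?;
    Ok(())
}
```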
Nice, looking forward to that!
Environment
Delta-rs version: 0.8.0
Binding: Rust
Environment:
Bug
What happened:
Writing to Delta tables from DataFusion scans of relatively modest Parquet files (~150 MB) results in a memory usage explosion (peak >1 GB).
What you expected to happen:
The memory usage shouldn't peak so drastically.
How to reproduce it:
Download https://seafowl-public.s3.eu-west-1.amazonaws.com/tutorial/trase-supply-chains.parquet and store it somewhere:

```shell
$ du -h ~/supply-chains.parquet
146M    /home/ubuntu/supply-chains.parquet
```
Cargo.toml
```shell
$ ~/bytehound/target/release/bytehound server -p 8888 memory-profiling_*.dat
```
The results I get:
If I group by backtrace I get the two biggest offenders:
(I suspect the first one could be misleading)
More details:
This is something we've previously had to deal with in Seafowl: see `plan_to_object_store` in splitgraph/seafowl#71.

I suspect the biggest contributor to this is the fact that pending object store uploads are being buffered in memory, while the optimal solution is to first persist the record batch outputs of a DataFusion plan to temporary Parquet files instead. In addition, these files should then be uploaded by loading them chunk by chunk and using `ObjectStore::put_multipart` on each chunk, so as to avoid loading the entire file into memory.