Track memory usage for each individual operator #899
Comments
This got me interested, so I started looking into it, but I'm not sure how we aim to tackle it. My first idea was to write a Decorator which implements The other approach I found (from the article below) is implementing something similar to Servo's Not sure where to go from here; I would love to hear some feedback. Some references: https://rust-analyzer.github.io/blog/2020/12/04/measuring-memory-usage-in-rust.html
I was kind of imagining we would have to do something like manually registering memory allocations. the While it would likely be crazy complicated to do this for all allocations, I think all the built-in DataFusion operators use most of their memory in intermediate RecordBatches and potentially a single large structure (e.g. the hash tables in hash_join and hash_aggregate). If we captured these large sources, I think that would get us most of the value.
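To make the "manually registering memory allocations" idea concrete, here is a minimal sketch of what such a registration could look like. `MemoryTracker` and `Reservation` are hypothetical names invented for illustration, not DataFusion APIs. The sketch assumes each operator owns one tracker and registers only its large allocations (intermediate batches, a hash table) by their byte size.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical per-operator tracker (not a DataFusion type).
/// Only allocations that are explicitly registered are counted.
#[derive(Debug, Default)]
pub struct MemoryTracker {
    current: AtomicUsize,
    peak: AtomicUsize,
}

impl MemoryTracker {
    /// Record `bytes` as allocated and return a guard that
    /// un-registers them when it is dropped.
    pub fn register(self: &Arc<Self>, bytes: usize) -> Reservation {
        let now = self.current.fetch_add(bytes, Ordering::SeqCst) + bytes;
        // Keep the high-water mark up to date.
        self.peak.fetch_max(now, Ordering::SeqCst);
        Reservation {
            tracker: Arc::clone(self),
            bytes,
        }
    }

    /// Bytes currently registered.
    pub fn current(&self) -> usize {
        self.current.load(Ordering::SeqCst)
    }

    /// Largest value `current` has reached so far.
    pub fn peak(&self) -> usize {
        self.peak.load(Ordering::SeqCst)
    }
}

/// Guard tied to one registered allocation (hypothetical).
pub struct Reservation {
    tracker: Arc<MemoryTracker>,
    bytes: usize,
}

impl Drop for Reservation {
    fn drop(&mut self) {
        self.tracker.current.fetch_sub(self.bytes, Ordering::SeqCst);
    }
}
```

An operator would hold a `Reservation` next to each large structure it builds, so the count falls automatically when the structure is dropped. Small allocations are deliberately ignored, which matches the suggestion that capturing the large sources gets most of the value.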
Cool, so I dug through the code a bit, and this seems to be a bit out of my league (it needs close familiarity with way too many things). Thank you for the response!
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When reviewing a plan, it would be nice to know the amount of memory each individual ExecutionPlan allocated during its execution.
Describe the solution you'd like
Add two new metrics to all operators:
"Allocated" should include both memory in created record batches as well as any internal memory (as described in #898 -- hopefully this code would just use the same underlying allocation measurement)
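As a rough illustration of an "Allocated" value that covers both record batch memory and internal memory, the sketch below keeps the two contributions separate and reports their sum. Everything here is an assumption for illustration: `Batch` is a stand-in for a real record batch, `AllocatedMetric` is a hypothetical name, and sizes are estimated from `Vec` capacity rather than taken from an allocator.

```rust
use std::mem::size_of;

/// Stand-in for a record batch: a set of i64 columns.
/// (Hypothetical; a real batch would hold Arrow arrays.)
pub struct Batch {
    pub columns: Vec<Vec<i64>>,
}

/// Heap bytes reserved by a Vec's buffer, based on capacity, not length.
pub fn vec_heap_bytes<T>(v: &Vec<T>) -> usize {
    v.capacity() * size_of::<T>()
}

impl Batch {
    /// Estimated heap usage: the outer Vec of columns plus each column buffer.
    pub fn heap_bytes(&self) -> usize {
        let columns: usize = self.columns.iter().map(|c| vec_heap_bytes(c)).sum();
        vec_heap_bytes(&self.columns) + columns
    }
}

/// Hypothetical per-operator metric: batch memory and internal memory
/// are tracked separately and summed on demand.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct AllocatedMetric {
    pub batch_bytes: usize,
    pub internal_bytes: usize,
}

impl AllocatedMetric {
    /// Count the memory of a batch the operator created.
    pub fn add_batch(&mut self, batch: &Batch) {
        self.batch_bytes += batch.heap_bytes();
    }

    /// Count internal state such as a hash table, given its size in bytes.
    pub fn add_internal(&mut self, bytes: usize) {
        self.internal_bytes += bytes;
    }

    /// Total "Allocated" value: created batches plus internal memory.
    pub fn allocated(&self) -> usize {
        self.batch_bytes + self.internal_bytes
    }
}
```

Keeping the two counters apart makes it possible to report a single total while still seeing whether an operator's memory sits in its output batches or in its internal state.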
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Probably could follow the same model as #866 (baseline metrics for all operators) once that is implemented
#898 is for tracking overall memory allocations across all operators in a plan. This issue is for tracking the allocations for each individual operator