Update the Qual tool AutoTuner Heuristics against CPU event logs #1069
Conversation
Signed-off-by: Thomas Graves <[email protected]>
Should we add that file to the repo? Perhaps inside
Sure, I can add it. I also realized I wanted to add a few more tests to the Suite, so I'll do that and push some updates shortly.
Thanks @tgravescs
I see you filed #1078
Thanks @tgravescs
LGTM
Thanks @tgravescs!
fixes #1068
This enhances the heuristics around spark.executor.memory and handles cases where the memory-to-core ratio is too small. If the ratio is too small, the tool throws an exception and does not emit tunings. In the future we should just tag this and recommend appropriate sizes.
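Conceptually the check works like the sketch below; it is a minimal illustration only, and the object, method, and threshold value (`MinMemoryPerCoreMB`) are hypothetical placeholders rather than the actual AutoTuner code:

```scala
object MemoryPerCoreCheck {
  // Assumed minimum host memory per executor core, in MB (illustrative value).
  val MinMemoryPerCoreMB: Long = 2L * 1024

  // Throws when the worker's memory-to-core ratio is too small to generate tunings.
  def checkMemoryPerCore(workerMemoryMB: Long, workerCores: Int): Unit = {
    val memoryPerCoreMB = workerMemoryMB / workerCores
    if (memoryPerCoreMB < MinMemoryPerCoreMB) {
      throw new IllegalArgumentException(
        s"Memory per core ($memoryPerCoreMB MB) is below the assumed minimum of " +
          s"$MinMemoryPerCoreMB MB; skipping tuning recommendations.")
    }
  }
}
```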
This also adds extra memory overhead, since in the worst case we need space for both pinned memory and spill memory. It gets a little complicated because spill will use pinned memory first, but once the pinned pool is exhausted it falls back to regular off-heap memory. So here we size for the worst case, which is needing both.
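In other words, the recommended overhead is the base overhead plus both pool sizes; a simplified sketch (the function and parameter names are placeholders, not the tool's actual API):

```scala
object OverheadEstimate {
  // Worst case: spill falls back to regular off-heap once pinned is exhausted,
  // so budget for the base overhead plus both pools.
  def recommendedOverheadMB(baseOverheadMB: Long, pinnedPoolMB: Long,
      spillPoolMB: Long): Long = {
    baseOverheadMB + pinnedPoolMB + spillPoolMB
  }
}
```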
I also added heuristics for configuring the multithreaded readers (number of threads and some related sizes) as well as the shuffle reader/writer thread pools, based on the number of cores.
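As a rough sketch of the idea, the recommendations scale with the executor core count; the multipliers below are illustrative placeholders and not the actual heuristic values, though the config keys are existing RAPIDS plugin settings:

```scala
object ThreadPoolHeuristics {
  // Scale reader and shuffle thread pools with the executor core count.
  // Multipliers are illustrative only; the real AutoTuner values may differ.
  def recommend(executorCores: Int): Map[String, String] = Map(
    "spark.rapids.sql.multiThreadedRead.numThreads" -> (executorCores * 2).toString,
    "spark.rapids.shuffle.multiThreaded.reader.threads" -> executorCores.toString,
    "spark.rapids.shuffle.multiThreaded.writer.threads" -> executorCores.toString
  )
}
```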
Most of the heuristics are based on what we saw from real customer workloads and NDS results.
Most of this testing was on CSPs; I will try to apply more of it to on-prem clusters later.
Note that most of this functionality requires the worker information to be passed in:
--worker-info ./worker_info-demo-gpu-cluster.yaml
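For reference, a worker-info file describes the CPU, memory, and GPU resources of a worker node; an illustrative example is shown below (the values are placeholders for a demo cluster, not recommendations):

```yaml
# Illustrative worker-info file; adjust values to match the actual worker nodes.
system:
  numCores: 32
  memory: 212992MiB
  numWorkers: 5
gpu:
  name: T4
  count: 4
  memory: 15109MiB
softwareProperties:
  spark.executor.cores: '16'
  spark.executor.instances: '2'
  spark.executor.memory: 47222m
```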
Example:
With the worker info:
Without the worker info: