[FEA] Better Memory Management for BroadcastNestedLoopJoin #302
For #296 BroadcastNestedLoopJoin is going to be disabled by default because of the potential for a memory explosion. Before we can enable it by default we need a much better way to do memory management for it.

Some of the ideas that I had are:

- Make `CoalesceBatches` take a function instead of a hard value, and use the schema + broadcast size to avoid combining data together if we are just going to have to split it apart later anyway.

There are only two join types where we could split the broadcast table, and those are covered by `CarteseanExec`. In all other cases, if we wanted to split the broadcast table we would need some help from cudf to support it.

Comments
My preference would be approach #1 since it gives us more control.
I think we need both. If we just do the first approach then we risk having some processing upstream that produced a large batch. In theory we can push it up further, but at some point we might lose by processing really small batches. We might be able to get away with just option 1 for some things, but eventually I think we need both.
Hi @revans2, I have some questions about the first approach. Does it apply to the broadcast table? And what kind of function should the `CoalesceBatches` take?
Currently we only support inner joins for this. It means that each join is going to do a full cross join, optionally followed by a filter. In this case, inner joins, Spark will insert a …

A cross join produces rows where everything on the left is combined with everything on the right: 100 rows on the left and 5 rows on the right produces 500 rows of output. So we can guess at the output size with:
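The formula itself was lost from this thread. A minimal sketch of the estimate the surrounding text describes, assuming the per-row output size has already been estimated from the schema (all names here are illustrative, not from the original):

```scala
// Back-of-the-envelope estimate: a cross join of streamRows with buildRows
// produces streamRows * buildRows output rows.
def estimatedOutputSizeBytes(streamRows: Long,
                             buildRows: Long,
                             outputRowSizeBytes: Long): Long =
  streamRows * buildRows * outputRowSizeBytes

// e.g. 100 rows x 5 rows at 40 bytes per output row -> 20,000 bytes.
```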
In query planning we can estimate the size of each row based off of the schema. We can estimate the size of the broadcast based off of the `autoBroadcastThreshold` config. We can also set our target output size based off of `gpuTargetBatchSizeBytes`. So for the `CoalesceBatches` we can start out with a default value of something like:
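The default value itself did not survive extraction. One plausible shape for it, derived from the quantities named above (the exact formula and all names are assumptions, not the author's):

```scala
// Hypothetical default target for CoalesceBatches: size the coalesced stream
// batch so that its cross join with the estimated build table lands near
// gpuTargetBatchSizeBytes.
def defaultCoalesceTargetBytes(gpuTargetBatchSizeBytes: Long,
                               autoBroadcastThreshold: Long, // upper bound on build size
                               buildRowSizeBytes: Long,      // from the build-side schema
                               streamRowSizeBytes: Long,     // from the stream-side schema
                               outputRowSizeBytes: Long): Long = {
  val estimatedBuildRows = math.max(1L, autoBroadcastThreshold / buildRowSizeBytes)
  // Cap stream rows so streamRows * buildRows * outputRowSize stays near the target.
  val maxStreamRows =
    math.max(1L, gpuTargetBatchSizeBytes / (estimatedBuildRows * outputRowSizeBytes))
  maxStreamRows * streamRowSizeBytes
}
```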
Once we have real numbers for the build table and the batch that showed up, we can then decide if we need to split up the input table or not.
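A sketch of that runtime decision, under the same illustrative assumptions as the estimates above:

```scala
// Once the real build-table and input-batch row counts are known, split the
// input batch into enough pieces that each piece's estimated cross-join
// output stays under the target output size.
def numSplits(streamRows: Long,
              buildRows: Long,
              outputRowSizeBytes: Long,
              targetOutputBytes: Long): Int = {
  val estimatedOutputBytes = streamRows * buildRows * outputRowSizeBytes
  // Ceiling division; always at least one piece.
  math.max(1L, (estimatedOutputBytes + targetOutputBytes - 1) / targetOutputBytes).toInt
}
```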
Thanks for the detailed explanation. I am trying to understand it.
The math I did was just quick back-of-the-envelope math. If you can come up with a better estimate for splitting the data then that is fine. Also please check my math, I am not 100% sure that it is correct. The plan you showed is the typical plan. The build table comes from a …
Closing as won't fix for now.