-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add heuristics using stage spill metrics to skip apps #1002
Add heuristics using stage spill metrics to skip apps #1002
Conversation
Signed-off-by: Partho Sarthi <[email protected]>
user_tools/src/spark_rapids_tools/tools/additional_heuristics.py
Outdated
Show resolved
Hide resolved
does this have any output saying which execs/stages/sqlid cause this to be skipped? Did we generate any test eventlogs/queries that can be used to continue integration testing this or that could also be used by qualx to train on this scenario? It seems like this should be done in Java, is there a followup to move it there? |
There is no output associated with it. A column
I have test event logs that I used to test this scenario. This can be added in the integration testing. I will include this as part of improving E2E tools testing #970
Yes, currently we need metrics from Profiling tool for this estimate. Once the merging of Profiling/Qualification tool is done, we will migrate this to Java/Scala side. Created an issue to track this #1008 |
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
This reverts commit b774958.
Signed-off-by: Partho Sarthi <[email protected]>
Signed-off-by: Partho Sarthi <[email protected]>
Changes
ReasonsThere could be two potential reasons:
Output Covering both cases:File:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattahrens Are you ok with the PR as a temp work around until the heuristics are implemented in Scala module?
Yes, I'm fine with it. 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTME.
Thanks @parthosa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthosa! Just a minor nit.
I am also wondering in the output file: is 1000000000 bytes
or 10 GB
more clear?
Signed-off-by: Partho Sarthi <[email protected]>
Thanks @cindyyuanjiang. I think 10 GB would be more clear. Added a function to convert bytes to human readable format. We have the following reason now:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthosa! LGTM.
Fixes #477. This PR adds a generic
Additional Heuristics
module to skip recommending apps based on heuristics. It introduces an additional columnSkip By Heuristics
in qualification summary file. This logic will be applied only if the user tools is run with--estimation_model xgboost
since it uses profiler output.Changes:
Added a specific heuristic to skip applications based on spill metrics:
job_+_stage_level_aggregated_task_metrics.csv
Identify stages with spills greater than a thresholdsql_to_stage_information.csv
, check if above spill stages have Execs other than the ones allowed (Join, Aggregate or Sort)=> Column
Skip By Heuristics
would beTrue
for this application.Skip By Heuristics
isTrue
for the application, set the category toNot Recommended
.Things to discuss:
qualification-conf.yaml
Skip By Heuristics
will be aggregated usingany()
function.Steps to Evaluate:
"Memory Bytes Spilled":50000000'
in the eventSparkListenerTaskEnd
for certain stages in any test event log.