[FEA] Add Estimation Model to Qualification CLI #870

amahussein · 2024-03-22T18:18:29Z

Signed-off-by: Ahmed Hussein (amahussein) [email protected]

Fixes #869

- Add estimation_model to the qualification arguments.
- Refactor the job submission to run as concurrent processes, because the XGBoost model needs to run both the qualification and profiling tools.
- Remove --per-sql from the allowed list of qualification tool options because it is used in the XGBoost model.
- Import the qualx code into the user_tools repo.
- Running the qual CLI with --estimation_model XGBOOST runs the prediction model and generates the results as intermediate output, but it does not affect the final results.
- Two files are generated inside the final output directory: qual_*_app.csv and qual_*_sql
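
The concurrent job submission described in the list above could look roughly like the following. This is a minimal sketch and not the code in this PR: the function names `run_job` and `run_jobs_concurrently` are hypothetical, and the two commands are placeholders that only print a label instead of launching the real qualification and profiling tools.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(name, cmd):
    """Run one tool as a separate OS process and capture its result."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return name, done.returncode, done.stdout.strip()


def run_jobs_concurrently(jobs):
    """Start every job in its own process and wait for all of them."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(run_job, name, cmd) for name, cmd in jobs.items()]
        results = [future.result() for future in futures]
    return {name: (code, out) for name, code, out in results}


# Placeholder commands (assumption): stand-ins for the real tool invocations.
JOBS = {
    "qualification": [sys.executable, "-c", "print('qual done')"],
    "profiling": [sys.executable, "-c", "print('prof done')"],
}

if __name__ == "__main__":
    for job_name, (exit_code, output) in run_jobs_concurrently(JOBS).items():
        print(job_name, exit_code, output)
```

Each job runs in its own child process, and the thread pool only waits on them, so a slow profiling run does not block the qualification run from starting.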

Notes:

The estimation model uses on-prem for now
The remaining work is in progress and tracked through the tasks listed in issue-806:
- We want to extract the readDataFormat from the profiler output of each application.
- This will probably be a new class to hold appMetadata.
- Modify the modeling code to process one app at a time. The prediction model is loaded depending on the metadata.
- The prediction model uses information from both the Profiler and Qual CSV files, so we will need to handle errors raised for applications that do not exist in both tools' outputs.
- The new speedups are generated, and then we override the original speedup estimation with the Qx prediction, “estimated_df”.
- The “estimated_df” should be similar to the legacy qual DF.
- The report generation (stdout + CSV file) code won’t need to be changed.
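
As a rough illustration of the planned override step, the sketch below replaces the legacy speedup with a predicted one for each app that appears in both tools' outputs. Everything here is an assumption made for illustration: the function name `override_speedups`, the field names `app_id` and `speedup`, and the choice to keep the legacy estimate for apps missing from either output. Plain dicts are used to keep the sketch dependency-free; the notes above refer to DataFrames.

```python
def override_speedups(qual_rows, prof_app_ids, predictions):
    """Return (estimated_rows, skipped_app_ids).

    qual_rows: list of dicts from the Qual output, each with "app_id" and "speedup".
    prof_app_ids: set of app IDs present in the Profiler output.
    predictions: dict mapping app_id to the predicted speedup.
    """
    estimated, skipped = [], []
    for row in qual_rows:
        new_row = dict(row)  # copy so the legacy rows stay untouched
        app_id = row["app_id"]
        if app_id in prof_app_ids and app_id in predictions:
            # App exists in both outputs: override the legacy estimate.
            new_row["speedup"] = predictions[app_id]
        else:
            # Assumed policy: keep the legacy estimate and record the app.
            skipped.append(app_id)
        estimated.append(new_row)
    return estimated, skipped
```

Because the returned rows keep the same fields as the input rows, report code that consumes the legacy structure would not need to change under this assumption.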

mattahrens · 2024-03-25T15:55:57Z

user_tools/src/spark_rapids_pytools/wrappers/databricks_aws_wrapper.py

@@ -92,6 +93,10 @@ def qualification(cpu_cluster: str = None,
                "MATCH": keep GPU cluster same number of nodes as CPU cluster;
                "CLUSTER": recommend optimal GPU cluster by cost for entire cluster;
                "JOB": recommend optimal GPU cluster by cost per job.
+        :param estimation_model: Model used to calculate the estimated GPU duration and cost savings.
+               It accepts one of the following:
+               "XGBOOST": an XGBoost model for GPU duration estimation


minor preference to use lowercase for argument values: XGBOOST --> xgboost

Thanks @mattahrens.

Done!
The CLI accepts both lower-case and upper-case values. I changed the comments to lower-case, which is reflected in the output of the --help command.
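
One way to accept both cases while documenting only the lower-case value is to normalize the argument before validating it. The sketch below uses argparse purely for illustration; it is not necessarily how this CLI is implemented, and `build_parser` and the program name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser whose --estimation_model value is case-insensitive."""
    parser = argparse.ArgumentParser(prog="qual-sketch")
    parser.add_argument(
        "--estimation_model",
        type=str.lower,        # normalize the value before the choices check
        choices=["xgboost"],   # only the value named in this PR
        default=None,
        help="Model used to calculate the estimated GPU duration: xgboost",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--estimation_model", "XGBOOST"])
    print(args.estimation_model)
```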

user_tools/src/spark_rapids_tools/tools/model_xgboost.py

parthosa

Thanks @amahussein.

amahussein added 2 commits March 22, 2024 12:52

Import qualx code into user_tools repo

86c6128

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

amahussein added the feature request and core_tools labels Mar 22, 2024

amahussein self-assigned this Mar 22, 2024

amahussein requested a review from parthosa March 22, 2024 18:32

amahussein changed the title ~~[FEA] Add estimationModel to Qualification CLI~~ [FEA] Add Estimation Model to Qualification CLI Mar 22, 2024

mattahrens reviewed Mar 25, 2024

Use lower-case while listing the estimation_model accepted values

a03ab3d

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

amahussein requested a review from mattahrens March 25, 2024 20:55

parthosa reviewed Mar 25, 2024

user_tools/src/spark_rapids_tools/tools/model_xgboost.py (resolved)

parthosa approved these changes Mar 25, 2024

amahussein merged commit e005165 into NVIDIA:dev Mar 25, 2024
13 checks passed

amahussein deleted the spark-rapids-tools-869 branch March 25, 2024 21:49

parthosa mentioned this pull request Jun 6, 2024

Remove using Profiler metrics for QualX and Heuristics #1080

Merged
