sync w/ internal repo; update models #1083

leewyang · 2024-06-06T18:01:43Z

This PR syncs the migrated QualX code with the latest code from the internal repo.

Changes

Fix failed stages handling.
Ignore WholeStageCodegen when flagging unsupported execs.
Use actual GPU app duration.
Default to dMAPE metric.
Add updated model json files.

Test

The following commands have been tested:

Internal Usage:

python qualx_main.py preprocess
python qualx_main.py train
python qualx_main.py evaluate

Signed-off-by: Lee Yang <[email protected]>

parthosa

Thanks @leewyang. LGTM. Welcome to the tools repo 🎉

Once things are stable, it might be helpful to work on a separate feature branch to skip the GitHub workflows related to Maven.

cc: @amahussein

amahussein

Thanks @leewyang
I raised a couple of issues in my comments.

amahussein · 2024-06-06T18:37:38Z

user_tools/src/spark_rapids_tools/tools/qualx/preprocess.py

+        if not toc_list:
+            raise ValueError(f'No CSV files found for: {ds_name}')
+        else:


Let's set the standard:

Raising an error: this means something is really wrong. In that case, do we fall back to the legacy Speedup?

Returning a default DF with speedup 1.0: this is the case where QualX reports no speedup because certain features, files, or columns are unreadable or empty. Then the speedup is 1.0.

For the "not toc_list" case, do we want to raise an error, or return 1.0 after logging that files were missing?
CC: @eordentlich and @parthosa

@amahussein this should be a very rare case. Basically, the user has requested to load CSV files for a dataset, and we have found none at all for the entire dataset. Previously, we would just skip this dataset (failing silently), so I added this code to bring it to the user's attention. Presumably, the legacy Speedup would also fail in this case (or would it just return 1.0?).
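
To make the two options in this thread concrete, here is a minimal sketch of how a caller could choose between them. It is not the QualX implementation: `resolve_missing_csv`, the `strict` flag, `DEFAULT_SPEEDUP`, and the `appId`/`speedup` keys are hypothetical names, and plain dict records stand in for the default DF so the sketch has no dependencies.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: 1.0 means "no predicted speedup", as described in the review comment.
DEFAULT_SPEEDUP = 1.0


def resolve_missing_csv(toc_list, ds_name, app_ids, strict=True):
    """Hypothetical helper illustrating the two policies under discussion.

    Returns None when CSV files were found (the caller continues loading).
    When none were found, either raises (strict) or logs a warning and
    returns default records with speedup 1.0.
    """
    if toc_list:
        return None
    msg = f'No CSV files found for: {ds_name}'
    if strict:
        # Policy 1: treat the missing files as a hard failure.
        raise ValueError(msg)
    # Policy 2: log the problem and fall back to "no speedup".
    logger.warning('%s; returning default speedup %.1f', msg, DEFAULT_SPEEDUP)
    return [{'appId': app_id, 'speedup': DEFAULT_SPEEDUP} for app_id in app_ids]
```

With `strict=True` the behavior matches the `raise ValueError` in the diff above; with `strict=False` the caller gets one default record per application ID instead.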

amahussein · 2024-06-06T18:42:18Z

user_tools/src/spark_rapids_tools/tools/qualx/preprocess.py

+        node_level_supp['Exec Is Supported'] = (
+            node_level_supp['Exec Is Supported']
+            | node_level_supp['Exec Name'].apply(
+                lambda x: any([x.startswith(nm) for nm in unsupported_overrides])
+            )
+            | node_level_supp['Exec Name'].apply(
+                lambda x: x.startswith('WholeStageCodegen')
+            )


I thought that change was rolled back. Just double-checking.

Per @eordentlich's commit to internal repo, this is a workaround for #860.
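
For readers following along, this is a small sketch of what the mask in the hunk above does, run on made-up data. The sample rows and the `'Custom'` prefix are hypothetical; only the column names and the `WholeStageCodegen` prefix come from the diff.

```python
import pandas as pd

# Hypothetical sample rows; the real frame is built during preprocessing.
node_level_supp = pd.DataFrame({
    'Exec Name': ['WholeStageCodegen (1)', 'Project', 'CustomExec', 'Filter'],
    'Exec Is Supported': [False, True, False, False],
})
unsupported_overrides = ['Custom']  # hypothetical prefix list

# str.startswith accepts a tuple of prefixes, so both checks fit in one pass.
prefixes = tuple(unsupported_overrides) + ('WholeStageCodegen',)
override_mask = node_level_supp['Exec Name'].apply(lambda x: x.startswith(prefixes))

# Any exec matching a prefix is treated as supported, whatever its original flag.
node_level_supp['Exec Is Supported'] = (
    node_level_supp['Exec Is Supported'] | override_mask
)
result = node_level_supp['Exec Is Supported'].tolist()
```

Here `WholeStageCodegen (1)` and `CustomExec` flip to supported, `Project` stays supported, and `Filter` stays unsupported.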

amahussein

Let's merge this to unblock, then follow up in subsequent changes on whether to raise an error or not.

sync w/ internal repo; update models

dcea5d7

Signed-off-by: Lee Yang <[email protected]>

leewyang requested a review from parthosa June 6, 2024 18:01

parthosa requested a review from amahussein June 6, 2024 18:15

parthosa assigned leewyang Jun 6, 2024

parthosa added the user_tools label (Scope the wrapper module running CSP, QualX, and reports (python)) Jun 6, 2024

parthosa approved these changes Jun 6, 2024

amahussein reviewed Jun 6, 2024

amahussein approved these changes Jun 6, 2024

parthosa merged commit 5af161f into NVIDIA:dev Jun 6, 2024
16 checks passed

leewyang deleted the qualx_sync branch June 6, 2024 19:19

leewyang mentioned this pull request Jun 7, 2024

fix signature error from overlapping merges #1084

Merged
