sync w/ internal repo; update models #1083

Merged: 1 commit, Jun 6, 2024

Collaborator

@leewyang leewyang commented Jun 6, 2024

This PR syncs the migrated QualX code with the latest code from the internal repo.

Changes

  1. Fix handling of failed stages.
  2. Ignore WholeStageCodegen for unsupported.
  3. Use actual GPU app duration.
  4. Default to dMAPE metric.
  5. Add updated model JSON files.

Test

The following commands have been tested:

Internal Usage:

python qualx_main.py preprocess
python qualx_main.py train
python qualx_main.py evaluate

@leewyang leewyang requested a review from parthosa June 6, 2024 18:01
@parthosa parthosa requested a review from amahussein June 6, 2024 18:15
@parthosa parthosa added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Jun 6, 2024
Collaborator

@parthosa parthosa left a comment

Thanks @leewyang. LGTM. Welcome to the tools repo 🎉

Once things are stable, it might be helpful to work on a separate feature branch to skip GitHub workflows related to maven.

cc: @amahussein

Collaborator

@amahussein amahussein left a comment

Thanks @leewyang
I raised a couple of issues in my comments.

Comment on lines +333 to +335
if not toc_list:
    raise ValueError(f'No CSV files found for: {ds_name}')
else:
Collaborator

Let's set the standard:

  • Raising an error: this means something is really wrong. In that case, do we fall back to the legacy Speedup?
  • Returning a default DF with speedup 1.0: this is the case where QualX reports no speedup because certain features, files, or columns are unreadable or empty. Then the speedup is 1.0.

For the "not toc_list" case, do we want to raise an error, or return 1.0 after logging that files were missing?
CC: @eordentlich and @parthosa
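
To make the two options concrete, here is a minimal sketch of what a switchable policy could look like. It is not the QualX implementation: the function name `resolve_missing_csvs`, the `on_missing` parameter, the `appId`/`speedup` keys, and the logger name are all hypothetical, and plain dicts stand in for the default DataFrame.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("qualx_sketch")


def resolve_missing_csvs(ds_name, toc_list, app_ids, on_missing="raise"):
    """Apply a policy when no CSV files were found for a dataset.

    Returns None when CSV files exist (the caller continues as usual),
    otherwise either raises or returns default rows with speedup 1.0.
    All names here are assumptions for illustration.
    """
    if toc_list:
        return None

    if on_missing == "raise":
        # Option 1: surface the problem to the user.
        raise ValueError(f"No CSV files found for: {ds_name}")

    if on_missing == "default":
        # Option 2: log the missing files and report "no speedup".
        logger.warning(
            "No CSV files found for %s; defaulting speedup to 1.0", ds_name
        )
        return [{"appId": app_id, "speedup": 1.0} for app_id in app_ids]

    raise ValueError(f"Unknown on_missing policy: {on_missing}")
```

With `on_missing="default"` the caller always receives a row per app, so downstream reporting does not need a special case; with `on_missing="raise"` the caller has to decide whether to fall back to the legacy Speedup.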

Collaborator Author

@amahussein this should be a very rare case. Basically, the user has requested to load CSV files for a dataset, and we found none at all for the entire dataset. Previously, we would just skip this dataset (failing silently), so I added this code to bring it to the user's attention. Presumably, the legacy Speedup would also fail in this case (or would it just return 1.0?).

Comment on lines +1080 to +1087
node_level_supp['Exec Is Supported'] = (
    node_level_supp['Exec Is Supported']
    | node_level_supp['Exec Name'].apply(
        lambda x: any([x.startswith(nm) for nm in unsupported_overrides])
    )
    | node_level_supp['Exec Name'].apply(
        lambda x: x.startswith('WholeStageCodegen')
    )
Collaborator

I thought that change was rolled back. Just double-checking.

Collaborator Author

@leewyang leewyang Jun 6, 2024

Per @eordentlich's commit to internal repo, this is a workaround for #860.
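
For readers following the diff without the surrounding code, the prefix check quoted above can be sketched without pandas. This is only an illustration of the OR-with-prefix-match logic, assuming rows are plain dicts carrying the same two column names; the helper name `mark_supported` is hypothetical and does not exist in the repo.

```python
def mark_supported(rows, unsupported_overrides):
    """Return copies of rows with 'Exec Is Supported' OR-ed with a prefix match.

    A row is flagged True if it was already True, or if its 'Exec Name'
    starts with any override prefix or with 'WholeStageCodegen'.
    """
    # str.startswith accepts a tuple of prefixes, so one call covers
    # both the override list and the WholeStageCodegen special case.
    prefixes = tuple(unsupported_overrides) + ("WholeStageCodegen",)
    return [
        {
            **row,
            "Exec Is Supported": (
                row["Exec Is Supported"]
                or row["Exec Name"].startswith(prefixes)
            ),
        }
        for row in rows
    ]
```

The same tuple form of `startswith` could replace the two `apply` lambdas in the quoted snippet, but that is a style choice rather than a behavior change.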

Collaborator

@amahussein amahussein left a comment

Let's merge this to unblock, then follow up in subsequent changes on whether or not to raise an error.

@parthosa parthosa merged commit 5af161f into NVIDIA:dev Jun 6, 2024
16 checks passed
@leewyang leewyang deleted the qualx_sync branch June 6, 2024 19:19