Add shap command to internal CLI for debugging #1197

Merged
leewyang merged 2 commits into NVIDIA:dev from qualx_shap
Jul 18, 2024

leewyang

This PR adds a shap command to the internal CLI to help explain a specific (per-sql) XGBoost prediction.

Usage:

python qualx_main.py shap --help

Example:

python qualx_main.py shap --platform $PLATFORM \
--prediction_output /path/to/prediction/output \
--index 0
# --model $MODEL   # optional

# --index should be a numeric zero-based index pointing to a specific line (i.e. sqlID) in the `shap_values.csv` file.
# Each line in this file corresponds to the same line (sqlID) in the `per_sql.csv` file.
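The row alignment described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `row_at` helper, the sample file contents, and the assumption that the header row is not counted by `--index` are all hypothetical.

```python
import csv
import io

def row_at(csv_text, index):
    """Return the data row at a zero-based index (assumes the header is not counted)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not 0 <= index < len(rows):
        raise IndexError(f"--index {index} is out of range for {len(rows)} rows")
    return rows[index]

# Hypothetical, heavily truncated file contents; the real files have many more columns.
per_sql_csv = "sqlID,y_pred\n7,1.50\n9,2.10\n"
shap_values_csv = (
    "executorCPUTime_mean,sw_bytesWrittenRatio\n"
    "-0.1192,0.0478\n"
    "0.0310,-0.0021\n"
)

index = 0
sql_row = row_at(per_sql_csv, index)        # identifies the sqlID
shap_row = row_at(shap_values_csv, index)   # shap values for that same sqlID
print(sql_row["sqlID"], shap_row)
```

Because both files are read with the same index, no join key is needed; this only holds as long as the two files keep the same row order.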

The output of the command looks like:

+-----+-------------------------------------------------+--------------+--------------+--------------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------------+----------------+
|     | feature                                         |   shap_value |   model_rank |   model_shap_value |   train_mean |   train_std |   train_min |   train_25% |   train_50% |   train_75% |   train_max |   feature_value | out_of_range   |
|-----+-------------------------------------------------+--------------+--------------+--------------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------------+----------------|
|   0 | executorCPUTime_mean                            |      -0.1192 |            0 |             0.1927 |      1.8e+03 |     5.1e+03 |     7.0e+01 |     2.6e+02 |     6.0e+02 |     1.2e+03 |     6.1e+04 |         4.9e+02 | False          |
|   1 | sw_bytesWrittenRatio                            |       0.0478 |            7 |             0.0220 |      9.6e-01 |     1.3e+00 |     1.2e-06 |     6.4e-03 |     3.2e-01 |     1.8e+00 |     1.2e+01 |         2.4e+00 | False          |
|   2 | executorDeserializeCPUTime_mean                 |      -0.0475 |            5 |             0.0237 |      6.7e+00 |     3.1e+00 |     2.1e+00 |     5.6e+00 |     6.2e+00 |     7.3e+00 |     3.7e+01 |         3.9e+01 | True           |
|   3 | sw_recordsWritten_sum                           |      -0.0339 |            1 |             0.0711 |      1.5e+09 |     3.8e+09 |     2.6e+02 |     1.7e+06 |     7.0e+07 |     8.7e+08 |     2.4e+10 |         1.2e+08 | False          |
...
| 106 | sqlOp_CommandResult                             |       0.0000 |          106 |             0.0000 |      0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |         0.0e+00 | False          |
| 107 | sqlOp_WindowSort                                |       0.0000 |          107 |             0.0000 |      0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |     0.0e+00 |         0.0e+00 | False          |
+-----+-------------------------------------------------+--------------+--------------+--------------------+--------------+-------------+-------------+-------------+-------------+-------------+-------------+-----------------+----------------+
Shap base value: 0.4152
Shap values sum: -0.0120
Shap prediction: 0.4032
exp(prediction): 1.4965

Where:

  • the features are listed in descending order of importance (absolute value of shap_value), similar to a SHAP waterfall plot.
  • model_rank shows the feature importance rank on the training set.
  • model_shap_value shows the feature shap_value on the training set.
  • train_[mean|std|min|max] show the mean, standard deviation, min and max values of the feature in the training set.
  • train_[25%|50%|75%] show the feature value at the respective percentile in the training set.
  • feature_value shows the value of the feature used in prediction (for the indexed row/sqlID).
  • out_of_range indicates if the feature_value used in prediction was outside of the range of values seen in the training set.
  • Shap base value is the model's average prediction across the entire training set.
  • Shap values sum is the sum of the shap_value column for this indexed instance.
  • Shap prediction is the sum of Shap base value and Shap values sum, representing the model's predicted value.
  • exp(prediction) is the exponential of Shap prediction, which represents the predicted speedup (since the XGBoost model currently predicts log(speedup)).
  • the predicted speedup (which should match y_pred in per_sql.csv) is applied to the "supported" durations and combined with the "unsupported" durations to produce a final per-sql speedup (speedup_pred in per_sql.csv).
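The column definitions above can be checked against the example output with a few lines of Python. This is only a sketch using the four rows shown in the table (the elided rows are not reconstructed); the `is_out_of_range` helper is hypothetical, and treating the training min and max as inclusive bounds is an assumption.

```python
import math

# First four rows of the example table above (displayed values are rounded).
rows = [
    {"feature": "executorCPUTime_mean", "shap_value": -0.1192,
     "train_min": 7.0e+01, "train_max": 6.1e+04, "feature_value": 4.9e+02},
    {"feature": "sw_bytesWrittenRatio", "shap_value": 0.0478,
     "train_min": 1.2e-06, "train_max": 1.2e+01, "feature_value": 2.4e+00},
    {"feature": "executorDeserializeCPUTime_mean", "shap_value": -0.0475,
     "train_min": 2.1e+00, "train_max": 3.7e+01, "feature_value": 3.9e+01},
    {"feature": "sw_recordsWritten_sum", "shap_value": -0.0339,
     "train_min": 2.6e+02, "train_max": 2.4e+10, "feature_value": 1.2e+08},
]

def is_out_of_range(row):
    """True if the prediction-time value lies outside [train_min, train_max] (assumed inclusive)."""
    return not (row["train_min"] <= row["feature_value"] <= row["train_max"])

# Order by absolute shap_value, largest first, as in a SHAP waterfall plot.
ranked = sorted(rows, key=lambda r: abs(r["shap_value"]), reverse=True)
flags = {r["feature"]: is_out_of_range(r) for r in ranked}

# Summary lines: base value plus the sum of shap values gives log(speedup).
base_value = 0.4152   # "Shap base value"
shap_sum = -0.0120    # "Shap values sum" (over all features, taken from the output)
prediction = base_value + shap_sum
speedup = math.exp(prediction)
print(f"prediction={prediction:.4f} speedup={speedup:.4f}")
```

Only executorDeserializeCPUTime_mean is flagged, since its value of 3.9e+01 exceeds the training maximum of 3.7e+01. The recomputed speedup may differ from the reported 1.4965 in the last digit because the inputs here are the rounded values printed by the command.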

Changes

  1. Added features.csv to save the feature values used for prediction.
  2. Moved the current shap_values.csv to feature_importance.csv (which is more descriptive of its purpose).
  3. Used shap_values.csv to save all of the shap values per feature per instance/sqlID during prediction.
  4. Saved a model.metrics file (for each model) during training to store the feature shap values and distribution metrics of the training set.
  5. Renamed the model.json.cfg files to model.cfg to avoid the double-suffix.
  6. Refactored/combined the compute_feature_importance and compute_shapley_values functions.
  7. Updated internal predict CLI to support --qual_output argument.
  8. Added shap command to internal CLI, which joins the prediction shap_values with the training shap_values and distribution metrics.
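Conceptually, the join in item 8 combines three per-feature sources for one sqlID. The sketch below uses plain dictionaries to show the shape of that join; the `join_shap` function and the in-memory layout of the metrics are hypothetical and do not reflect the actual format of shap_values.csv, features.csv, or model.metrics.

```python
def join_shap(pred_shap, pred_features, train_metrics):
    """Build one row per feature from prediction-time and training-time data."""
    rows = []
    for feature, shap_value in pred_shap.items():
        row = {"feature": feature, "shap_value": shap_value}
        row.update(train_metrics[feature])               # training rank, shap value, distribution
        row["feature_value"] = pred_features[feature]    # value used for this prediction
        rows.append(row)
    return rows

# One indexed row from shap_values.csv and features.csv (values from the example output).
pred_shap = {"executorCPUTime_mean": -0.1192, "sw_bytesWrittenRatio": 0.0478}
pred_features = {"executorCPUTime_mean": 4.9e+02, "sw_bytesWrittenRatio": 2.4e+00}

# Assumed in-memory form of the per-model training metrics.
train_metrics = {
    "executorCPUTime_mean": {"model_rank": 0, "model_shap_value": 0.1927,
                             "train_min": 7.0e+01, "train_max": 6.1e+04},
    "sw_bytesWrittenRatio": {"model_rank": 7, "model_shap_value": 0.0220,
                             "train_min": 1.2e-06, "train_max": 1.2e+01},
}

table = join_shap(pred_shap, pred_features, train_metrics)
for row in table:
    print(row)
```

Keying everything by feature name keeps the join independent of column order in the underlying files.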

Test

The following commands have been tested:

External Usage:

spark-rapids train
spark-rapids predict

Internal Usage:

python qualx_main.py preprocess
python qualx_main.py train
python qualx_main.py predict
python qualx_main.py shap

@leewyang leewyang self-assigned this Jul 17, 2024
@leewyang leewyang added the user_tools Scope the wrapper module running CSP, QualX, and reports (python) label Jul 17, 2024
@amahussein amahussein added the affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) label Jul 17, 2024
@amahussein amahussein requested a review from parthosa July 17, 2024 13:48
@amahussein amahussein added feature request New feature or request and removed affect-output A change that modifies the output (add/remove/rename files, add/remove/rename columns) labels Jul 17, 2024

@parthosa parthosa left a comment

Thanks @leewyang. Tested the changes. Made some minor comments.

Signed-off-by: Lee Yang <[email protected]>

@parthosa parthosa left a comment

Thanks @leewyang. LGTM

@leewyang leewyang merged commit bdf18d4 into NVIDIA:dev Jul 18, 2024
14 checks passed
@leewyang leewyang deleted the qualx_shap branch July 18, 2024 20:16