Fix Classification Performance #637

Merged

czaloom merged 42 commits into main from czaloom-patch-581-performance-issues on Jul 3, 2024

Conversation

@czaloom czaloom (Collaborator) commented Jun 27, 2024

Changes

  • Fixed a bug introduced in PR Add support for new filtering ops #581 that slowed down the default metric computation for classification.
  • Refactored the PRCurve and DetailedPRCurve computation for classification (see the threshold-sweep sketch after this list).
    • PRCurve and DetailedPRCurve take the same time to compute and could be merged if desired.
    • Their computation time is of the same magnitude as the time to compute the default classification metrics.
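
For context, here is a minimal sketch of computing a classification PR curve by sweeping score thresholds, the pattern reflected in test keys like ("dog", 0.05, "tn") further down. The data layout and helper name are hypothetical; this is illustrative, not the Valor implementation.

```python
# Generic threshold-sweep sketch for a classification PR curve.
# Hypothetical data layout and helper name; not the Valor implementation.
def pr_curve(pairs, label, thresholds):
    """pairs: list of (groundtruth_label, predicted_label, score) tuples."""
    curve = {}
    for t in thresholds:
        tp = fp = fn = 0
        for gt, pred, score in pairs:
            positive = pred == label and score >= t  # thresholded positive
            if positive and gt == label:
                tp += 1
            elif positive:
                fp += 1
            elif gt == label:
                fn += 1
        curve[t] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return curve

pairs = [("dog", "dog", 0.9), ("dog", "cat", 0.6), ("cat", "cat", 0.8)]
thresholds = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
print(pr_curve(pairs, "dog", thresholds)[0.8])  # recall 0.5: one dog below threshold
```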

Performance

Before (v0.27.3)

{'limit': 10, 'ingest': '0.2871', 'base': '1.0894', 'base+pr': '1.0716', 'base+pr+detailed pr': '1.0793'}
{'limit': 10, 'ingest': '0.3532', 'base': '1.0686', 'base+pr': '1.0680', 'base+pr+detailed pr': '1.0741'}
{'limit': 100, 'ingest': '0.5643', 'base': '1.0755', 'base+pr': '2.0900', 'base+pr+detailed pr': '2.0856'}
{'limit': 100, 'ingest': '0.5666', 'base': '1.0735', 'base+pr': '2.0858', 'base+pr+detailed pr': '2.0892'}
{'limit': 1000, 'ingest': '4.7223', 'base': '32.5760', 'base+pr': '61.8423', 'base+pr+detailed pr': '62.1851'}
{'limit': 1000, 'ingest': '4.7074', 'base': '32.5851', 'base+pr': '60.7616', 'base+pr+detailed pr': '60.5008'}
{'limit': 5000, 'ingest': '23.0799', 'base': '762.0845', 'base+pr': '4081.4385', 'base+pr+detailed pr': '4062.1859'}

After

{'limit': 10, 'ingest': '0.2971', 'base': '1.0675', 'base+pr': '1.0577', 'base+pr+detailed pr': '1.0640'}
{'limit': 10, 'ingest': '0.2607', 'base': '1.0516', 'base+pr': '1.0505', 'base+pr+detailed pr': '1.0597'}
{'limit': 100, 'ingest': '0.5761', 'base': '1.0643', 'base+pr': '1.0632', 'base+pr+detailed pr': '1.0648'}
{'limit': 100, 'ingest': '0.5290', 'base': '1.0557', 'base+pr': '1.0657', 'base+pr+detailed pr': '1.0723'}
{'limit': 1000, 'ingest': '4.9709', 'base': '3.1396', 'base+pr': '4.2505', 'base+pr+detailed pr': '4.3072'}
{'limit': 1000, 'ingest': '4.9524', 'base': '3.1360', 'base+pr': '4.2255', 'base+pr+detailed pr': '5.4064'}
{'limit': 5000, 'ingest': '29.3421', 'base': '9.2979', 'base+pr': '15.8753', 'base+pr+detailed pr': '15.7665'}
{'limit': 5000, 'ingest': '28.7270', 'base': '11.3355', 'base+pr': '17.9243', 'base+pr+detailed pr': '18.0771'}

# showing consistency
{'limit': 5000, 'ingest': '28.2676', 'base': '7.3984', 'base+pr': '13.7927', 'base+pr+detailed pr': '13.8671'}
{'limit': 5000, 'ingest': '28.2710', 'base': '7.2600', 'base+pr': '13.7537', 'base+pr+detailed pr': '14.0668'}
{'limit': 5000, 'ingest': '27.7182', 'base': '7.2585', 'base+pr': '13.8196', 'base+pr+detailed pr': '13.8971'}
{'limit': 5000, 'ingest': '27.6027', 'base': '7.2584', 'base+pr': '13.6964', 'base+pr+detailed pr': '14.0638'}
{'limit': 5000, 'ingest': '27.8308', 'base': '9.4128', 'base+pr': '15.7377', 'base+pr+detailed pr': '15.7828'}
{'limit': 5000, 'ingest': '27.9315', 'base': '10.2762', 'base+pr': '16.9073', 'base+pr+detailed pr': '16.7292'}
{'limit': 5000, 'ingest': '27.8125', 'base': '11.3019', 'base+pr': '17.8417', 'base+pr+detailed pr': '17.8011'}
{'limit': 5000, 'ingest': '24.3069', 'base': '13.4752', 'base+pr': '19.8883', 'base+pr+detailed pr': '19.9296'}
{'limit': 5000, 'ingest': '27.9290', 'base': '15.4618', 'base+pr': '21.7758', 'base+pr+detailed pr': '21.8493'}
{'limit': 5000, 'ingest': '27.9097', 'base': '16.3366', 'base+pr': '22.9540', 'base+pr+detailed pr': '24.0885'}
{'limit': 5000, 'ingest': '27.8893', 'base': '18.4587', 'base+pr': '23.9570', 'base+pr+detailed pr': '24.8777'}

@czaloom czaloom added the bug Something isn't working label Jun 27, 2024
@czaloom czaloom self-assigned this Jun 27, 2024
@czaloom czaloom marked this pull request as ready for review June 27, 2024 16:30
@czaloom czaloom requested review from ntlind and ekorman as code owners June 27, 2024 16:30
@czaloom czaloom changed the title from "Default Classification Performance" to "Patch Classification Performance" Jul 1, 2024
@czaloom czaloom changed the title from "Patch Classification Performance" to "Fix Classification Performance" Jul 1, 2024
api/valor_api/backend/metrics/metric_utils.py (review thread resolved)
@@ -1368,8 +1380,8 @@ def test__compute_curves(
     },
     ("dog", 0.05, "tn"): {"all": 1, "total": 1},
     ("dog", 0.8, "fn"): {
-        "missed_detections": 1,
         "misclassifications": 1,
+        "missed_detections": 0,
@ntlind ntlind (Contributor) commented Jul 2, 2024

A prediction having a score less than the threshold is still a valid prediction, though.

What is the point of the score threshold in that case?

The score threshold is meant to mean "only consider predictions with a score greater than x to be valid predictions."

@czaloom czaloom (Collaborator, Author)

The point of a score threshold is to determine whether the prediction is positive vs. negative.

Whether that prediction is correct determines its truth (True, False).

Combine these and you get TP, FP, FN, and TN.

The "missed detection" variation doesn't map well to the classification task (as compared to the object detection task), because we enforce the existence of predictions for groundtruths at ingestion time (see validate_matching_label_keys).

The same logic applies to "hallucination" for FP, which, if you look at that test, never gets a value counted.
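
To make that mapping concrete, here is a minimal sketch (editorial illustration with a hypothetical helper name, not Valor code): the threshold decides positive vs. negative, correctness decides true vs. false, and the combination yields the four cells.

```python
# Sketch of the threshold/correctness mapping described above.
# Hypothetical helper; not the Valor implementation.
def confusion_cell(gt_label, pred_label, score, label, threshold):
    positive = pred_label == label and score >= threshold  # threshold -> pos/neg
    relevant = gt_label == label                           # correctness -> true/false
    if positive and relevant:
        return "tp"
    if positive:
        return "fp"
    if relevant:
        return "fn"
    return "tn"

# A correct "dog" prediction scored 0.6 is negative at a 0.8 threshold,
# so it lands in the FN cell; lowering the threshold makes it a TP.
assert confusion_cell("dog", "dog", 0.6, "dog", 0.8) == "fn"
assert confusion_cell("dog", "dog", 0.6, "dog", 0.5) == "tp"
```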

@czaloom czaloom (Collaborator, Author)

I reached out to Matt, and I think this is a definition issue. "Missed detection" doesn't make sense for classification. The FN condition you are referring to fits something closer to a "no winner" condition.

Matt suggested "No prediction", and I'm wondering if "Null Prediction" would make more sense.

How does all this sound to you?
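
For illustration, a sketch of the FN breakdown being debated; the names are hypothetical, with "no_winner" standing in for the proposed rename rather than shipped code.

```python
# Sketch of the FN sub-categorization under discussion. Hypothetical names;
# "no_winner" reflects the suggested rename, not shipped Valor code.
def fn_reason(gt_label, predictions, threshold):
    """predictions: {label: score} for one datum. Every classification datum
    has predictions at ingestion, so "missed detection" cannot occur here."""
    winner = max(predictions, key=predictions.get)
    if predictions[winner] < threshold:
        return "no_winner"          # no prediction clears the threshold
    if winner != gt_label:
        return "misclassification"  # a wrong label wins above the threshold
    return None                     # the groundtruth label wins: not an FN

# At threshold 0.8, a top score of 0.6 yields the "no winner" condition.
assert fn_reason("dog", {"dog": 0.6, "cat": 0.4}, 0.8) == "no_winner"
assert fn_reason("dog", {"cat": 0.9, "dog": 0.1}, 0.8) == "misclassification"
```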


@czaloom czaloom merged commit f539c0d into main Jul 3, 2024
11 checks passed
@czaloom czaloom deleted the czaloom-patch-581-performance-issues branch July 3, 2024 19:31
@Striveworks Striveworks deleted 6 comments from czaloom Jul 11, 2024