Fix Classification Performance #637

Merged

czaloom merged 42 commits into main from czaloom-patch-581-performance-issues on Jul 3, 2024

Conversation

@czaloom czaloom (Collaborator) commented Jun 27, 2024

Changes

  • Fixed a bug introduced in PR Add support for new filtering ops #581 that slowed down the default metric computation for classification.
  • Refactored the PRCurve and DetailedPRCurve computation for classification (see the threshold-sweep sketch after this list).
    • PRCurve and DetailedPRCurve take the same time to compute and could be merged if desired.
    • Their computation time is of the same magnitude as the time to compute the default classification metrics.
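
For context, here is a minimal sketch of computing a classification PR curve by sweeping score thresholds, the pattern reflected in test keys like ("dog", 0.05, "tn") further down. The data layout and helper name are hypothetical; this is illustrative, not the Valor implementation.

```python
# Generic threshold-sweep sketch for a classification PR curve.
# Hypothetical data layout and helper name; not the Valor implementation.
def pr_curve(pairs, label, thresholds):
    """pairs: list of (groundtruth_label, predicted_label, score) tuples."""
    curve = {}
    for t in thresholds:
        tp = fp = fn = 0
        for gt, pred, score in pairs:
            positive = pred == label and score >= t  # thresholded positive
            if positive and gt == label:
                tp += 1
            elif positive:
                fp += 1
            elif gt == label:
                fn += 1
        curve[t] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return curve

pairs = [("dog", "dog", 0.9), ("dog", "cat", 0.6), ("cat", "cat", 0.8)]
thresholds = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
print(pr_curve(pairs, "dog", thresholds)[0.8])  # recall 0.5: one dog below threshold
```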

Performance

Before (v0.27.3)

{'limit': 10, 'ingest': '0.2871', 'base': '1.0894', 'base+pr': '1.0716', 'base+pr+detailed pr': '1.0793'}
{'limit': 10, 'ingest': '0.3532', 'base': '1.0686', 'base+pr': '1.0680', 'base+pr+detailed pr': '1.0741'}
{'limit': 100, 'ingest': '0.5643', 'base': '1.0755', 'base+pr': '2.0900', 'base+pr+detailed pr': '2.0856'}
{'limit': 100, 'ingest': '0.5666', 'base': '1.0735', 'base+pr': '2.0858', 'base+pr+detailed pr': '2.0892'}
{'limit': 1000, 'ingest': '4.7223', 'base': '32.5760', 'base+pr': '61.8423', 'base+pr+detailed pr': '62.1851'}
{'limit': 1000, 'ingest': '4.7074', 'base': '32.5851', 'base+pr': '60.7616', 'base+pr+detailed pr': '60.5008'}
{'limit': 5000, 'ingest': '23.0799', 'base': '762.0845', 'base+pr': '4081.4385', 'base+pr+detailed pr': '4062.1859'}

After

{'limit': 10, 'ingest': '0.2971', 'base': '1.0675', 'base+pr': '1.0577', 'base+pr+detailed pr': '1.0640'}
{'limit': 10, 'ingest': '0.2607', 'base': '1.0516', 'base+pr': '1.0505', 'base+pr+detailed pr': '1.0597'}
{'limit': 100, 'ingest': '0.5761', 'base': '1.0643', 'base+pr': '1.0632', 'base+pr+detailed pr': '1.0648'}
{'limit': 100, 'ingest': '0.5290', 'base': '1.0557', 'base+pr': '1.0657', 'base+pr+detailed pr': '1.0723'}
{'limit': 1000, 'ingest': '4.9709', 'base': '3.1396', 'base+pr': '4.2505', 'base+pr+detailed pr': '4.3072'}
{'limit': 1000, 'ingest': '4.9524', 'base': '3.1360', 'base+pr': '4.2255', 'base+pr+detailed pr': '5.4064'}
{'limit': 5000, 'ingest': '29.3421', 'base': '9.2979', 'base+pr': '15.8753', 'base+pr+detailed pr': '15.7665'}
{'limit': 5000, 'ingest': '28.7270', 'base': '11.3355', 'base+pr': '17.9243', 'base+pr+detailed pr': '18.0771'}

# showing consistency
{'limit': 5000, 'ingest': '28.2676', 'base': '7.3984', 'base+pr': '13.7927', 'base+pr+detailed pr': '13.8671'}
{'limit': 5000, 'ingest': '28.2710', 'base': '7.2600', 'base+pr': '13.7537', 'base+pr+detailed pr': '14.0668'}
{'limit': 5000, 'ingest': '27.7182', 'base': '7.2585', 'base+pr': '13.8196', 'base+pr+detailed pr': '13.8971'}
{'limit': 5000, 'ingest': '27.6027', 'base': '7.2584', 'base+pr': '13.6964', 'base+pr+detailed pr': '14.0638'}
{'limit': 5000, 'ingest': '27.8308', 'base': '9.4128', 'base+pr': '15.7377', 'base+pr+detailed pr': '15.7828'}
{'limit': 5000, 'ingest': '27.9315', 'base': '10.2762', 'base+pr': '16.9073', 'base+pr+detailed pr': '16.7292'}
{'limit': 5000, 'ingest': '27.8125', 'base': '11.3019', 'base+pr': '17.8417', 'base+pr+detailed pr': '17.8011'}
{'limit': 5000, 'ingest': '24.3069', 'base': '13.4752', 'base+pr': '19.8883', 'base+pr+detailed pr': '19.9296'}
{'limit': 5000, 'ingest': '27.9290', 'base': '15.4618', 'base+pr': '21.7758', 'base+pr+detailed pr': '21.8493'}
{'limit': 5000, 'ingest': '27.9097', 'base': '16.3366', 'base+pr': '22.9540', 'base+pr+detailed pr': '24.0885'}
{'limit': 5000, 'ingest': '27.8893', 'base': '18.4587', 'base+pr': '23.9570', 'base+pr+detailed pr': '24.8777'}

@czaloom czaloom added the bug Something isn't working label Jun 27, 2024
@czaloom czaloom self-assigned this Jun 27, 2024
@czaloom czaloom marked this pull request as ready for review June 27, 2024 16:30
@czaloom czaloom requested review from ntlind and ekorman as code owners June 27, 2024 16:30
@czaloom czaloom changed the title from "Default Classification Performance" to "Patch Classification Performance" Jul 1, 2024
@czaloom czaloom changed the title from "Patch Classification Performance" to "Fix Classification Performance" Jul 1, 2024
api/valor_api/backend/metrics/metric_utils.py (review thread resolved)
@@ -1368,8 +1380,8 @@ def test__compute_curves(
     },
     ("dog", 0.05, "tn"): {"all": 1, "total": 1},
     ("dog", 0.8, "fn"): {
-        "missed_detections": 1,
         "misclassifications": 1,
+        "missed_detections": 0,
@ntlind ntlind (Contributor) commented Jul 2, 2024

A prediction having a score less than the threshold is still a valid prediction, though.

What is the point of the score threshold in that case?

The score threshold is meant to mean "only consider predictions with a score greater than x to be valid predictions."

@czaloom czaloom (Collaborator, Author)

The point of a score threshold is to determine whether the prediction is positive vs. negative.

Whether that prediction is correct determines its truth (True, False).

Combine these and you get TP, FP, FN, and TN.

The "missed detection" variation doesn't map well to the classification task (as compared to the object detection task), because we enforce the existence of predictions for groundtruths at ingestion time (see validate_matching_label_keys).

The same logic applies to "hallucination" for FP, which, if you look at that test, never gets a value counted.
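
To make that mapping concrete, here is a minimal sketch (editorial illustration with a hypothetical helper name, not Valor code): the threshold decides positive vs. negative, correctness decides true vs. false, and the combination yields the four cells.

```python
# Sketch of the threshold/correctness mapping described above.
# Hypothetical helper; not the Valor implementation.
def confusion_cell(gt_label, pred_label, score, label, threshold):
    positive = pred_label == label and score >= threshold  # threshold -> pos/neg
    relevant = gt_label == label                           # correctness -> true/false
    if positive and relevant:
        return "tp"
    if positive:
        return "fp"
    if relevant:
        return "fn"
    return "tn"

# A correct "dog" prediction scored 0.6 is negative at a 0.8 threshold,
# so it lands in the FN cell; lowering the threshold makes it a TP.
assert confusion_cell("dog", "dog", 0.6, "dog", 0.8) == "fn"
assert confusion_cell("dog", "dog", 0.6, "dog", 0.5) == "tp"
```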

@czaloom czaloom (Collaborator, Author)

I reached out to Matt, and I think this is a definition issue. "Missed detection" doesn't make sense for classification. The FN condition you are referring to fits something closer to a "no winner" condition.

Matt suggested "No prediction", and I'm wondering if "Null Prediction" would make more sense.

How does all this sound to you?
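
For illustration, a sketch of the FN breakdown being debated; the names are hypothetical, with "no_winner" standing in for the proposed rename rather than shipped code.

```python
# Sketch of the FN sub-categorization under discussion. Hypothetical names;
# "no_winner" reflects the suggested rename, not shipped Valor code.
def fn_reason(gt_label, predictions, threshold):
    """predictions: {label: score} for one datum. Every classification datum
    has predictions at ingestion, so "missed detection" cannot occur here."""
    winner = max(predictions, key=predictions.get)
    if predictions[winner] < threshold:
        return "no_winner"          # no prediction clears the threshold
    if winner != gt_label:
        return "misclassification"  # a wrong label wins above the threshold
    return None                     # the groundtruth label wins: not an FN

# At threshold 0.8, a top score of 0.6 yields the "no winner" condition.
assert fn_reason("dog", {"dog": 0.6, "cat": 0.4}, 0.8) == "no_winner"
assert fn_reason("dog", {"cat": 0.9, "dog": 0.1}, 0.8) == "misclassification"
```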


@czaloom czaloom merged commit f539c0d into main Jul 3, 2024
11 checks passed
@czaloom czaloom deleted the czaloom-patch-581-performance-issues branch July 3, 2024 19:31
@Striveworks Striveworks deleted 6 comments from czaloom Jul 11, 2024