Fix Classification Performance #637
Merged
Changes from 29 commits
42 commits
2fb63f8 fixed base performance (czaloom)
7501c28 classification fixes (czaloom)
284c16b added results (czaloom)
e70bb62 remove benmarks (czaloom)
5846b99 pr curve performance improvements (czaloom)
07fb5a5 pr curve performance improvements (czaloom)
f7048c1 perf improvements (czaloom)
0df6437 fixed post timeouts (czaloom)
b5b6ffb added timeout controls (czaloom)
5d3aa2d added vacuum analyze to dataset, model finalization (czaloom)
0734f30 fixed for python 3_8 (czaloom)
57d35f2 fixed args (czaloom)
f2ff1a0 fixed lack of db error in testing (czaloom)
d444d12 fixed test (czaloom)
b090cc0 merged client timeout pr (czaloom)
7506484 merged vacuum analyze pr (czaloom)
0431989 passing precommit (czaloom)
f192d9e removing commented code (czaloom)
50282eb fixed validate labels (czaloom)
09b3476 validate matching label keys is more straightforward (czaloom)
22f4d93 Merge branch 'czaloom-644-fix-validate_matching_label_keys' into czal… (czaloom)
20eb80b passing python integration tests (czaloom)
b2fa72c remove comments (czaloom)
ba9810c updated analysis.py (czaloom)
bf88ea7 Update test_classification.py (czaloom)
5dfd9e7 Delete examples/benchmarks/analysis.py (czaloom)
a9b2036 Delete examples/benchmarks/results.json (czaloom)
7bb793d Delete examples/benchmarks/pr-curve-oom-data.json (czaloom)
fcedd4b Update test_classification.py (czaloom)
8299f44 added docstring (czaloom)
83d0bdf change default to 10 for creating gts and pds (czaloom)
791f1d3 revert (czaloom)
40e75c9 Merge branch 'czaloom-add-vacuum-analyze' into czaloom-patch-581-perf… (czaloom)
2f17e3f Merge branch 'czaloom-644-fix-validate_matching_label_keys' into czal… (czaloom)
d7dabf1 Merge branch 'czaloom-639-bug-bulk-add-error' into czaloom-patch-581-… (czaloom)
45f2e58 Merge branch 'main' into czaloom-patch-581-performance-issues (czaloom)
17c35ce reverted test (czaloom)
fa84825 unrelated test failing due to list ordering in db (czaloom)
7ece6c5 fix typo (czaloom)
bf3c1c5 merge main (czaloom)
967e633 merged main (czaloom)
5501b26 reverted integration tests (czaloom)
What is the point of the score threshold in that case?
The score threshold is meant to say "only consider predictions with a score greater than x to be valid predictions."

The point of a score threshold is to determine whether a prediction is positive or negative.
Whether that prediction is correct determines its truth value (True or False).
Combine these and you get TP, FP, FN and TN.
The `missing_detection` variation doesn't really map well to the classification task (as compared to the object detection task), since we enforce the existence of predictions for groundtruths at ingestion time (see `validate_matching_label_keys`).
This logic also applies to `hallucination` for FP, which, if you look at that test, never gets a value counted.
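A minimal sketch of the mapping described above, for one (datum, label) pair in a one-vs-rest view. The helper name `confusion_outcome`, its parameters, and the strict `>` comparison at the threshold boundary are assumptions made for illustration; this is not the project's implementation.

```python
def confusion_outcome(score: float, threshold: float, is_groundtruth: bool) -> str:
    """Classify one (datum, label) pair as TP, FP, FN or TN.

    Hypothetical helper: the score threshold decides positive vs. negative,
    and agreement with the groundtruth decides true vs. false.
    """
    # Assumed convention: only scores strictly greater than the threshold count as positive.
    predicted_positive = score > threshold
    if predicted_positive:
        return "TP" if is_groundtruth else "FP"
    return "FN" if is_groundtruth else "TN"


# Illustrative scores for a single datum whose groundtruth label is "cat".
scores = {"cat": 0.35, "dog": 0.55, "bird": 0.10}
outcomes = {
    label: confusion_outcome(score, threshold=0.5, is_groundtruth=(label == "cat"))
    for label, score in scores.items()
}
print(outcomes)  # {'cat': 'FN', 'dog': 'FP', 'bird': 'TN'}
```

Here the groundtruth label falls below the threshold (FN) while another label rises above it (FP), and every label still has a prediction attached.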

I reached out to Matt and I think this is a definition issue.
"Missing detection" doesn't make sense for classification. The FN condition that you are referring to fits something closer to a "no winner" condition. Matt suggested "No prediction" and I'm wondering if "Null Prediction" would make more sense.
How does all this sound to you?
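One way to picture the "no winner" condition discussed here, as a rough sketch: a false negative where no label clears the threshold could be reported separately from one where a different label wins. The function `false_negative_reason` and the category strings `"misclassification"` and `"no_prediction"` are hypothetical names chosen for illustration, not names confirmed by this PR.

```python
from typing import Dict, Optional


def false_negative_reason(
    scores: Dict[str, float], groundtruth: str, threshold: float
) -> Optional[str]:
    """Explain why the groundtruth label was missed, or return None if it was not.

    Hypothetical sketch: assumes every label, including the groundtruth,
    has a score (predictions are required to exist at ingestion time).
    """
    if scores[groundtruth] > threshold:
        return None  # groundtruth label is predicted positive, so no FN
    # Does any *other* label clear the threshold?
    other_winner = any(
        score > threshold for label, score in scores.items() if label != groundtruth
    )
    return "misclassification" if other_winner else "no_prediction"


print(false_negative_reason({"cat": 0.35, "dog": 0.55}, "cat", 0.5))  # misclassification
print(false_negative_reason({"cat": 0.35, "dog": 0.40}, "cat", 0.5))  # no_prediction
```

In the second call predictions exist for every label, yet none exceeds the threshold, which is the case that "missing detection" describes poorly.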