Implement valor_core to compute metrics locally via numpy #651

Merged: 115 commits merged on Aug 22, 2024

Commits
d7c8edf
working through integration tests
ntlind Jul 3, 2024
abf6075
Merge branch 'main' into compute_local_metrics
ntlind Jul 5, 2024
18528d6
pass functional tests after merge conflicts
ntlind Jul 5, 2024
0334741
pass non-ROC integration tests
ntlind Jul 5, 2024
56767df
add ROCAUC metric
ntlind Jul 6, 2024
d0f0df0
pass a few more integration tests
ntlind Jul 8, 2024
f1d4b5a
pass integration tests
ntlind Jul 8, 2024
b0238c3
Merge branch 'main' into compute_local_metrics
ntlind Jul 9, 2024
a1422ce
minor cleanup
ntlind Jul 9, 2024
17dbb00
update npm test
ntlind Jul 9, 2024
2e39ade
up benchmarks
ntlind Jul 9, 2024
1f4c354
update benchmarks
ntlind Jul 9, 2024
026094a
pass first set of AR and AP tests
ntlind Jul 10, 2024
8e9828a
add curves
ntlind Jul 11, 2024
3869b27
fix OD iou calculation for rasters
ntlind Jul 11, 2024
9cd1654
pass functional tests
ntlind Jul 11, 2024
1fa240c
fix groundtruths with no predictions
ntlind Jul 12, 2024
3e1f438
Merge branch 'main' into compute_local_metrics
ntlind Jul 12, 2024
62a660a
fix gts in PR output
ntlind Jul 12, 2024
39466db
pass more integration tests, edge cases still outstanding
ntlind Jul 12, 2024
b75087b
pass integration tests
ntlind Jul 16, 2024
dd19fff
small benchmarking script changes
ntlind Jul 16, 2024
87930e2
clean up and add aggregate OD functions
ntlind Jul 16, 2024
479dab0
remove deletion
ntlind Jul 16, 2024
66b47b2
adjust benchmarks
ntlind Jul 16, 2024
6dc83bc
update od benchmarks
ntlind Jul 17, 2024
e5e27a7
save progress on detailed pr curves; doesn't pass tests
ntlind Jul 19, 2024
ee31b93
finish detailed PR curves for OD
ntlind Jul 24, 2024
172a2bc
refactor classification
ntlind Jul 24, 2024
3189d7d
refactor classification and pass all tests
ntlind Jul 24, 2024
c591438
finish pr curves for classification
ntlind Jul 25, 2024
c993696
fix benchmarks
ntlind Jul 25, 2024
0d45f6c
PR optimizations
ntlind Jul 26, 2024
19729a1
increase limit to 500 datums
ntlind Jul 29, 2024
a85619f
initial commit of /core
ntlind Jul 29, 2024
2659990
pass first integration tests
ntlind Jul 30, 2024
6028f9a
get object -> dataframe converter working
ntlind Jul 30, 2024
10b9002
get object -> dataframe converter working
ntlind Jul 30, 2024
4877761
finish adapting integration tests for classification
ntlind Jul 31, 2024
9ba360f
pass functional and integration tests for classification
ntlind Aug 1, 2024
efa71ce
add first two detection integration tests
ntlind Aug 2, 2024
4026a3a
finish integration tests for detection
ntlind Aug 2, 2024
57c52f8
pass all functional and integration tests locally
ntlind Aug 2, 2024
790c85c
Merge branch 'main' into compute_local_metrics
ntlind Aug 3, 2024
dc5f342
revert API changes
ntlind Aug 3, 2024
405271a
resolve warnings and type errors
ntlind Aug 5, 2024
ef8a562
Merge branch 'main' into compute_local_metrics
ntlind Aug 5, 2024
609834b
revert changes to non-core files
ntlind Aug 5, 2024
a7c6117
fix some TODOs
ntlind Aug 5, 2024
576a39e
fix black error
ntlind Aug 5, 2024
cd5ec8f
add testing and benchmarking scripts, init files
ntlind Aug 5, 2024
301ae9d
fix core benchmarks workflow
ntlind Aug 5, 2024
40866de
add missing packages
ntlind Aug 5, 2024
e8a635c
increase python version for core testing
ntlind Aug 5, 2024
684e4b5
fix workflows
ntlind Aug 5, 2024
71b685e
tweak core test coverage workflow
ntlind Aug 5, 2024
a8c8fdd
tweak core test workflow
ntlind Aug 5, 2024
b82667f
tweak test workflow
ntlind Aug 5, 2024
454f855
add unit-tests
ntlind Aug 6, 2024
86ee138
remove dataset_name
ntlind Aug 6, 2024
a3ae701
move arguments outside of EvaluationParameters
ntlind Aug 6, 2024
57a173d
add raster conversion logic
ntlind Aug 6, 2024
85d07fd
deep dive into conversions and get benchmarks to run correctly
ntlind Aug 7, 2024
19db470
Merge branch 'main' into compute_local_metrics
ntlind Aug 7, 2024
75985ff
handle TODOs
ntlind Aug 7, 2024
5682cf4
add test changes from #684
ntlind Aug 7, 2024
cc89781
add validation checks
ntlind Aug 7, 2024
db5470e
refactor
ntlind Aug 7, 2024
3988e76
increase test coverage
ntlind Aug 8, 2024
7766fea
add docstrings
ntlind Aug 8, 2024
6b40f05
remove height/width
ntlind Aug 8, 2024
686bbbc
Merge branch 'main' into compute_local_metrics
ntlind Aug 8, 2024
d38e394
kick off benchmark runner
ntlind Aug 8, 2024
241d8cd
reset OD benchmarks
ntlind Aug 8, 2024
690eb06
change folder structure
ntlind Aug 8, 2024
8e16154
update _post_init docstrings
ntlind Aug 8, 2024
9a43bda
reset API benchmarks
ntlind Aug 8, 2024
d2588c3
add getting started example notebook
ntlind Aug 9, 2024
4f5947b
fix getting started notebook
ntlind Aug 9, 2024
a72c855
modify .gitignore
ntlind Aug 9, 2024
34bd664
incorporate feedback
ntlind Aug 9, 2024
d4a426b
add dataframe descriptions
ntlind Aug 9, 2024
a8299ef
add back parameters
ntlind Aug 9, 2024
876930c
apply label map instead of using grouper_mappings
ntlind Aug 9, 2024
b3300a3
move Raster, etc. into schemas
ntlind Aug 9, 2024
967225e
disallow rotated and skewed boxes
ntlind Aug 9, 2024
497dfc6
Update getting_started.ipynb
ntlind Aug 9, 2024
c1c0e46
Update getting_started.ipynb
ntlind Aug 9, 2024
cd7af06
add readme
ntlind Aug 9, 2024
2ab7ab9
delete polygons. remove geometry from raster
ntlind Aug 13, 2024
03078ef
add back from_geometry and coordinates methods
ntlind Aug 13, 2024
1baddcc
add more polygon tests
ntlind Aug 14, 2024
d00cab1
refactor stuff without implementing ValorContext
ntlind Aug 14, 2024
72ec730
Merge branch 'main' into compute_local_metrics
ntlind Aug 14, 2024
355e46e
throw errors if the user tries disallowed conversions
ntlind Aug 14, 2024
ec3bebb
support rotated bounding boxes and add tests to /api and /core
ntlind Aug 15, 2024
c380fa0
add ValorDetectionManager; fix bug with examples
ntlind Aug 20, 2024
c23231a
fix labelmaptype bug
ntlind Aug 20, 2024
8c1f37e
Merge branch 'main' into compute_local_metrics
ntlind Aug 20, 2024
e2c7e6f
remove label map type
ntlind Aug 20, 2024
1349cef
undo api changes
ntlind Aug 20, 2024
b6e0adc
force test_detection precommit
ntlind Aug 20, 2024
10d40b6
add Charles' benchmark improvements
ntlind Aug 20, 2024
000471f
add tqdm as dep
ntlind Aug 20, 2024
5423253
update OD benchmarks
ntlind Aug 20, 2024
1ea9092
reduce the number of columns stored in joint_df table
ntlind Aug 21, 2024
9e606cd
reduce memory required for classification and OD tasks
ntlind Aug 21, 2024
6fc9d43
revert API change
ntlind Aug 21, 2024
b017b03
add Sean's suggestions; move replace_label_map outside of _compute_cl…
ntlind Aug 21, 2024
67ce1f4
incorporate Charles' feedback
ntlind Aug 21, 2024
955faf6
Update classification.py
ntlind Aug 21, 2024
4299148
edit docstrings. limit use of typing
ntlind Aug 21, 2024
c827704
bump python version
ntlind Aug 21, 2024
725ad1e
Merge branch 'main' into compute_local_metrics
ntlind Aug 22, 2024
24dfcd3
add tests for validating dfs
ntlind Aug 22, 2024
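Several commits above deal with object-detection IoU (for example "fix OD iou calculation for rasters" and the rotated-box commits). To illustrate what computing detection overlap locally via numpy can look like, here is a minimal sketch for axis-aligned boxes only. It is not the valor_core implementation: the `(xmin, xmax, ymin, ymax)` row layout and the `pairwise_iou` name are assumptions, and rotated boxes and rasters are out of scope.

```python
import numpy as np


def pairwise_iou(groundtruths: np.ndarray, predictions: np.ndarray) -> np.ndarray:
    """Return an (n_gt, n_pd) matrix of IoU values.

    Hypothetical helper: boxes are assumed to be axis-aligned and stored
    row-wise as (xmin, xmax, ymin, ymax).
    """
    gt = np.asarray(groundtruths, dtype=float)[:, None, :]  # shape (n_gt, 1, 4)
    pd = np.asarray(predictions, dtype=float)[None, :, :]   # shape (1, n_pd, 4)

    # Width and height of each intersection rectangle, clipped at zero when boxes don't overlap.
    inter_w = np.clip(np.minimum(gt[..., 1], pd[..., 1]) - np.maximum(gt[..., 0], pd[..., 0]), 0.0, None)
    inter_h = np.clip(np.minimum(gt[..., 3], pd[..., 3]) - np.maximum(gt[..., 2], pd[..., 2]), 0.0, None)
    intersection = inter_w * inter_h

    area_gt = (gt[..., 1] - gt[..., 0]) * (gt[..., 3] - gt[..., 2])
    area_pd = (pd[..., 1] - pd[..., 0]) * (pd[..., 3] - pd[..., 2])
    union = area_gt + area_pd - intersection

    # Leave IoU at 0.0 for zero-area pairs instead of dividing by zero.
    return np.divide(intersection, union, out=np.zeros_like(intersection), where=union > 0)
```

Broadcasting the ground truths against the predictions produces the full IoU matrix in one vectorized pass rather than a Python loop over every pair.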

@@ -1,4 +1,4 @@
-name: Run benchmarks on pre-existing data
+name: Run API + client benchmarks
 
 on:
   push:
@@ -1,4 +1,4 @@
-name: Unit, functional, integration tests and code coverage
+name: Run API + client code coverage report
 
 on:
   push:
38 changes: 38 additions & 0 deletions .github/workflows/core-benchmark-evaluations.yml
@@ -0,0 +1,38 @@
name: Run core benchmarks

on:
  push:
    branches: "**"

permissions:
  id-token: write
  contents: read

jobs:
  run-benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: install core
        run: pip install -e .
        working-directory: ./core
      - name: run classification benchmarks
        run: python benchmark_script.py
        working-directory: ./core/benchmarks/classification
      - name: print classification results
        run: |
          export BENCHMARK_RESULTS=$(python -c "import os;import json;print(json.dumps(json.load(open('results.json', 'r')), indent=4));")
          echo "$BENCHMARK_RESULTS"
        working-directory: ./core/benchmarks/classification
      - name: run object detection benchmarks
        run: python benchmark_script.py
        working-directory: ./core/benchmarks/object-detection
      - name: print object detection results
        run: |
          export BENCHMARK_RESULTS=$(python -c "import os;import json;print(json.dumps(json.load(open('results.json', 'r')), indent=4));")
          echo "$BENCHMARK_RESULTS"
        working-directory: ./core/benchmarks/object-detection
      - run: make stop-env
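The two "print ... results" steps inline a Python one-liner that loads `results.json` and re-serializes it with `indent=4` (the `import os` in it is unused). Written out as a function it is easier to read and test. This is only a sketch; `format_results` is a hypothetical name, not a file or helper in this PR.

```python
from __future__ import annotations

import json
from pathlib import Path


def format_results(path: str | Path, indent: int = 4) -> str:
    """Load a benchmark results JSON file and return it as indented JSON text."""
    with open(path, "r") as f:
        return json.dumps(json.load(f), indent=indent)
```

Calling `print(format_results("results.json"))` from the benchmark directory prints the same output the workflow step echoes.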
36 changes: 36 additions & 0 deletions .github/workflows/core-tests-and-coverage.yml
@@ -0,0 +1,36 @@
name: Run core code coverage report

on:
  push:
    branches: "**"

permissions:
  id-token: write
  contents: read

jobs:
  core-tests:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: .
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: run tests and report coverage
        run: |
          pip install -e ".[test]"
          COVERAGE_FILE=.coverage.functional python -m coverage run --omit "tests/*" -m pytest -v tests/functional-tests
          COVERAGE_FILE=.coverage.unit python -m coverage run --omit "tests/*" -m pytest -v tests/unit-tests
          python -m coverage combine
          python -m coverage report -m
          python -m coverage json
          export TOTAL=$(python -c "import json;print(json.load(open('coverage.json'))['totals']['percent_covered_display'])")
          echo "total=$TOTAL" >> $GITHUB_ENV
          if (( $TOTAL < 90 )); then
            echo "Coverage is below 90%"
            exit 1
          fi
        working-directory: ./core
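The end of the coverage step reads `totals.percent_covered_display` from `coverage.json` and fails the job below 90%. Bash `(( ... ))` only does integer arithmetic; this works because coverage.py formats the display value with its `precision` setting, which defaults to 0 decimals, but a value such as `92.5` would make the comparison error out. A sketch of the same gate in Python that tolerates decimals is below; `meets_threshold` and `check_coverage_file` are hypothetical names, not part of the PR.

```python
import json
from pathlib import Path


def meets_threshold(report: dict, threshold: float = 90.0) -> bool:
    """Return True if the total coverage in a coverage.py JSON report is >= threshold."""
    # percent_covered_display is a string such as "92" (or "92.5" with a higher precision setting).
    total = float(report["totals"]["percent_covered_display"])
    return total >= threshold


def check_coverage_file(path: str = "coverage.json", threshold: float = 90.0) -> int:
    """Return a process exit code: 0 if coverage meets the threshold, 1 otherwise."""
    report = json.loads(Path(path).read_text())
    if meets_threshold(report, threshold):
        return 0
    print(f"Coverage is below {threshold:g}%")
    return 1
```

In a workflow this would be invoked as `sys.exit(check_coverage_file())`. Like the bash version, exactly 90% passes.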
7 changes: 4 additions & 3 deletions .pre-commit-config.yaml
@@ -32,19 +32,19 @@ repos:
     rev: v1.1.376
     hooks:
       - id: pyright
-        additional_dependencies:
-          [
+        additional_dependencies: [
             "requests",
             "Pillow >= 9.1.0",
             "numpy",
+            "pandas>=2.2.2",
+            "pandas-stubs", # fixes pyright issues with pandas
             "pytest",
             "python-dotenv",
             "SQLAlchemy>=2.0",
             "fastapi[all]>=0.100.0",
             "importlib_metadata; python_version < '3.8'",
             "pydantic-settings",
             "tqdm",
-            "pandas",
             "packaging",
             "PyJWT[crypto]",
             "structlog",
@@ -57,4 +57,5 @@ repos:
             "nltk",
             "rouge_score",
             "evaluate",
+            "shapely",
           ]
90 changes: 47 additions & 43 deletions api/valor_api/backend/metrics/classification.py
@@ -458,30 +458,34 @@ def search_datums(condition: ColumnElement[bool]):
         else list()
     )
     fp = {
-        "misclassifications": [
-            unique_datums[datum_id] for datum_id in fp
-        ]
-        if fp
-        else list()
+        "misclassifications": (
+            [unique_datums[datum_id] for datum_id in fp]
+            if fp
+            else list()
+        )
     }
     tn = (
         [unique_datums[datum_id] for datum_id in tn]
         if tn
         else list()
     )
     fn = {
-        "misclassifications": [
-            unique_datums[datum_id]
-            for datum_id in fn_misclf_examples
-        ]
-        if fn_misclf_examples
-        else list(),
-        "no_predictions": [
-            unique_datums[datum_id]
-            for datum_id in fn_misprd_examples
-        ]
-        if fn_misprd_examples
-        else list(),
+        "misclassifications": (
+            [
+                unique_datums[datum_id]
+                for datum_id in fn_misclf_examples
+            ]
+            if fn_misclf_examples
+            else list()
+        ),
+        "no_predictions": (
+            [
+                unique_datums[datum_id]
+                for datum_id in fn_misprd_examples
+            ]
+            if fn_misprd_examples
+            else list()
+        ),
     }
 
     detailed_pr_output[key][value][float(threshold)] = {
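This hunk appears to be layout only: each `[...] if cond else list()` conditional expression is wrapped in its own parentheses, consistent with how newer Black releases split conditional expressions (the branch includes a "fix black error" commit, so Black is in the toolchain). Wrapping parentheses do not change how the expression evaluates. The sketch below checks that on toy data; `unique_datums` and the id lists are made-up stand-ins for the variables in `classification.py`.

```python
# Toy stand-ins for the names used in classification.py (hypothetical data).
unique_datums = {1: "datum-a", 2: "datum-b", 3: "datum-c"}
fp = [2, 3]
fn_misclf_examples = []

# Old layout: the conditional expression is the bare dict value.
old_fp = {
    "misclassifications": [
        unique_datums[datum_id] for datum_id in fp
    ]
    if fp
    else list()
}

# New layout: the same conditional expression, wrapped in parentheses.
new_fp = {
    "misclassifications": (
        [unique_datums[datum_id] for datum_id in fp]
        if fp
        else list()
    )
}

# An empty id list falls through to the `else list()` branch.
new_fn = {
    "misclassifications": (
        [unique_datums[datum_id] for datum_id in fn_misclf_examples]
        if fn_misclf_examples
        else list()
    ),
}
```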
@@ -789,18 +793,20 @@ def _compute_roc_auc(
 
     label_keys = {key for key, _ in labels}
     return [
-        schemas.ROCAUCMetric(
-            label_key=key,
-            value=(
-                float(np.mean(label_key_to_rocauc[key]))
-                if len(label_key_to_rocauc[key]) >= 1
-                else None
-            ),
-        )
-        if (key in label_key_to_rocauc and key in predictions_label_keys)
-        else schemas.ROCAUCMetric(
-            label_key=key,
-            value=0.0,
+        (
+            schemas.ROCAUCMetric(
+                label_key=key,
+                value=(
+                    float(np.mean(label_key_to_rocauc[key]))
+                    if len(label_key_to_rocauc[key]) >= 1
+                    else None
+                ),
+            )
+            if (key in label_key_to_rocauc and key in predictions_label_keys)
+            else schemas.ROCAUCMetric(
+                label_key=key,
+                value=0.0,
+            )
         )
         for key in label_keys
     ]
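In `_compute_roc_auc`, each label key's ROC AUC is the mean of the values in `label_key_to_rocauc[key]`. It is `None` if that list is empty and `0.0` if the key is missing from `label_key_to_rocauc` or `predictions_label_keys`. The numpy sketch below reproduces that aggregation. It assumes each per-label value is a one-vs-rest binary AUC, computed here with the pairwise (Mann-Whitney) formulation. `binary_roc_auc` and `mean_rocauc_by_key` are hypothetical helpers, not functions from the PR.

```python
from __future__ import annotations

import numpy as np


def binary_roc_auc(scores, is_positive) -> float | None:
    """One-vs-rest ROC AUC as P(score_pos > score_neg), counting ties as 1/2.

    Returns None when there are no positives or no negatives (AUC undefined).
    """
    scores = np.asarray(scores, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    pos, neg = scores[is_positive], scores[~is_positive]
    if pos.size == 0 or neg.size == 0:
        return None
    # Compare every positive score against every negative score via broadcasting.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def mean_rocauc_by_key(label_key_to_rocauc, label_keys, predictions_label_keys):
    """Mirror the per-key aggregation in the hunk above."""
    results = {}
    for key in label_keys:
        if key in label_key_to_rocauc and key in predictions_label_keys:
            values = label_key_to_rocauc[key]
            results[key] = float(np.mean(values)) if len(values) >= 1 else None
        else:
            results[key] = 0.0
    return results
```

The pairwise comparison uses O(n_pos * n_neg) memory, which is fine for a sketch. A production version would more likely sort the scores once.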
@@ -997,20 +1003,18 @@ def _compute_confusion_matrices_and_metrics(
     labels: dict[int, tuple[str, str]],
     pr_curve_max_examples: int,
     metrics_to_return: list[enums.MetricType],
-) -> (
-    tuple[
-        list[schemas.ConfusionMatrix],
-        list[
-            schemas.AccuracyMetric
-            | schemas.ROCAUCMetric
-            | schemas.PrecisionMetric
-            | schemas.RecallMetric
-            | schemas.F1Metric
-            | schemas.PrecisionRecallCurve
-            | schemas.DetailedPrecisionRecallCurve
-        ],
-    ]
-):
+) -> tuple[
+    list[schemas.ConfusionMatrix],
+    list[
+        schemas.AccuracyMetric
+        | schemas.ROCAUCMetric
+        | schemas.PrecisionMetric
+        | schemas.RecallMetric
+        | schemas.F1Metric
+        | schemas.PrecisionRecallCurve
+        | schemas.DetailedPrecisionRecallCurve
+    ],
+]:
     """
     Computes the confusion matrix and all metrics for a given label key.
 
21 changes: 21 additions & 0 deletions core/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Striveworks

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.