Implement valor_core to compute metrics locally via numpy #651

Merged · 115 commits into main from compute_local_metrics · Aug 22, 2024

@ntlind (Contributor) commented Jul 3, 2024

Improvements

  • Create a local package, valor_core, which can evaluate classification and object detection tasks locally in numpy, without using postgres. It defines two functions, evaluate_classification and evaluate_detection, which take GroundTruth and Prediction objects and produce an Evaluation object equivalent to the one created by the API + client today.
  • Add the ability to precompute IOUs and run evaluations using ValorDetectionManager
  • Vectorize backend operations instead of looping over labels / label keys (a sketch of this style of vectorization follows this list)
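
As an illustration of this style of vectorization, here is a minimal numpy sketch (not valor_core's actual implementation): given (N, 4) groundtruth and (M, 4) prediction boxes in [xmin, ymin, xmax, ymax] format, broadcasting yields the full (N, M) IOU matrix with no Python loops over labels or annotations.

import numpy as np

def pairwise_iou(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    gt = np.asarray(gt, dtype=float)      # (N, 4) [xmin, ymin, xmax, ymax]
    pred = np.asarray(pred, dtype=float)  # (M, 4)

    # intersection corners for every (groundtruth, prediction) pair;
    # the (N, 1) and (1, M) shapes broadcast to a full (N, M) grid
    xmin = np.maximum(gt[:, None, 0], pred[None, :, 0])
    ymin = np.maximum(gt[:, None, 1], pred[None, :, 1])
    xmax = np.minimum(gt[:, None, 2], pred[None, :, 2])
    ymax = np.minimum(gt[:, None, 3], pred[None, :, 3])

    # clamp negative widths/heights to zero for non-overlapping pairs
    intersection = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    gt_area = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    pred_area = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    union = gt_area[:, None] + pred_area[None, :] - intersection

    # avoid division by zero for degenerate (zero-area) pairs
    return np.divide(intersection, union, out=np.zeros_like(union), where=union > 0)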

Testing

  • evaluate_detection and evaluate_classification pass all integration and functional tests from Valor; these tests are implemented in core/test/. Current test coverage is 91%.
  • core/benchmarks/ contains benchmark scripts equivalent to those used for valor
  • Two GitHub workflows were set up to mirror our test-coverage and benchmarking checks

Next Steps in Future PRs

  • Implement ValorClassificationManager
  • Implement evaluate_segmentation and ValorSegmentationManager

Usage Examples from core/README.md

Passing Lists of GroundTruth and Prediction Objects

The first way to use valor_core is to pass lists of GroundTruth and Prediction objects to an evaluate_... function, like so:

# assuming valor_core exposes these at the top level
from valor_core import enums, schemas, evaluate_detection

groundtruths = [
    schemas.GroundTruth(
        datum=img1,
        annotations=...
    ), …
]
predictions = [
    schemas.Prediction(
        datum=img1,
        annotations=...
    ), …
]

evaluation = evaluate_detection(
    groundtruths=groundtruths,
    predictions=predictions,
    metrics_to_return=[
        enums.MetricType.AP,
        enums.MetricType.AR,
        enums.MetricType.mAP,
        enums.MetricType.APAveragedOverIOUs,
        enums.MetricType.mAR,
        enums.MetricType.mAPAveragedOverIOUs,
        enums.MetricType.PrecisionRecallCurve,
        enums.MetricType.DetailedPrecisionRecallCurve,
    ],
    pr_curve_iou_threshold=0.5,
    pr_curve_max_examples=1,
)
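
The returned Evaluation object is equivalent to the one produced by the API + client today. Assuming it exposes its computed metrics as a list of dicts, mirroring the client-side Evaluation (an assumption, not something confirmed in this PR), the results can be inspected directly:

# assumes Evaluation exposes a `metrics` attribute holding a list of
# dicts, mirroring the client-side Evaluation object
for metric in evaluation.metrics:
    print(metric["type"], metric.get("label"), metric.get("value"))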

Passing DataFrames

The second way to use valor_core is to pass in dataframes of groundtruths and predictions:

import pandas as pd  # assuming the standard pandas import

groundtruth_df = pd.DataFrame(
    [
        {
            "datum_id": 1,
            "datum_uid": "uid1",
            "id": 1,
            "annotation_id": 1,
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[10, 10], [60, 10], [60, 40], [10, 40], [10, 10]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "datum_id": 1,
            "datum_uid": "uid1",
            "id": 2,
            "annotation_id": 2,
            "label_id": 2,
            "label_key": "k2",
            "label_value": "v2",
            "is_instance": True,
            "grouper_key": "k2",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [
                            [87, 10],
                            [158, 10],
                            [158, 820],
                            [87, 820],
                            [87, 10],
                        ]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "datum_id": 2,
            "datum_uid": "uid2",
            "id": 3,
            "annotation_id": 3,
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[15, 0], [70, 0], [70, 20], [15, 20], [15, 0]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
    ]
)
prediction_df = pd.DataFrame(
    [
        {
            "id": 1,
            "annotation_id": 4,
            "score": 0.3,
            "datum_id": 1,
            "datum_uid": "uid1",
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[10, 10], [60, 10], [60, 40], [10, 40], [10, 10]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "id": 2,
            "annotation_id": 5,
            "score": 0.98,
            "datum_id": 2,
            "datum_uid": "uid2",
            "label_id": 2,
            "label_key": "k2",
            "label_value": "v2",
            "is_instance": True,
            "grouper_key": "k2",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[15, 0], [70, 0], [70, 20], [15, 20], [15, 0]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
    ]
)

evaluation = evaluate_detection(
    groundtruths=groundtruth_df,
    predictions=prediction_df,
    metrics_to_return=[
        enums.MetricType.AP,
        enums.MetricType.AR,
        enums.MetricType.mAP,
        enums.MetricType.APAveragedOverIOUs,
        enums.MetricType.mAR,
        enums.MetricType.mAPAveragedOverIOUs,
        enums.MetricType.PrecisionRecallCurve,
        enums.MetricType.DetailedPrecisionRecallCurve,
    ],
    pr_curve_iou_threshold=0.5,
    pr_curve_max_examples=1,
)

Using a Data Manager

Finally, you can use a manager class (i.e., ValorDetectionManager) to run your evaluation. The advantages of using a manager class are that a) you won't have to keep all annotations in memory in one large list, and b) certain columns (e.g., IOU) can be precomputed in advance of the .evaluate() call.

import pytest  # used below to demonstrate which operations raise errors

manager = valor_core.ValorDetectionManager(...)
img1 = schemas.Datum(
    uid="uid1",
    metadata={
        "height": image_height,
        "width": image_width,
    },
)
groundtruths = [
    schemas.GroundTruth(
        datum=img1,
        annotations=...
    ), …
]
predictions = [
    schemas.Prediction(
        datum=img1,
        annotations=...
    ), …
]


# the user passes a list of all groundtruths and predictions for a list of datums
# this allows us to precompute IOUs at the datum_uid + label_key level
manager.add_data(groundtruths=groundtruths, predictions=predictions)

# the user calls .evaluate() to compute the evaluation
evaluation = manager.evaluate()

# the user must pass all groundtruths and predictions for a given datum at once
# this restriction lets us compute IOUs right away and discard excess info (e.g., rasters), saving a significant amount of memory
with pytest.raises(ValueError):
    manager.add_data_for_datum(groundtruths=groundtruths, predictions=predictions)  # raises an error since img1 has already been added to the manager's data

# the user must also specify the label map, `convert_annotation_to_type`, etc. when instantiating the object
# once set, these attributes can't be changed since subsequent IOU calculations will become apples-to-oranges with prior calculations
with pytest.raises(ValueError):
    manager.label_map = some_label_map  # raises an error since the label map can't be changed after instantiation

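For reference, a hypothetical instantiation of the manager might look like the following. Only label_map is named in this PR; the other argument shown is illustrative, and the real constructor signature may differ:

# hypothetical instantiation; argument names other than label_map are
# illustrative and the actual signature may differ
manager = valor_core.ValorDetectionManager(
    label_map=some_label_map,  # fixed at instantiation, as noted above
    metrics_to_return=[enums.MetricType.AP, enums.MetricType.mAP],
)
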
@ntlind merged commit c996f6d into main on Aug 22, 2024 · 14 checks passed

@ntlind deleted the compute_local_metrics branch on August 22, 2024 at 17:37