Implement valor_core to compute metrics locally via numpy #651

Merged · 115 commits into main from compute_local_metrics · Aug 22, 2024

@ntlind (Contributor) commented Jul 3, 2024

Improvements

  • Create a local package, valor_core, which can evaluate classification and object detection tasks locally in numpy, without using postgres. It defines two functions, evaluate_classification and evaluate_detection, which take GroundTruth and Prediction objects and produce an Evaluation object equivalent to the one created by the API + client today.
  • Add the ability to precompute IOUs and run evaluations using ValorDetectionManager
  • Vectorize backend operations instead of looping over labels / label keys (a sketch of this style of vectorization follows this list)
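
As an illustration of this style of vectorization, here is a minimal numpy sketch (not valor_core's actual implementation): given (N, 4) groundtruth and (M, 4) prediction boxes in [xmin, ymin, xmax, ymax] format, broadcasting yields the full (N, M) IOU matrix with no Python loops over labels or annotations.

import numpy as np

def pairwise_iou(gt: np.ndarray, pred: np.ndarray) -> np.ndarray:
    gt = np.asarray(gt, dtype=float)      # (N, 4) [xmin, ymin, xmax, ymax]
    pred = np.asarray(pred, dtype=float)  # (M, 4)

    # intersection corners for every (groundtruth, prediction) pair;
    # the (N, 1) and (1, M) shapes broadcast to a full (N, M) grid
    xmin = np.maximum(gt[:, None, 0], pred[None, :, 0])
    ymin = np.maximum(gt[:, None, 1], pred[None, :, 1])
    xmax = np.minimum(gt[:, None, 2], pred[None, :, 2])
    ymax = np.minimum(gt[:, None, 3], pred[None, :, 3])

    # clamp negative widths/heights to zero for non-overlapping pairs
    intersection = np.clip(xmax - xmin, 0, None) * np.clip(ymax - ymin, 0, None)
    gt_area = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    pred_area = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    union = gt_area[:, None] + pred_area[None, :] - intersection

    # avoid division by zero for degenerate (zero-area) pairs
    return np.divide(intersection, union, out=np.zeros_like(union), where=union > 0)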

Testing

  • evaluate_detection and evaluate_classification pass all integration and functional tests from Valor; these tests are implemented in core/test/. Current test coverage is 91%.
  • core/benchmarks/ contains benchmark scripts equivalent to those used for valor
  • Two GitHub workflows were set up to mirror our test-coverage and benchmarking checks

Next Steps in Future PRs

  • Implement ValorClassificationManager
  • Implement evaluate_segmentation and ValorSegmentationManager

Usage Examples from core/README.md

Passing Lists of GroundTruth and Prediction Objects

The first way to use valor_core is to pass lists of GroundTruth and Prediction objects to an evaluate_... function, like so:

# assuming valor_core exposes these at the top level
from valor_core import enums, schemas, evaluate_detection

groundtruths = [
    schemas.GroundTruth(
        datum=img1,
        annotations=...
    ), …
]
predictions = [
    schemas.Prediction(
        datum=img1,
        annotations=...
    ), …
]

evaluation = evaluate_detection(
    groundtruths=groundtruths,
    predictions=predictions,
    metrics_to_return=[
        enums.MetricType.AP,
        enums.MetricType.AR,
        enums.MetricType.mAP,
        enums.MetricType.APAveragedOverIOUs,
        enums.MetricType.mAR,
        enums.MetricType.mAPAveragedOverIOUs,
        enums.MetricType.PrecisionRecallCurve,
        enums.MetricType.DetailedPrecisionRecallCurve,
    ],
    pr_curve_iou_threshold=0.5,
    pr_curve_max_examples=1,
)
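
The returned Evaluation object is equivalent to the one produced by the API + client today. Assuming it exposes its computed metrics as a list of dicts, mirroring the client-side Evaluation (an assumption, not something confirmed in this PR), the results can be inspected directly:

# assumes Evaluation exposes a `metrics` attribute holding a list of
# dicts, mirroring the client-side Evaluation object
for metric in evaluation.metrics:
    print(metric["type"], metric.get("label"), metric.get("value"))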

Passing DataFrames

The second way to use valor_core is to pass in dataframes of groundtruths and predictions:

import pandas as pd  # assuming the standard pandas import

groundtruth_df = pd.DataFrame(
    [
        {
            "datum_id": 1,
            "datum_uid": "uid1",
            "id": 1,
            "annotation_id": 1,
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[10, 10], [60, 10], [60, 40], [10, 40], [10, 10]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "datum_id": 1,
            "datum_uid": "uid1",
            "id": 2,
            "annotation_id": 2,
            "label_id": 2,
            "label_key": "k2",
            "label_value": "v2",
            "is_instance": True,
            "grouper_key": "k2",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [
                            [87, 10],
                            [158, 10],
                            [158, 820],
                            [87, 820],
                            [87, 10],
                        ]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "datum_id": 2,
            "datum_uid": "uid2",
            "id": 3,
            "annotation_id": 3,
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[15, 0], [70, 0], [70, 20], [15, 20], [15, 0]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
    ]
)
prediction_df = pd.DataFrame(
    [
        {
            "id": 1,
            "annotation_id": 4,
            "score": 0.3,
            "datum_id": 1,
            "datum_uid": "uid1",
            "label_id": 1,
            "label_key": "k1",
            "label_value": "v1",
            "is_instance": True,
            "grouper_key": "k1",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[10, 10], [60, 10], [60, 40], [10, 40], [10, 10]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
        {
            "id": 2,
            "annotation_id": 5,
            "score": 0.98,
            "datum_id": 2,
            "datum_uid": "uid2",
            "label_id": 2,
            "label_key": "k2",
            "label_value": "v2",
            "is_instance": True,
            "grouper_key": "k2",
            "polygon": schemas.Polygon.from_dict(
                {
                    "type": "Polygon",
                    "coordinates": [
                        [[15, 0], [70, 0], [70, 20], [15, 20], [15, 0]]
                    ],
                }
            ),
            "raster": None,
            "bounding_box": None,
        },
    ]
)

evaluation = evaluate_detection(
    groundtruths=groundtruth_df,
    predictions=prediction_df,
    metrics_to_return=[
        enums.MetricType.AP,
        enums.MetricType.AR,
        enums.MetricType.mAP,
        enums.MetricType.APAveragedOverIOUs,
        enums.MetricType.mAR,
        enums.MetricType.mAPAveragedOverIOUs,
        enums.MetricType.PrecisionRecallCurve,
        enums.MetricType.DetailedPrecisionRecallCurve,
    ],
    pr_curve_iou_threshold=0.5,
    pr_curve_max_examples=1,
)

Using a Data Manager

Finally, you can use a manager class (i.e., ValorDetectionManager) to run your evaluation. The advantages of using a manager class are that a) you won't have to keep all annotations in memory in one large list, and b) certain columns (e.g., IOU) can be precomputed in advance of the .evaluate() call.

import pytest  # used below to demonstrate which operations raise errors

manager = valor_core.ValorDetectionManager(...)
img1 = schemas.Datum(
    uid="uid1",
    metadata={
        "height": image_height,
        "width": image_width,
    },
)
groundtruths = [
    schemas.GroundTruth(
        datum=img1,
        annotations=...
    ), …
]
predictions = [
    schemas.Prediction(
        datum=img1,
        annotations=...
    ), …
]


# the user passes a list of all groundtruths and predictions for a list of datums
# this allows us to precompute IOUs at the datum_uid + label_key level
manager.add_data(groundtruths=groundtruths, predictions=predictions)

# the user calls .evaluate() to compute the evaluation
evaluation = manager.evaluate()

# the user must pass all groundtruths and predictions for a given datum at once
# this restriction lets us compute IOUs right away and discard excess info (e.g., rasters), saving a significant amount of memory
with pytest.raises(ValueError):
    manager.add_data_for_datum(groundtruths=groundtruths, predictions=predictions)  # raises an error since img1 has already been added to the manager's data

# the user must also specify the label map, `convert_annotation_to_type`, etc. when instantiating the object
# once set, these attributes can't be changed since subsequent IOU calculations will become apples-to-oranges with prior calculations
with pytest.raises(ValueError):
    manager.label_map = some_label_map  # raises an error since the label map can't be changed after instantiation

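For reference, a hypothetical instantiation of the manager might look like the following. Only label_map is named in this PR; the other argument shown is illustrative, and the real constructor signature may differ:

# hypothetical instantiation; argument names other than label_map are
# illustrative and the actual signature may differ
manager = valor_core.ValorDetectionManager(
    label_map=some_label_map,  # fixed at instantiation, as noted above
    metrics_to_return=[enums.MetricType.AP, enums.MetricType.mAP],
)
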
@ntlind merged commit c996f6d into main on Aug 22, 2024 · 14 checks passed

@ntlind deleted the compute_local_metrics branch on August 22, 2024 at 17:37