Evaluation results class for easier access to results #1326

elronbandel · 2024-11-03T15:27:47Z

No description provided.

Signed-off-by: elronbandel <[email protected]>

yoavkatz · 2024-11-03T19:23:20Z

src/unitxt/metric_utils.py

@@ -320,6 +320,30 @@ def prepare(self):
 )


+class EvaluationResults(list):
+    @property
+    def score(self):


I think "score" is a problematic name (it should have been "scores"). Do we want to start changing it here? We could add "global_scores", "group_scores", "subset_scores", and "instance_scores" attributes. Slowly, we would move off the list and use only the attributes.

I think the most useful accessor should have the shortest, most natural name. So, assuming it is used like this: results.score, the most natural choice is to call it something like results.summary.

The gradual move away from the list can start by using to_list() in most of the cases and examples that actually access the list.

It is worthwhile to discuss the names, as they will stay with us for a long time.

"result.score" sounds like it returns a single number. Note that the result also contains the instances, which hold more than just scores.

result.scores -
result.subset_scores
result.group_scores
result.instances - (prediction, references, task data, and scores)
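To make the naming proposal above concrete, here is a minimal sketch of how such attributes could sit on top of the list while to_list() keeps the old access path available. It is an illustration only, not the code in this PR: the "global" key under instance["score"] and the exact attribute set are assumptions made for the example.

```python
class EvaluationResults(list):
    """Hypothetical sketch of the proposed accessors; not the PR's implementation."""

    @property
    def scores(self):
        # Assumption: global scores are repeated on every instance under
        # ["score"]["global"], so the first instance is enough.
        return self[0]["score"]["global"]

    @property
    def group_scores(self):
        # "groups" is the key checked in the diff under review.
        return self[0]["score"]["groups"]

    @property
    def subset_scores(self):
        # "subsets" is the key checked in the diff under review.
        return self[0]["score"]["subsets"]

    @property
    def instances(self):
        # Each instance holds prediction, references, task data, and scores.
        return list(self)

    def to_list(self):
        # Plain-list view for callers that still index the results directly.
        return list(self)


results = EvaluationResults(
    [
        {"prediction": "a", "references": ["a"],
         "score": {"global": {"accuracy": 0.5}, "instance": {"accuracy": 1.0}}},
        {"prediction": "b", "references": ["c"],
         "score": {"global": {"accuracy": 0.5}, "instance": {"accuracy": 0.0}}},
    ]
)
global_scores = results.scores
```

With this shape, existing code that indexes the results keeps working, and examples can be migrated to the attributes one at a time.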

yoavkatz · 2024-11-03T19:24:13Z

src/unitxt/metric_utils.py

+    @property
+    def groups(self):
+        if "groups" not in self[0]["score"]:
+            raise ValueError("Groups scores not found try using group_by in the recipe")


This error message is not clear. We should raise a UnitxtError("No groups were defined using 'group_by' in the recipe. For more information ....") and point to the documentation.

yoavkatz · 2024-11-04T05:41:00Z

src/unitxt/metric_utils.py

+    @property
+    def subsets(self):
+        if "subsets" not in self[0]["score"]:
+            raise ValueError("Subsets scores not found try using Benchmark")


We should raise a UnitxtError("No subsets were defined using the Benchmark in the recipe. For more information ....") and point to the documentation.
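The two error-message suggestions above could share one small helper. The sketch below is hedged: UnitxtError is defined here as a local stand-in so the snippet runs on its own (the real class lives in unitxt and its constructor may differ), require_score_section is a hypothetical name, and the documentation pointer is left out because the thread does not say which page to link.

```python
class UnitxtError(Exception):
    """Stand-in for illustration only; not the real unitxt class."""


def require_score_section(results, key, hint):
    """Return results[0]["score"][key], or raise with an actionable message.

    `key` is "groups" or "subsets"; `hint` names the recipe feature that
    produces that section. Both parameters are hypothetical.
    """
    if key not in results[0]["score"]:
        # A link to the relevant documentation would be appended here.
        raise UnitxtError(f"No {key} were defined using {hint} in the recipe.")
    return results[0]["score"][key]


results = [{"score": {"global": {"accuracy": 0.5}, "groups": {"en": {"accuracy": 0.5}}}}]
groups = require_score_section(results, "groups", "'group_by'")
```

Calling require_score_section(results, "subsets", "the Benchmark") on the same data raises the error with the subsets wording.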

yoavkatz · 2024-11-04T05:42:49Z

src/unitxt/metric_utils.py

+        import pandas as pd
+
+        # Flatten and load into DataFrame
+        return pd.json_normalize(self)


What does the output look like? Since it's a one-liner, maybe we don't need it. It might also make sense to convert the different scores (global, subsets) separately.
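As a rough answer to the question about output shape: by default pd.json_normalize turns nested dict keys into dot-separated column names and leaves list values in a single cell. The standard-library sketch below mimics only that key flattening on a made-up instance, so the resulting column names can be seen without pandas. The instance layout is an assumption for illustration.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level with dot-joined keys.

    Mimics the default column naming of pandas.json_normalize; lists and
    other non-dict values are kept as they are.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up instance; the real instances carry more fields.
instance = {
    "prediction": "yes",
    "references": ["yes"],
    "score": {"global": {"accuracy": 0.5}, "instance": {"accuracy": 1.0}},
}
columns = sorted(flatten(instance))
# columns == ['prediction', 'references',
#             'score.global.accuracy', 'score.instance.accuracy']
```

Because the global scores are repeated on every row in such a table, converting global, subset, and instance scores separately, as suggested above, may give more readable frames.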

yoavkatz · 2024-11-04T05:49:20Z

I think that as part of this change (once it's finalized) we should change all the examples and relevant documentation to use it, so people will get used to the new way of working.

Evaluation results class for easier access to results

d59826f

Signed-off-by: elronbandel <[email protected]>

