
Support N-Level judgements (instead of 3) #109

Closed
anthonygroves opened this issue Feb 12, 2020 · 10 comments

@anthonygroves

Currently, RRE only supports a three-level judgement scale. But docs here mention that this will likely be generalized in future versions: https://github.com/SeaseLtd/rated-ranking-evaluator/wiki/What%20We%20Need%20To%20Provide

My personal need is to use RRE with a judgement scale of 4 or 5, but I can see how others may want it to support any level up to 10. It would be nice if the RRE user could easily configure which N-level judgement scale to use, up to a reasonable (10?) value.

@agazzarini agazzarini self-assigned this Feb 12, 2020
@agazzarini agazzarini added the enhancement New feature or request label Feb 12, 2020
@agazzarini
Member

Thanks for raising this @anthonygroves. I completely agree with your need and with generalising the judgement scale.

@epugh
Contributor

epugh commented Mar 10, 2020

I wanted to mention that I'm looking for a 4-point scale rather than three as well. I think I can contribute some development effort if you can give me some pointers?

@agazzarini
Member

Hi @epugh, judgments are gathered in

io.sease.rre.Func::gainOrRatingNode

and looking at its usages (screenshot omitted), the callers are essentially the metrics and the Query class.
The only doubts I have are:

  • is it better to configure the scale somewhere (e.g. in the pom.xml), or to do a preliminary scan of the judgments to determine the scale automatically? Or both (e.g. fall back to the automatic detection when no configuration is found)?
  • the current code assigns 2 (the average) when the rating is not present. Does the average always make sense?
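The fallback idea in the first bullet (use an explicit configuration if present, otherwise scan the judgments) could be sketched roughly like this. The names `GradeScaleResolver`, `resolveMaxGrade` and `configuredMaxGrade` are hypothetical, not part of the RRE codebase:

```java
import java.util.List;
import java.util.Optional;

public class GradeScaleResolver {

    /**
     * Hypothetical sketch: prefer an explicitly configured maximum grade
     * (e.g. read from the pom.xml), otherwise fall back to the highest
     * grade observed in the judgment set; 3 mirrors the current fixed scale.
     */
    static int resolveMaxGrade(Optional<Integer> configuredMaxGrade, List<Integer> judgments) {
        return configuredMaxGrade.orElseGet(() ->
                judgments.stream().mapToInt(Integer::intValue).max().orElse(3));
    }

    public static void main(String[] args) {
        System.out.println(resolveMaxGrade(Optional.of(5), List.of(1, 2, 3)));   // explicit config wins: 5
        System.out.println(resolveMaxGrade(Optional.empty(), List.of(1, 4, 2))); // inferred from data: 4
    }
}
```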

@mattflax
Contributor

Hi @agazzarini,

@epugh asked me to take a look at this. My thoughts would be:

  • set constant default values of maxgrade=3, default=2 where the judgement is missing (matching current behaviour);
  • add the max grade as an optional construction parameter for NDCG, RR (and make it optional for ERR);
  • add the default grade (average) as an optional construction parameter for ERR, NDCG, RR (ERR currently calculates it as maxgrade/2, rounded to the nearest int).

This allows overriding all defaults using the ParameterizedMetric mechanism, and works for all the metrics. It doesn't work for the Query class though.

A further step, and this might be better as a separate issue/PR, might be to:

  • allow setting the max and default grades in the top-level pom.xml configuration;
  • allow overrides via the ParameterizedMetric mechanism, so users can set both "pessimistic" and "optimistic" defaults;
  • handle metric naming when both pessimistic and optimistic defaults are set for the same metric (currently I think you'd get two ERR@10 metrics, though I haven't checked).

How does this sound? I'm happy to put together PRs for these if they seem like sensible suggestions.
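As a rough illustration of the defaulting scheme in the first list above (not actual RRE code; the class and field names here are made up), a metric base class could carry both values as overridable constructor parameters:

```java
import java.math.BigDecimal;

// Illustrative only: a metric base carrying the maximum grade and the
// grade used where a judgement is missing, defaulting to the current
// fixed behaviour (maxgrade = 3, default/fair grade = 2).
abstract class GradedMetric {
    static final BigDecimal DEFAULT_MAX_GRADE = new BigDecimal(3);
    static final BigDecimal DEFAULT_MISSING_GRADE = new BigDecimal(2);

    final BigDecimal maxGrade;
    final BigDecimal missingGrade;

    GradedMetric() {
        this(DEFAULT_MAX_GRADE, DEFAULT_MISSING_GRADE);
    }

    GradedMetric(BigDecimal maxGrade, BigDecimal missingGrade) {
        this.maxGrade = maxGrade;
        this.missingGrade = missingGrade;
    }

    /** Grade of a result, falling back to the missing-judgement default. */
    BigDecimal gradeOf(BigDecimal judgment) {
        return judgment != null ? judgment : missingGrade;
    }
}

// A metric on a 4-point scale would then pass the overrides explicitly,
// e.g. via the ParameterizedMetric mechanism mentioned above.
class FourPointMetric extends GradedMetric {
    FourPointMetric() {
        super(new BigDecimal(4), new BigDecimal(2));
    }
}
```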

@renekrie

(I thought I had left a comment earlier but it seems lost - sorry for re-posting if my old comment re-appears.)

I was wondering if we could use this rework of grades to allow for decimal numbers, and grades between 0 and 1, as well. This is often what we end up with when we derive judgments/grades from tracking. Mapping them to integers/buckets would add a second, somewhat arbitrary model.

@mattflax
Contributor

This should be possible - most of the calculations already use BigDecimal (arbitrary-precision decimal) objects, although both the DCG and ReciprocalRank calculations convert the grade to an integer before using it. Would it be useful to specify the scale (number of decimal places) as well? I can see both 2 and 8 being used, depending on the metric, though I don't know why those choices were made.
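To illustrate the point about the integer conversion, here is a sketch assuming the common exponential gain 2^grade - 1 (not necessarily RRE's exact formula): keeping the grade as a decimal preserves the distinction between, say, a grade of 0.5 and a grade of 0, which truncation would erase.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class FractionalGain {

    /**
     * Exponential gain 2^grade - 1, computed without truncating the grade
     * to an integer, so fractional grades contribute fractional gains.
     * Precision of 8 echoes one of the scale values mentioned above.
     */
    static BigDecimal gain(BigDecimal grade) {
        double g = Math.pow(2, grade.doubleValue()) - 1;
        return new BigDecimal(g, new MathContext(8));
    }

    public static void main(String[] args) {
        System.out.println(gain(new BigDecimal("0.5"))); // ~0.414, would collapse to 0 if truncated first
        System.out.println(gain(new BigDecimal("3")));   // 7
    }
}
```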

mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 16, 2020
- Make fairgrade configurable at construction time;
- Use floating point grade values in gain();
- Add default grade values to Metric.
mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 16, 2020
- Make maxgrade and fairgrade configurable at construction time;
- Use floating point grade values in gain function.
mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 16, 2020
…de to be set via Maven pom.xml:

- Add singleton factory for instantiating MetricClassManager instances;
- Allow access to default grade values via factory class.
- Modify Maven plugins to use factory.
@renekrie

> This should be possible - most of the calculations use BigDecimal (floating point) objects already, although both the DCG and ReciprocalRank calculations convert the grade to an integer before using it. Would it be useful to specify the scale value as well? I can see both 2 and 8 being used, depending on the metric, though I don't know why those choices were made.

I think scale wouldn't hurt as an optional parameter. It's probably a difficult choice to find the right value - I guess hardly anyone has an opinion about what would be right - but at least you provide the flexibility.

Yet another thought on floats: depending on the implementation of the metrics, we should be careful with judgment values between 0 and 1. If we took the logarithm of such a value we would end up with a negative number, but I guess none of the metrics does that?!
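A quick sanity check on that concern, assuming the standard DCG formulation (the sum over ranks of (2^grade - 1) / log2(rank + 1)): the logarithm is applied to the rank, which is always at least 1, never to the grade, so grades in (0, 1) simply yield gains in (0, 1) and each term stays non-negative.

```java
public class DcgTermCheck {

    /** One DCG term: (2^grade - 1) / log2(rank + 1), with rank starting at 1. */
    static double dcgTerm(double grade, int rank) {
        return (Math.pow(2, grade) - 1) / (Math.log(rank + 1) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(dcgTerm(0.5, 1)); // ~0.414: fractional, but non-negative
        System.out.println(dcgTerm(0.0, 3)); // 0.0: a zero grade contributes nothing
    }
}
```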

mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 17, 2020
@mattflax
Contributor

@renekrie I've skipped making scale configurable for now - the PR was getting big, and it feels like something that could be put in a separate issue if it becomes a requirement.

mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 17, 2020
…RR, NDCG, RR to add consistency with pom.xml parameters.
mattflax pushed a commit to mattflax/rated-ranking-evaluator that referenced this issue Mar 22, 2020
agazzarini added a commit that referenced this issue Mar 25, 2020
#109: Make maximum and default grades configurable.
@epugh
Contributor

epugh commented Mar 25, 2020

Is this closable @agazzarini with the merge?

@agazzarini
Member

agazzarini commented Mar 25, 2020 via email

@agazzarini agazzarini added this to the 1.1 milestone Apr 23, 2020