
Recall score for the kept tokens in SARI #6

Open
ekQ opened this issue Oct 30, 2018 · 2 comments


ekQ commented Oct 30, 2018

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b"
Output: "b"
Ref-1: "a b"
Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get

    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,

where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me

    r_{keep}(1) = 1/2.

The reason is that in the code the terms of the numerator are divided individually by the corresponding denominator terms on line 58, instead of dividing the sum of the numerator terms by the sum of the denominator terms as done in Eq. 5 in the paper.

Replacing line 58 by:

    keeptmpscore2 += keepgramcountergood_rep[keepgram]

and line 65 by:

    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())

seems to fix this and yields r_{keep}(1) = 1/3, as I would expect.
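For concreteness, here is a minimal sketch that reproduces both numbers for the unigram example above. It mirrors the repo's counter construction and variable names, but it is an illustration of the two formulas, not the full implementation:

```python
from collections import Counter
from fractions import Fraction

input_sent, output_sent = "a b", "b"
refs = ["a b", "a"]
numref = len(refs)

sgramcounter = Counter(input_sent.split())                  # input unigrams
cgramcounter = Counter(output_sent.split())                 # output unigrams
rgramcounter = Counter(t for ref in refs for t in ref.split())  # all refs

# counts inflated by the number of references (the R' weighting)
sgramcounter_rep = Counter({g: c * numref for g, c in sgramcounter.items()})
cgramcounter_rep = Counter({g: c * numref for g, c in cgramcounter.items()})

keepgramcounter_rep = sgramcounter_rep & cgramcounter_rep   # kept in output
keepgramcountergood_rep = keepgramcounter_rep & rgramcounter
keepgramcounterall_rep = sgramcounter_rep & rgramcounter

# Current code: divide each numerator term individually, then divide
# by the number of distinct n-grams in the denominator counter.
keeptmpscore2 = sum(Fraction(keepgramcountergood_rep[g],
                             keepgramcounterall_rep[g])
                    for g in keepgramcounter_rep)
recall_code = keeptmpscore2 / len(keepgramcounterall_rep)

# Eq. 5 of the paper: sum of numerator terms over sum of denominator terms.
recall_paper = Fraction(sum(keepgramcountergood_rep.values()),
                        sum(keepgramcounterall_rep.values()))

print(recall_code)   # 1/2
print(recall_paper)  # 1/3
```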

Have I missed something? Thanks in advance!

cocoxu (Owner) commented Oct 31, 2018

We inflated the counts by the number of references in order to apply the R' weighting more conveniently. We weighted each n-gram when calculating the numerators keeptmpscore1 and keeptmpscore2, then divided by len(keepgramcounterall_rep) instead of the weighted sum (which would weigh each n-gram differently). In this way, the current code treats each n-gram equally in the denominator -- we found this more robust, and it worked well in practice.

ekQ (Author) commented Nov 1, 2018

Thanks for the quick reply! I understand the idea of inflating the counts, but it's not clear to me why, on line 58,

    keeptmpscore2 += keepgramcountergood_rep[keepgram] / keepgramcounterall_rep[keepgram]

the added terms are divided by keepgramcounterall_rep[keepgram]. Since the adjusted R' scores are incorporated into both keepgramcountergood_rep and keepgramcounterall_rep, the division seems to cancel out their effect, so in the end only 1s get added to the numerator keeptmpscore2. In other words, r_{keep}(n) appears to be effectively computed using R instead of R'.
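A small numeric check of this cancellation, using hypothetical counts rather than the example above: suppose "b" is kept from input to output and appears in 2 of 3 references, so its reference count is 2 while the inflated input/output counts are 3. The per-n-gram ratio from line 58 is still exactly 1, independent of the R' weight:

```python
from collections import Counter
from fractions import Fraction

numref = 3
s_rep = Counter({"b": 1 * numref})  # input count, inflated by numref
c_rep = Counter({"b": 1 * numref})  # output count, inflated by numref
r = Counter({"b": 2})               # "b" occurs in 2 of the 3 references

keep_rep = s_rep & c_rep            # {"b": 3}: kept in the output
good_rep = keep_rep & r             # {"b": 2}: kept and in the references
all_rep = s_rep & r                 # {"b": 2}: in input and references

ratio = Fraction(good_rep["b"], all_rep["b"])
print(ratio)  # 1 -- the R' weighting cancels out of the numerator term
```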
