
Recall score for the kept tokens in SARI #6

Open
ekQ opened this issue Oct 30, 2018 · 2 comments


ekQ commented Oct 30, 2018

Thank you for making this code available! I was trying to understand how the different components of the SARI score are computed, and I wonder if I've misunderstood something or if there's an inconsistency between the code and the paper. Consider the following example.

Input: "a b"
Output: "b"
Ref-1: "a b"
Ref-2: "a"

Now if I manually compute the recall of kept tokens using Eq. 5 from the paper, I get

    r_{keep}(1) = [min(0, 1) + min(1, 1/2)] / [1 + 1/2] = 1/3,

where the first terms of the numerator and denominator correspond to "a" and the second terms to "b". However, the GitHub implementation gives me

    r_{keep}(1) = 1/2.

The reason is that in the code the terms of the numerator are divided individually by the corresponding denominator terms on line 58, instead of dividing the sum of the numerator terms by the sum of the denominator terms as done in Eq. 5 in the paper.

Replacing line 58 by:

    keeptmpscore2 += keepgramcountergood_rep[keepgram]

and line 65 by:

    keepscore_recall = keeptmpscore2 / sum(keepgramcounterall_rep.values())

seems to fix this and yields r_{keep}(1) = 1/3, as I would expect.
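For concreteness, here is a minimal sketch that reproduces both numbers for the unigram example above. It mirrors the repo's counter construction and variable names, but it is an illustration of the two formulas, not the full implementation:

```python
from collections import Counter
from fractions import Fraction

input_sent, output_sent = "a b", "b"
refs = ["a b", "a"]
numref = len(refs)

sgramcounter = Counter(input_sent.split())                  # input unigrams
cgramcounter = Counter(output_sent.split())                 # output unigrams
rgramcounter = Counter(t for ref in refs for t in ref.split())  # all refs

# counts inflated by the number of references (the R' weighting)
sgramcounter_rep = Counter({g: c * numref for g, c in sgramcounter.items()})
cgramcounter_rep = Counter({g: c * numref for g, c in cgramcounter.items()})

keepgramcounter_rep = sgramcounter_rep & cgramcounter_rep   # kept in output
keepgramcountergood_rep = keepgramcounter_rep & rgramcounter
keepgramcounterall_rep = sgramcounter_rep & rgramcounter

# Current code: divide each numerator term individually, then divide
# by the number of distinct n-grams in the denominator counter.
keeptmpscore2 = sum(Fraction(keepgramcountergood_rep[g],
                             keepgramcounterall_rep[g])
                    for g in keepgramcounter_rep)
recall_code = keeptmpscore2 / len(keepgramcounterall_rep)

# Eq. 5 of the paper: sum of numerator terms over sum of denominator terms.
recall_paper = Fraction(sum(keepgramcountergood_rep.values()),
                        sum(keepgramcounterall_rep.values()))

print(recall_code)   # 1/2
print(recall_paper)  # 1/3
```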

Have I missed something? Thanks in advance!

cocoxu (Owner) commented Oct 31, 2018

We inflated the counts by the number of references in order to apply the R' weighting more conveniently. We weighted each n-gram when calculating the numerators keeptmpscore1 and keeptmpscore2, then divided by len(keepgramcounterall_rep) instead of the weighted sum (which would weigh each n-gram differently). In this way, the current code treats each n-gram equally in the denominator -- we found this more robust, and it worked well in practice.

ekQ (Author) commented Nov 1, 2018

Thanks for the quick reply! I understand the idea of inflating the counts, but it's not clear to me why, on line 58,

    keeptmpscore2 += keepgramcountergood_rep[keepgram] / keepgramcounterall_rep[keepgram]

the added terms are divided by keepgramcounterall_rep[keepgram]. Since the adjusted R' scores are incorporated into both keepgramcountergood_rep and keepgramcounterall_rep, the division seems to cancel out their effect, so in the end only 1s get added to the numerator keeptmpscore2. In other words, r_{keep}(n) appears to be effectively computed using R instead of R'.
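A small numeric check of this cancellation, using hypothetical counts rather than the example above: suppose "b" is kept from input to output and appears in 2 of 3 references, so its reference count is 2 while the inflated input/output counts are 3. The per-n-gram ratio from line 58 is still exactly 1, independent of the R' weight:

```python
from collections import Counter
from fractions import Fraction

numref = 3
s_rep = Counter({"b": 1 * numref})  # input count, inflated by numref
c_rep = Counter({"b": 1 * numref})  # output count, inflated by numref
r = Counter({"b": 2})               # "b" occurs in 2 of the 3 references

keep_rep = s_rep & c_rep            # {"b": 3}: kept in the output
good_rep = keep_rep & r             # {"b": 2}: kept and in the references
all_rep = s_rep & r                 # {"b": 2}: in input and references

ratio = Fraction(good_rep["b"], all_rep["b"])
print(ratio)  # 1 -- the R' weighting cancels out of the numerator term
```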
