Are Gigaword examples with summary length bigger than the article length considered when the final metrics are computed? #8

Open
Tomarchelone opened this issue Jan 10, 2024 · 0 comments


In the Gigaword dataset there are some examples where the summary is longer than the source sequence. Sometimes the source is a single unk token. As far as I can see in dataclass.py, such examples are dropped from the pipeline completely.

Were the ROUGE scores reported in the paper computed without those examples? If so, it is incorrect to compare the resulting scores with the baselines, because they would be computed on different test sets. For example, as far as I can see, the ROUGE scores for Concept Pointer were taken directly from the Concept Pointer paper, where performance was measured on all test examples.
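
To make the concern concrete, here is a minimal sketch of the check I have in mind. It is not the code from dataclass.py: `split_by_length` and `unigram_f1` are hypothetical helpers, whitespace tokenization is an assumption, and `unigram_f1` is only a simplified stand-in for ROUGE-1, not the official scorer. The idea is to count how many test pairs the length filter removes and to compare the mean score on the full set with the mean on the filtered set.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]  # (source article, reference summary)


def split_by_length(pairs: List[Pair]) -> Tuple[List[Pair], List[Pair]]:
    """Partition pairs by whether the summary has more tokens than the source.

    Assumption: tokens are whitespace-separated; the real pipeline may differ.
    """
    kept, dropped = [], []
    for source, summary in pairs:
        if len(summary.split()) > len(source.split()):
            dropped.append((source, summary))
        else:
            kept.append((source, summary))
    return kept, dropped


def unigram_f1(reference: str, candidate: str) -> float:
    """Simplified unigram-overlap F1 (a stand-in for ROUGE-1, not the official scorer)."""
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def mean(values: List[float]) -> float:
    """Arithmetic mean, returning 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Toy data, invented for illustration only (not real Gigaword examples).
toy_pairs = [
    ("unk", "police arrest suspect in city"),
    ("the central bank raised interest rates on tuesday", "bank raises rates"),
]
# Hypothetical model outputs, one per toy pair.
toy_outputs = ["unk", "bank raised rates"]

kept, dropped = split_by_length(toy_pairs)
print(f"kept={len(kept)} dropped={len(dropped)}")

# Score on all examples versus only the examples that survive the filter.
all_scores = [unigram_f1(ref, out) for (_, ref), out in zip(toy_pairs, toy_outputs)]
kept_scores = [
    unigram_f1(ref, out)
    for (src, ref), out in zip(toy_pairs, toy_outputs)
    if len(ref.split()) <= len(src.split())
]
print(f"mean over all: {mean(all_scores):.3f}, mean over kept: {mean(kept_scores):.3f}")
```

If the two means differ on the real test set, the reported numbers and the baseline numbers are not measured on the same data, which is the comparison problem described above.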
