Adding `top_k` argument to `text-classification` pipeline. #17606

Narsil · 2022-06-08T13:32:28Z

What does this PR do?

Many users wonder why the API does not return sorted results.
This PR makes that possible in transformers and exposes the functionality
to users without causing any regression.

The API will simply override the default argument to get sorted results.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  - `return_all_scores=True` -> `top_k=None`
  - `return_all_scores=False` -> `top_k=1`
- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.
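The mapping and sorting rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: `legacy_flag_to_top_k` and `select_top_k` are hypothetical helper names, the label scores are made up, and none of this is the pipeline's actual implementation.

```python
from typing import Any, Dict, List, Optional


def legacy_flag_to_top_k(return_all_scores: bool) -> Optional[int]:
    """Hypothetical helper: translate the deprecated flag as the PR text describes."""
    # return_all_scores=True  -> top_k=None (keep every label)
    # return_all_scores=False -> top_k=1    (keep only the best label)
    return None if return_all_scores else 1


def select_top_k(scores: Dict[str, float], top_k: Optional[int]) -> List[Dict[str, Any]]:
    """Hypothetical helper: sort by descending score, then keep the first top_k entries."""
    ranked = sorted(
        ({"label": label, "score": score} for label, score in scores.items()),
        key=lambda item: item["score"],
        reverse=True,
    )
    # top_k=None means "no truncation": all labels are returned, still sorted.
    return ranked if top_k is None else ranked[:top_k]


# Made-up scores, purely for illustration.
scores = {"NEGATIVE": 0.1, "POSITIVE": 0.7, "NEUTRAL": 0.2}
best_only = select_top_k(scores, legacy_flag_to_top_k(False))
all_sorted = select_top_k(scores, legacy_flag_to_top_k(True))
```

Note that this sketch always sorts; the "no argument" case in the description, which stays unsorted for backward compatibility, is deliberately left out.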




LysandreJik

Cool, top_k is a nice addition! Thanks for working on that, @Narsil!

…ce#17606)
* Adding `top_k` and `sort` arguments to `text-classification` pipeline.
  - Deprecate `return_all_scores` as `top_k` is more uniform with other pipelines, and a superset of what `return_all_scores` can do. BC is maintained though. `return_all_scores=True` -> `top_k=None`, `return_all_scores=False` -> `top_k=1`
  - Using `top_k` will imply sorting the results, but using no argument will keep the results unsorted for backward compatibility.
* Remove `sort`.
* Fixing the test.
* Remove bad doc.


…e#17606 Fixing a regression with `return_all_scores` introduced in huggingface#17606
- The legacy test actually tested `return_all_scores=False` (the actual default) instead of `return_all_scores=True` (the actual weird case). This commit adds the correct legacy test and fixes it.
- Tmp legacy tests.
- Actually fix the regression (also contains lists)
- Less diffed code.


lucb · 2022-09-26T14:49:47Z

This PR should be cleaned up: it changed the behavior of the text classification pipeline, and the documentation is not clear. Can we get more explanation of how we are expected to modify our code to get the same behavior with the `top_k` arg, or will `return_all_scores` continue to be supported with the same output order?

Passing `return_all_scores=True` is not equivalent to `top_k=1`, for example, and setting `top_k=n` sorts the results, which is not the same order as before this change. Please provide better documentation and some way to alleviate the pain of upgrading the transformers library by minimizing the incompatible changes.
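For readers with the same ordering concern, one possible workaround is to re-order sorted results on the caller's side. The sketch below is not part of transformers: `restore_label_order` is a hypothetical helper, the result dicts are made up, and it assumes the caller already knows the label order that was returned before the change.

```python
from typing import Any, Dict, List, Sequence


def restore_label_order(
    results: List[Dict[str, Any]], label_order: Sequence[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: re-order score dicts to follow a known label order."""
    by_label = {item["label"]: item for item in results}
    # Labels missing from the results (e.g. truncated by top_k) are skipped.
    return [by_label[label] for label in label_order if label in by_label]


# Made-up output in the sorted (descending score) form.
sorted_results = [
    {"label": "POSITIVE", "score": 0.7},
    {"label": "NEUTRAL", "score": 0.2},
    {"label": "NEGATIVE", "score": 0.1},
]
# Assumed to be the order the caller relied on before upgrading.
label_order = ["NEGATIVE", "NEUTRAL", "POSITIVE"]
unsorted_results = restore_label_order(sorted_results, label_order)
```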

Narsil requested a review from LysandreJik June 8, 2022 13:32

Narsil added 3 commits June 8, 2022 15:54

Remove sort.

96f3df4

Fixing the test.

fb69da1

Narsil force-pushed the upgrade_text_classification branch from 132030b to fb69da1 Compare June 8, 2022 13:54

Remove bad doc.

62031d5

LysandreJik approved these changes Jun 9, 2022


Narsil merged commit 2351729 into huggingface:main Jun 9, 2022

Narsil deleted the upgrade_text_classification branch June 9, 2022 16:33

thigm85 mentioned this pull request Jun 20, 2022

fix integration test based on transformer library update vespa-engine/pyvespa#351

Merged

Narsil mentioned this pull request Jun 27, 2022

Fixing a regression with return_all_scores introduced in #17606 #17906

Merged
