Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding top_k argument to text-classification pipeline. #17606

Merged
merged 4 commits into from
Jun 9, 2022

Conversation

Narsil
Copy link
Contributor

@Narsil Narsil commented Jun 8, 2022

What does this PR do?

A lot of users are wondering why the API does not return results sorted.
This PR enables transformers to do that and enables functionnality
to users without causing any regression.

The API will simply override the default argument to get sorted results.

  • Deprecate return_all_scores as top_k is more uniform with other
    pipelines, and a superset of what return_all_scores can do.
    BC is maintained though.
    return_all_scores=True -> top_k=None
    return_all_scores=False -> top_k=1

  • Using top_k will imply sorting the results, but using no argument
    will keep the results unsorted for backward compatibility.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil requested a review from LysandreJik June 8, 2022 13:32
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 8, 2022

The documentation is not available anymore as the PR was closed or merged.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  `return_all_scores=True` -> `top_k=None`
  `return_all_scores=False` -> `top_k=1`

- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.
@Narsil Narsil force-pushed the upgrade_text_classification branch from 132030b to fb69da1 Compare June 8, 2022 13:54
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, top_k is a nice addition! Thanks for working on that, @Narsil!

@Narsil Narsil merged commit 2351729 into huggingface:main Jun 9, 2022
@Narsil Narsil deleted the upgrade_text_classification branch June 9, 2022 16:33
elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022
…ce#17606)

* Adding `top_k` and `sort` arguments to `text-classification` pipeline.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  `return_all_scores=True` -> `top_k=None`
  `return_all_scores=False` -> `top_k=1`

- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.

* Remove `sort`.

* Fixing the test.

* Remove bad doc.
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Jun 16, 2022
…ce#17606)

* Adding `top_k` and `sort` arguments to `text-classification` pipeline.

- Deprecate `return_all_scores` as `top_k` is more uniform with other
  pipelines, and a superset of what `return_all_scores` can do.
  BC is maintained though.
  `return_all_scores=True` -> `top_k=None`
  `return_all_scores=False` -> `top_k=1`

- Using `top_k` will imply sorting the results, but using no argument
  will keep the results unsorted for backward compatibility.

* Remove `sort`.

* Fixing the test.

* Remove bad doc.
Narsil added a commit to Narsil/transformers that referenced this pull request Jun 27, 2022
…e#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.
Narsil added a commit to Narsil/transformers that referenced this pull request Jun 27, 2022
…e#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.
Narsil added a commit to Narsil/transformers that referenced this pull request Jun 28, 2022
…e#17606

Fixing a regression with `return_all_scores` introduced in huggingface#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
Narsil added a commit to Narsil/transformers that referenced this pull request Jun 28, 2022
…e#17606

Fixing a regression with `return_all_scores` introduced in huggingface#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
Narsil added a commit to Narsil/transformers that referenced this pull request Jun 28, 2022
…e#17606

Fixing a regression with `return_all_scores` introduced in huggingface#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
Narsil added a commit that referenced this pull request Jun 28, 2022
…7906)

Fixing a regression with `return_all_scores` introduced in #17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
younesbelkada pushed a commit to younesbelkada/transformers that referenced this pull request Jun 29, 2022
…e#17606 (huggingface#17906)

Fixing a regression with `return_all_scores` introduced in huggingface#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
…e#17606 (huggingface#17906)

Fixing a regression with `return_all_scores` introduced in huggingface#17606

- The legacy test actually tested `return_all_scores=False` (the actual
  default) instead of `return_all_scores=True` (the actual weird case).

This commit adds the correct legacy test and fixes it.

Tmp legacy tests.

Actually fix the regression (also contains lists)

Less diffed code.
@lucb
Copy link

lucb commented Sep 26, 2022

This PR should be cleaned up, it changed the behavior of the text classification pipeline and the documentation is not clear. Can we get more explanations about the way we should expect to modify the code to get the same behavior with the top_k arg or continue supporting the return_all_scores with the same output order?

Passing return_all_scores=True is not equivalent to top_k=1 for example, and setting top_k=n is sorting the results which is not the same order as before this change. Please provide better documentation and some way to aleviate the pain of upgrading the transformers library by minimizing the incompatible changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants