Whisper: remove redundant assisted generation tests #34814

Open
wants to merge 2 commits into
base: main

gante
Member

@gante gante commented Nov 19, 2024

What does this PR do?

Part of making CI green in preparation for #34807 (test fetcher fetches generation tests)

  1. Reverts recent default max_length-related changes. The default max length is 20 and, due to recent changes, a different length was being returned in some cases. Whisper was one of the affected models: it was defaulting to a length of 21.
  2. Deletes test_assisted_decoding_encoder_decoder_shared_encoder (integration test) -- this test existed to cover assisted generation with a DistilWhisper-like structure, which used to have a custom input (assistant_encoder_outputs). a) That input is no longer present in Whisper; b) the DistilWhisper structure didn't take off, so it's overkill to have an abstract test with dummy classes; c) we have other DistilWhisper tests.
  3. Deletes test_model_kwarg_assisted_decoding_encoder_decoder (integration test) -- added when DistilWhisper was added. Since then, assisted generation has been modified to work with any encoder-decoder model, and we have mixin tests.

@gante gante marked this pull request as ready for review November 19, 2024 17:21
@gante gante changed the title from "Whisper: remove redundant assisted generation test" to "Whisper: remove redundant assisted generation tests" Nov 19, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks a lot for the detailed explanation!
We need to keep the max length changes 🤗

Comment on lines -1464 to -1469
elif has_default_max_length:  # by default let's always generate 20 new tokens
    if generation_config.max_length == GenerationConfig().max_length:
        generation_config.max_length = generation_config.max_length + input_ids_length
        max_position_embeddings = getattr(self.config, "max_position_embeddings", None)
        if max_position_embeddings is not None:
            generation_config.max_length = min(generation_config.max_length, max_position_embeddings)
Collaborator


No, this should be kept as it really helps the pipeline / default usage!
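
For readers following the thread, here is a minimal standalone sketch of what the quoted branch appears to do. It is not the library's code: `resolve_default_max_length` and `DEFAULT_MAX_LENGTH` are hypothetical names, the `has_default_max_length` condition is assumed to be true, and the one-token prompt is an assumption chosen to reproduce the length of 21 mentioned in the description (the thread does not state the actual prompt length).

```python
from typing import Optional

# Default `max_length` of 20, as stated in the PR description.
DEFAULT_MAX_LENGTH = 20


def resolve_default_max_length(
    max_length: int,
    input_ids_length: int,
    max_position_embeddings: Optional[int] = None,
) -> int:
    """Hypothetical standalone mirror of the quoted branch (not the library function)."""
    # Only adjust when `max_length` was left at the default value.
    if max_length == DEFAULT_MAX_LENGTH:
        # Treat the default as "20 new tokens" on top of the prompt.
        max_length = max_length + input_ids_length
        # Cap at the model's positional capacity, when it is known.
        if max_position_embeddings is not None:
            max_length = min(max_length, max_position_embeddings)
    return max_length


# Assumed one-token prompt: the total becomes 21 rather than 20.
print(resolve_default_max_length(20, input_ids_length=1))  # 21
# The cap applies when prompt + 20 exceeds the position limit.
print(resolve_default_max_length(20, input_ids_length=5, max_position_embeddings=16))  # 16
# A non-default value is left untouched.
print(resolve_default_max_length(50, input_ids_length=5))  # 50
```

Under these assumptions, the trade-off in the discussion is visible: with the branch, the default means 20 new tokens regardless of prompt length; without it, the default means 20 tokens in total.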

3 participants