Generate: remove deprecated public decoding functions and streamline logic 🧼 #29956
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Tests on all models are passing 🙌 (the failing pipeline test seems unrelated, and passing locally on my end)
Yaaay, thanks for clean-up! Looks so much nicer
Force-pushed from e3c994f to db8cc74
@@ -65,25 +65,16 @@ class GenerationConfig(PushToHubMixin):
    Class that holds a configuration for a generation task. A `generate` call supports the following generation methods
    for text-decoder, text-to-text, speech-to-text, and vision-to-text models:

-    - *greedy decoding* by calling [`~generation.GenerationMixin._greedy_search`] if `num_beams=1` and
functions not public -> no docs -> remove link to the docs
encoder_kwargs["output_attentions"] = generation_config.output_attentions
encoder_kwargs["output_hidden_states"] = generation_config.output_hidden_states
(see comment on L1434)
model_kwargs["output_attentions"] = generation_config.output_attentions
model_kwargs["output_hidden_states"] = generation_config.output_hidden_states
Instead of pulling information from `generation_config` to pass around through `model_kwargs`, let's use `generation_config` directly. A single object to hold all generation parameterization.
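To illustrate the suggestion, here is a minimal, hypothetical sketch (the class and function names are simplified stand-ins, not the actual transformers code): copying flags into `model_kwargs` and reading them from the config object directly are equivalent, but the latter keeps a single source of truth.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for transformers' GenerationConfig.
@dataclass
class GenerationConfig:
    output_attentions: bool = False
    output_hidden_states: bool = True


def prepare_kwargs_before(generation_config, model_kwargs):
    # Old pattern: copy flags out of the config and thread them through kwargs.
    model_kwargs["output_attentions"] = generation_config.output_attentions
    model_kwargs["output_hidden_states"] = generation_config.output_hidden_states
    return model_kwargs


def read_flags_after(generation_config):
    # New pattern: read the flags from the single config object where needed.
    return {
        "output_attentions": generation_config.output_attentions,
        "output_hidden_states": generation_config.output_hidden_states,
    }


config = GenerationConfig()
assert prepare_kwargs_before(config, {}) == read_flags_after(config)
```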
src/transformers/generation/utils.py
Outdated
output_attentions=generation_config.output_attentions,
output_hidden_states=generation_config.output_hidden_states,
These were being passed through `model_kwargs` before.
top_k: Optional[int] = 1,
penalty_alpha: Optional[float] = 0,
logits_processor: Optional[LogitsProcessorList] = None,
logits_warper: Optional[LogitsProcessorList] = None,
This one was not needed -- contrastive search does not sample
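A small self-contained sketch (not transformers code) of why the argument was dead weight: warpers such as temperature scaling reshape the score distribution for *sampling*, but they preserve the ordering of scores, so a deterministic (greedy-style) pick like contrastive search's candidate selection is unaffected by them.

```python
def temperature_warp(scores, temperature):
    # Minimal temperature "warper": rescales scores before sampling.
    # For any positive temperature, the ordering of scores is preserved.
    return [s / temperature for s in scores]


def pick_deterministic(scores):
    # Deterministic selection, as in greedy/contrastive candidate picking.
    return max(range(len(scores)), key=lambda i: scores[i])


scores = [0.1, 2.5, -0.3, 1.7]
# A warper cannot change a deterministic pick -- which is why passing a
# logits_warper into contrastive search had no effect.
assert pick_deterministic(scores) == pick_deterministic(temperature_warp(scores, 0.7)) == 1
```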
src/transformers/generation/utils.py
Outdated
@@ -1674,6 +1680,9 @@ def generate(
    logits_processor=prepared_logits_processor,
    stopping_criteria=prepared_stopping_criteria,
    pad_token_id=generation_config.pad_token_id,
    eos_token_id=generation_config.eos_token_id,
On most decoding methods `eos_token_id` doesn't need to be passed -- it was only used when the decoding method was called directly and `stopping_criteria` was not passed. However, beam methods still need it.
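The idea can be sketched as follows (hypothetical, simplified names -- not the actual transformers implementation): `generate` folds `eos_token_id` into the stopping criteria once, so the inner decoding loop only ever consults `stopping_criteria` and never needs the raw id itself.

```python
def make_eos_stopping_criterion(eos_token_id):
    # `generate` builds this once from generation_config.eos_token_id.
    def criterion(generated_ids):
        return len(generated_ids) > 0 and generated_ids[-1] == eos_token_id
    return criterion


def decode_loop(next_tokens, stopping_criteria):
    # The inner decoding loop checks only the stopping criteria; it never
    # touches eos_token_id directly. (Beam methods are the exception: they
    # still need the id to finalize hypotheses.)
    generated = []
    for token in next_tokens:
        generated.append(token)
        if stopping_criteria(generated):
            break
    return generated


stop = make_eos_stopping_criterion(eos_token_id=2)
assert decode_loop([5, 7, 2, 9], stop) == [5, 7, 2]
```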
@@ -1945,69 +1940,9 @@ def _contrastive_search(
    [`~generation.GenerateDecoderOnlyOutput`] if `model.config.is_encoder_decoder=False` and
    `return_dict_in_generate=True` or a [`~generation.GenerateEncoderDecoderOutput`] if
    `model.config.is_encoder_decoder=True`.

    Examples:
function not public -> let's remove the example (preparing its inputs will be more challenging now, as we no longer have API guarantees)
if eos_token_id is not None:
    if pad_token_id is None:
        raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.")
In the main `generate` body, we set `pad_token_id` to `eos_token_id` in this situation -- this exception will never be reached.
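A simplified sketch (hypothetical helper name) of the defaulting that makes the guard dead code: by the time any private decoding method runs, `pad_token_id` is defined whenever `eos_token_id` is, so the `ValueError` branch above can never fire.

```python
def resolve_pad_token(pad_token_id, eos_token_id):
    # Sketch of the fallback done once in `generate`: if the user set an
    # EOS token but no padding token, pad with the EOS token.
    if pad_token_id is None and eos_token_id is not None:
        pad_token_id = eos_token_id
    return pad_token_id


# pad is always defined when eos is, so the old per-method check is unreachable.
assert resolve_pad_token(pad_token_id=None, eos_token_id=2) == 2
assert resolve_pad_token(pad_token_id=0, eos_token_id=2) == 0
```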
ping @ArthurZucker -- ready for review :)
Late review but very nice cleanup sir! 🤗
src/transformers/generation/utils.py
Outdated
pad_token_id: Optional[int],
output_attentions: bool,
output_hidden_states: bool,
output_scores: bool,
output_logits: bool,
return_dict_in_generate: bool,
theoretically some of these can be taken from the config / the generation config if it inherits them. But nit
Yes! That's a good idea
Force-pushed from db8cc74 to a8b6e58
Reran slow tests locally, all seems good 👍
What does this PR do?
🧼 🧼 🧼
Calling the internal decoding functions as part of our public API was scheduled for removal in v4.41 (the next release). The removal was motivated by flexibility and conciseness: having multiple public interfaces for the same functionality forced us to add repeated logic in many places, and that logic grew every time we added a new decoding method.
Due to this removal from the public API, a few things were changed/removed as a logical consequence:
- the deprecation warnings and related legacy handling were removed from `generate`;
- the decoding functions are now private and only called from `generate`. As such, we can remove a lot of boilerplate (`x = x if x is not None else self.generation_config.x`).

Tests ran locally:
- `pytest --doctest-modules src/transformers/generation -vv`
- `RUN_SLOW=1 py.test tests/generation/test_utils.py -vv`
- `RUN_SLOW=1 py.test tests/test_cache_utils.py -vv` -- same failures as in `main`
- `RUN_SLOW=1 py.test tests/models/llama/test_modeling_llama.py -vv`
- `RUN_SLOW=1 py.test tests/models/whisper/test_modeling_whisper.py -vv` -- same failures as in `main`
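The boilerplate pattern mentioned above can be sketched with hypothetical, simplified stand-ins (not the actual transformers classes): when a decoding method is publicly callable, every argument needs a `x = x if x is not None else self.generation_config.x` fallback; once it is only called from `generate`, the fully resolved config can be passed down instead.

```python
from dataclasses import dataclass


# Hypothetical stand-in for transformers' GenerationConfig.
@dataclass
class GenerationConfig:
    num_beams: int = 1
    top_k: int = 50


class Model:
    def __init__(self):
        self.generation_config = GenerationConfig()

    def _decode_before(self, num_beams=None, top_k=None):
        # Old pattern, repeated per argument in every decoding method:
        num_beams = num_beams if num_beams is not None else self.generation_config.num_beams
        top_k = top_k if top_k is not None else self.generation_config.top_k
        return num_beams, top_k

    def _decode_after(self, generation_config):
        # New pattern: `generate` resolves defaults once and passes the
        # fully populated config object down.
        return generation_config.num_beams, generation_config.top_k


model = Model()
assert model._decode_before() == model._decode_after(model.generation_config) == (1, 50)
```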