
Fix jetmoe model #31279

Merged
merged 2 commits into huggingface:main from fix-moe on Jun 7, 2024

Conversation

@Cyrilvallez Cyrilvallez (Member) commented Jun 6, 2024

What does this PR do?

Fixes #31266
cc @ArthurZucker @amyeroberts

I am sorry, I did not notice that JetMoePreTrainedModel had gained _supports_cache_class = True while working on #30536.

To avoid the same bug, any model that gets _supports_cache_class = True in the future should change the line:

if inputs_embeds is not None and past_key_values is None:
    model_inputs = {"inputs_embeds": inputs_embeds}

to

if inputs_embeds is not None and past_length == 0:
    model_inputs = {"inputs_embeds": inputs_embeds}

in prepare_inputs_for_generation(), because checking past_key_values is None is no longer correct: an empty but initialized DynamicCache can now be passed.
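
For illustration, a minimal sketch of how past_length could be derived near the top of prepare_inputs_for_generation() so that the check above handles both legacy tuple caches and Cache objects (variable names follow the snippets above; the actual transformers code differs in its details):

# Compute how many tokens the cache has already seen.
past_length = 0
if past_key_values is not None:
    if isinstance(past_key_values, Cache):
        # An empty but initialized DynamicCache reports a seen length of 0.
        past_length = past_key_values.get_seq_length()
    else:
        # Legacy tuple cache: key states have shape
        # (batch, num_heads, seq_len, head_dim).
        past_length = past_key_values[0][0].shape[2]

# inputs_embeds is only forwarded on the first generation step.
if inputs_embeds is not None and past_length == 0:
    model_inputs = {"inputs_embeds": inputs_embeds}
else:
    model_inputs = {"input_ids": input_ids}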

Also, to minimize code complexity, the if-else that checks isinstance(past_key_values, Cache) (still in prepare_inputs_for_generation()) can be safely removed in favor of the True branch: when _supports_cache_class = True, generate() will only pass proper Cache classes. Similar if-else blocks could also be removed in other model architecture classes, as sketched below.
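
Concretely, that simplification would look roughly like this (a sketch based on the snippets above, not the exact transformers code):

# Before: branch on the cache type.
if isinstance(past_key_values, Cache):
    past_length = past_key_values.get_seq_length()
else:
    past_length = past_key_values[0][0].shape[2]

# After: with _supports_cache_class = True, generate() always passes a
# Cache instance, so the legacy tuple branch can be dropped.
past_length = past_key_values.get_seq_length() if past_key_values is not None else 0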

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts amyeroberts (Collaborator) left a comment

Thanks for fixing, @Cyrilvallez!

Could you also remove the skips from the model tests which were added in #31266 as a patch?

I think the changes look OK, but let's have a second opinion from @ArthurZucker, as I'm not very familiar with the recent cache changes.

@ArthurZucker ArthurZucker (Collaborator) left a comment

We merged this for other models as well, so LGTM.

@ArthurZucker ArthurZucker merged commit 8bcf9c8 into huggingface:main Jun 7, 2024
20 checks passed
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 11, 2024
* Fix jetmoe model

* Remove skip-tests
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 14, 2024
* Fix jetmoe model

* Remove skip-tests
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix jetmoe model

* Remove skip-tests
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix jetmoe model

* Remove skip-tests
itazap pushed a commit that referenced this pull request Jun 17, 2024
* Fix jetmoe model

* Remove skip-tests
@Cyrilvallez Cyrilvallez deleted the fix-moe branch August 30, 2024 08:37