`Llama` family, fix `use_cache=False` generation #30380

ArthurZucker · 2024-04-22T07:55:07Z

What does this PR do?

HuggingFaceDocBuilderDev · 2024-04-22T08:16:45Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante

LGTM, thank you for fixing! 💛

Suggestion: add a fast llama + generate + no cache test

ArthurZucker · 2024-04-22T12:00:20Z

Yeah, but it feels like this should be part of the generation mixin tests (and AFAIR it is?)

* nit to make sure cache positions are not sliced
* fix other models
* nit
* style
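
The first item in the commit summary above concerns cache positions when generation runs with `use_cache=False`. The sketch below is a rough, library-independent illustration of why slicing would matter in that mode. It does not reproduce the change made in this PR, and the helpers `positions_for_step` and `simulate` are hypothetical names that are not part of transformers. The assumption is only the general mechanics of decoding: with a key/value cache, each step feeds only the tokens not yet cached; without one, each step re-feeds the whole sequence, so the position indices have to cover every token.

```python
from typing import List


def positions_for_step(total_len: int, cached_len: int, use_cache: bool) -> List[int]:
    """Position indices for the tokens fed to the model in one decoding step.

    Hypothetical helper for illustration only; not a transformers function.
    total_len:  prompt length plus tokens generated so far.
    cached_len: number of tokens already held in the key/value cache.
    """
    if use_cache:
        # Only the not-yet-cached tokens are fed, so only their positions are needed.
        return list(range(cached_len, total_len))
    # No cache: the full sequence is re-fed, so the positions must not be sliced.
    return list(range(total_len))


def simulate(prompt_len: int, new_tokens: int, use_cache: bool) -> List[List[int]]:
    """Collect the positions used at each step of a toy greedy decoding loop."""
    steps = []
    cached_len = 0
    total_len = prompt_len
    for _ in range(new_tokens):
        steps.append(positions_for_step(total_len, cached_len, use_cache))
        if use_cache:
            # Everything fed so far is now stored in the cache.
            cached_len = total_len
        # One new token is appended after every step.
        total_len += 1
    return steps


if __name__ == "__main__":
    print(simulate(3, 3, use_cache=True))   # [[0, 1, 2], [3], [4]]
    print(simulate(3, 3, use_cache=False))  # [[0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
```

In the cached run the positions shrink to a single index after the prompt step. In the uncached run they grow with the sequence, which is the case a slice to the last position would break.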

ArthurZucker added 4 commits April 22, 2024 03:52

nit to make sure cache positions are not sliced

6748f64

fix other models

2d55339

nit

ea0e220

style

deab6cf

ArthurZucker requested a review from gante April 22, 2024 08:54

gante approved these changes Apr 22, 2024

ArthurZucker merged commit 2d92db8 into main Apr 22, 2024
17 of 19 checks passed

ArthurZucker deleted the fix-llama-no-cache branch April 22, 2024 12:42

zafstojano pushed a commit to zafstojano/transformers that referenced this pull request Apr 22, 2024

Llama family, fix use_cache=False generation (huggingface#30380)

afaab3f

* nit to make sure cache positions are not sliced
* fix other models
* nit
* style

ArthurZucker mentioned this pull request Apr 23, 2024

Static cache is locked after torch.compile with model.generate #30351

Closed

4 tasks

This was referenced Apr 23, 2024

[BUG] Regression in quantized inference when paired with Transformers >= 4.39.0 AutoGPTQ/AutoGPTQ#614

Closed

[BUG/FEATURE] Fix Sym=False, new checkpoint_format = gptq_v2 AutoGPTQ/AutoGPTQ#640

Open

davidgxue mentioned this pull request Apr 30, 2024

Llama-3 8B Instruct quantized to 8 Bit spits out gibberish in transformers model.generate() but works fine in vLLM? AutoGPTQ/AutoGPTQ#657

Closed

michaelfeil mentioned this pull request May 2, 2024

Bug: Evals might be broken in pinned HF transformers version cache=False jzhang38/EasyContext#26

Closed

itazap pushed a commit that referenced this pull request May 14, 2024

Llama family, fix use_cache=False generation (#30380)

4aa72ca

* nit to make sure cache positions are not sliced
* fix other models
* nit
* style
