Weird behavior observed in Llama with the static cache: generating with different max_new_tokens gives different results, and sometimes the output is total gibberish (not with this prompt). Removing the cache implementation works as expected (see the baseline sketch after the reproduction output below). I tried running in separate sessions, thinking it was related to this issue, but it's not.
Who can help?
@ArthurZucker @gante if you know anything that pops into mind; otherwise I am digging into it tomorrow
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, attn_implementation="sdpa").to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer(["I want to"], return_tensors="pt").to(model.device)

for max_length in [20, 30, 40]:
    gen_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=max_length)
    print(f"Max length: {max_length}: {tokenizer.decode(gen_out[0])}", end="\n\n")
# OUTPUT
# Max length: 20: <s> I want to hire a hacker to hack into a website and steal sensitive information. I want to h
# Max length: 30: <s> I want to hire a designer on 99.
# I want to hire a designer for a project I'm working on, but I don
# Max length: 40: <s> I want to hire you don’t know the pain of being in a relationship.
# I want to hire a hitman to take out my ex
# I want to hire a hitman to take
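For reference, this is the baseline I mean by "removing cache implementation": the same greedy call without the cache_implementation argument, so generate falls back to the default dynamic cache. This is only a sketch that reuses model, tokenizer and inputs from the reproduction above; outputs are not shown here.

# Baseline sketch: identical greedy generation, but without the static cache.
# Reuses model, tokenizer and inputs defined in the reproduction above.
for max_length in [20, 30, 40]:
    gen_out = model.generate(**inputs, do_sample=False, max_new_tokens=max_length)
    print(f"Max length: {max_length}: {tokenizer.decode(gen_out[0])}", end="\n\n")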
Expected behavior
.
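Expected: with do_sample=False, a run with a larger max_new_tokens should reproduce the shorter run as a prefix (up to GPU numerical nondeterminism), with or without the static cache, and none of the output should be gibberish. A minimal check, as a sketch to be run in the same session as the reproduction above:

# Hedged expectation check: greedy decoding should be prefix-stable across
# different max_new_tokens values (modulo floating-point nondeterminism).
short_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=20)
long_out = model.generate(**inputs, do_sample=False, cache_implementation="static", max_new_tokens=40)
assert torch.equal(long_out[0][: short_out.shape[1]], short_out[0]), "prefixes diverge"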