
Fix issue with corrupted output on long generation for GPT #2344

Closed

Conversation

andrewchernyh (Contributor)

Fix issue described in #2300

Only updating MAX_OUT_TOKES doesn't help; the main issue is that temp_buf was allocated immediately after the output tensor.

Then in attention_unfused:
T* workspace = (T*)output + bsz * seq_len * heads * k;
where output is the temp_buf returned from ds_softmax_context. As a result, temp_buf overlaps query_cont, which is also allocated in ds_softmax_context.

Moving temp_buf so that it is allocated just after kv_cache fixes the problem.

Minimal steps to reproduce:

python3 benchmarks/inference/gpt-bench.py -m EleutherAI/gpt-neo-125M --kernel-inject --deepspeed --dtype=fp32 --max-tokens=1020 --trials 1

@ghost

ghost commented Sep 22, 2022

CLA assistant check
All CLA requirements met.

@andrewchernyh (Contributor Author)

No longer relevant; see #2300 (comment).
