[BUG] Illegal memory access CUDA error when using long sequences #2062
Comments
@tomeras91 I can confirm that I'm able to reproduce this error. I don't think it has anything to do with [...]
Hi @tomeras91, thanks for reporting this issue. I will look into this.
Below is a possibly related bug. I added some sample code to reproduce this error for a GPT2 model.
Describe the bug
After initialising a GPT2 model from Huggingface with DeepSpeed, I can run inference on short sequences. But when using long sequences with e.g. 700 tokens, I get multiple warnings [...]. Important: I tested old versions and found that I do not encounter that problem for deepspeed versions [...].
To Reproduce
import os
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # surface the CUDA error at the failing kernel launch
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = deepspeed.init_inference(model, replace_method='auto', replace_with_kernel_inject=True)
short_input = tokenizer("Hello, my dog is cute." * 1, return_tensors="pt").to("cuda")
long_input = tokenizer("Hello, my dog is cute." * 100, return_tensors="pt").to("cuda")
outputs = model(**short_input) # this works fine
outputs = model(**long_input) # this throws below error
Expected behavior
ds_report output
System info (please complete the following information):
Launcher context
Possibly related issues: [...]
I have also encountered this error. Trying small inputs such as what the tutorial uses ("DeepSpeed is") leads to normal results, but significantly longer inputs lead to an illegal memory error. I would try version 0.6.6 or earlier as suggested by @trianxy, but I want to use long sequences with GPTJ and GPT Neo 2.7B, and those had issues up until recently, as can be seen in #2233. My build is the same as the one in that issue, just with DeepSpeed built from source shortly after the PR that fixed the issue.
FYI @mallorbc, @tomeras91, @RezaYazdaniAminabadi: my related issue, which I detailed above, is fixed in this PR. More precisely, my issue does not appear when I install the commit 4abd455. Thanks for that fix!
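(Aside: to test whether a given commit fixes this, DeepSpeed can be installed from source at that commit with pip. A minimal sketch, assuming the standard microsoft/DeepSpeed GitHub repository and a working CUDA build environment; the abbreviated hash is the one mentioned above:)

# install DeepSpeed from source, pinned to the commit referenced above
pip install git+https://github.com/microsoft/DeepSpeed.git@4abd455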
If I recall correctly, I also tried building from that PR and had issues with poor outputs for GPT Neo and GPTJ. I believe one of the branches I built fixed the memory error but still gave garbage output for long inputs. Perhaps this is the one, or perhaps I am remembering wrong. I will try this again later and see if it fixed anything, but again, I think I tried this already. Thanks!
Alright @mallorbc - let me know if you need any support with testing. It's true that I have seen inconsistent behavior when trying different GPT architectures with different inputs, so it may be that not all cases have been fixed by the mentioned PR. I did not run many test cases with different architectures and inputs.
Hi everyone, thank you for your help!
Hi @trianxy!
Describe the bug
Running a forward pass on a DeepSpeedTransformerInference layer, with a sequence length of ~1000 tokens, results in an illegal memory access CUDA error.
To Reproduce
Here is a minimal reproducible example that shows the bug:
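(The original snippet is not preserved in this rendering of the page. Below is a minimal sketch in the same spirit: kernel injection replaces the transformer blocks of a Huggingface model with DeepSpeedTransformerInference layers, and a ~1000-token input triggers the error. The model choice and input construction here are illustrative assumptions, not the original code.)

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# kernel injection swaps the transformer blocks for DeepSpeedTransformerInference layers
model = deepspeed.init_inference(model, replace_with_kernel_inject=True)

# ~1000 tokens: long enough to trigger the illegal memory access described above
long_input = tokenizer(" hello" * 1000, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**long_input)  # fails with: CUDA error: an illegal memory access was encountered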
Running the code resulted in the following exception: [...]
Expected behavior
I was expecting to get a correct output, without the exception.
ds_report output
System info (please complete the following information):
Launcher context
Launching directly using the Python interpreter.
Additional context
Maybe the bug is related to line 20 in csrc/transformer/inference/includes/custom_cuda_layers.h? It reads: [...]