[BUG] GPT-J + init_inference + replace_with_kernel_inject returns copy error with multiple GPUs #1719
Comments
In addition, I was able to replicate the issue on a different box running Fedora with 8x A6000 GPUs.
Hi @TiesdeKok, I will take a look at this. Thanks.
Thanks a lot!
Hi @TiesdeKok, […] Thanks.
Appreciate the quick turnaround here @RezaYazdaniAminabadi! The copy error is gone and the inference starts now, so that appears resolved. 🥳 However, I am running into another problem: everything works great with one GPU, but with multiple GPUs the inference hangs indefinitely. I can make a separate issue if you prefer, but let me describe what I am observing:
No errors are shown; it just pins the GPUs at 100% and nothing happens. I have tried this on two different machines and the behavior is the same. I noticed the same issue yesterday without the kernel inject, and letting it run for hours (on a single prompt) clearly indicates that things are stuck. To dig into this further, I also tried the distilgpt2 model, and the same issue pops up:
I am a little lost here; the code I am running is essentially the same as the referenced example, launched with the same command. I tried looking for a verbose option to see if I could get better logging once things are on the GPUs, but I could not find one. Any ideas on what might be happening here? 😕
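For concreteness, here is a minimal sketch of the kind of multi-GPU inference setup being described. The exact script and launch command were not quoted in the thread, so the model name, prompt, dtype, and generation arguments below are assumptions rather than the reporter's actual code.

```python
# Hypothetical reconstruction of the setup described above, not the actual script.
# Launched with something like: deepspeed --num_gpus 4 infer.py
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Shard the model across the visible GPUs and swap in the fused inference kernels.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = model.generate(**inputs, do_sample=True, max_length=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Under the launcher, every rank runs this same script, which is relevant to the reply below: if one rank stops generating before the others, the remaining ranks keep waiting on it.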
This is a known issue that we have with the integration of DeepSpeed Inference and HF. It happens because one GPU finishes its generation while the other is still waiting to continue with the next token generation. Would you mind setting min_length and max_length to the same number and seeing if this resolves the issue?
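In Hugging Face terms, the suggested workaround pins the generation length so that every rank produces the same number of tokens. A minimal sketch (the value 50 is an arbitrary illustration, not a recommendation from the thread):

```python
# Force a fixed generation length so no rank finishes before the others.
outputs = model.generate(
    **inputs,
    min_length=50,  # arbitrary example value
    max_length=50,  # same as min_length, so all ranks generate equally many tokens
    do_sample=True,
)
```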
After reading your description, it immediately hit me that the hanging issue is caused by a […]. With that out of the way, I am now seeing weird behavior with the kernel inject:
I added a quick print statement right before that line:

```
print(probs)
# tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:1')
# tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:2')
# tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:0')
# tensor([[nan, nan, nan, ..., nan, nan, nan]], device='cuda:3')
```

The above issue and error also occur when setting […]. Any thoughts on what might be the issue here? Thanks again for your help! P.S. my […]
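For context, a sketch of where a print like that sits relative to the sampling step; the real code lives in the Hugging Face generation utilities, and the function and variable names below are illustrative rather than copied from it:

```python
# Illustrative sketch, not the actual patched file: the NaN probabilities
# reported above surface right before the next token is sampled.
import torch
import torch.nn.functional as F

def sample_next_token(next_token_scores: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(next_token_scores, dim=-1)
    print(probs)  # all-NaN tensors on every rank in the failing setup
    # torch.multinomial raises "probability tensor contains either `inf`,
    # `nan` or element < 0" when it is fed NaN probabilities.
    return torch.multinomial(probs, num_samples=1).squeeze(1)
```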
I did test this on the same versions you mentioned, except that I am using PyTorch 1.9. The code snippet I am using is as follows:

(snippet omitted)
That little code snippet was very helpful for debugging what is happening here. My observations:

- I was using the float16 revision, so I had to download the float32 version. I figured that might be it, but it didn't change anything.
- I got the same error as before when running the exact code you provided (I only fixed the deepspeed import).
- When turning off sampling, I also saw the same weird behavior with the exclamation marks.

However, given that it worked for you, there had to be something about my setup causing it, so I started changing dials:
But then I tried […]
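For reference, a sketch of the two GPT-J checkpoint variants being compared in the comment above. The `revision="float16"` argument follows the public GPT-J model card; everything else is illustrative:

```python
# Loading the default fp32 weights vs. the fp16 revision of the GPT-J checkpoint.
import torch
from transformers import AutoModelForCausalLM

# Default revision: full-precision (float32) weights.
model_fp32 = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# "float16" revision: half-precision weights, roughly half the download size.
model_fp16 = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
)
```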
Hi @TiesdeKok, […]
Hi @TiesdeKok, I am also facing the garbage output issue. Not sure if it is related to the issue you were having previously: #2113
Describe the bug
Using the `replace_with_kernel_inject` option in `init_inference` returns an error when using multiple GPUs (with a GPT-J model).

To Reproduce
Steps to reproduce the behavior:
Expected behavior
No error.
ds_report output
Unavailable; not currently on the compute node.
Screenshots
System info (please complete the following information):
pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel
Launcher context
Deepspeed command line
Docker context
Base image is: `pytorch/pytorch:1.9.1-cuda11.1-cudnn8-devel`
Additional context
The error does not occur when `replace_with_kernel_inject` is set to False, nor with `replace_with_kernel_inject = True` when running the script directly with a single GPU.

https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_module.py#L74