[BUG][0.6.7] garbage output for multi-gpu with tutorial #2113

Closed
lanking520 opened this issue Jul 19, 2022 · 9 comments
Labels: bug, inference


lanking520 commented Jul 19, 2022

Describe the bug
When running with 2 GPUs, I started to see garbage output.

[{'generated_text': 'DeepSpeed is����極��極\\\\\\\\\\ \n\nの (  (  "\n090 nodot\x0c �\n �$, "\xa0\n\n \n\n \\ �\n �\n\n � �\n �osa\n\n � oldaran � � �aran======\\'}

To Reproduce
I am running on a 2-GPU instance with V100s; this is also reproducible with A100s.

Just follow this example: https://www.deepspeed.ai/tutorials/inference-tutorial/

# Filename: gpt-neo-2.7b-generation.py
import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B',
                     device=local_rank)

generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.float,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)

Launched with:

deepspeed --num_gpus 2 gpt-neo-2.7b-generation.py

Expected behavior

The output should be normal, coherent text.

ds_report output

# ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
async_io ............... [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.8/dist-packages/torch']
torch version .................... 1.11.0+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed install path ........... ['/usr/local/lib/python3.8/dist-packages/deepspeed']
deepspeed info ................... 0.6.7, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3

Screenshots

f': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
DeepSpeed Transformer Inference config is  {'layer_id': 29, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': True, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
DeepSpeed Transformer Inference config is  {'layer_id': 30, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
DeepSpeed Transformer Inference config is  {'layer_id': 30, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
DeepSpeed Transformer Inference config is  {'layer_id': 31, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': True, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
DeepSpeed Transformer Inference config is  {'layer_id': 31, 'hidden_size': 2560, 'intermediate_size': 10240, 'heads': 20, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 2, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': True, 'window_size': 256, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False}
[2022-07-19 23:58:00,131] [INFO] [engine.py:144:__init__] Place model to device: 1
[2022-07-19 23:58:00,153] [INFO] [engine.py:144:__init__] Place model to device: 0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
[{'generated_text': 'DeepSpeed is����極��極\\\\\\\\\\ \n\nの (  (  "\n090 nodot\x0c �\n �$, "\xa0\n\n \n\n \\ �\n �\n\n � �\n �osa\n\n � oldaran � � �aran======\\'}]
[2022-07-19 23:58:04,674] [INFO] [launch.py:210:main] Process 811 exits successfully.
[2022-07-19 23:58:05,675] [INFO] [launch.py:210:main] Process 810 exits successfully.

System info (please complete the following information):

  • OS: Ubuntu 20.04
  • GPU count and types: 2x V100
  • Python version: 3.8
lanking520 (Author) commented:

This is also reproducible with the GPT-J-6B model if you simply switch to it.

zcrypt0 commented Jul 21, 2022

I am also seeing this, but not with every model. I do see it when using the tutorial model as well though.

jeffra (Collaborator) commented Jul 25, 2022

Thank you for reporting this! I've verified we can repro this on our side as well, but only when using more than one GPU. There is currently a gap in our CI tests for multi-GPU runs and certain models. We'll fix this as soon as possible.

RezaYazdaniAminabadi (Contributor) commented Aug 9, 2022

Hi @zcrypt0 and @lanking520 ,

Sorry for the delay! I just pushed a fix for this. Could you please try it and see if the issue is resolved?
Thanks,
Reza

zcrypt0 commented Aug 9, 2022

@RezaYazdaniAminabadi

I installed from your PR commit.
pip install git+https://github.com/microsoft/DeepSpeed@73fc0303bf723386df95be0e55259197e540506e

With the bigscience/bloom-350m model I don't see any change in the output; it still doesn't make any sense.

In fact, with that model, I see the issue even when using --num_gpus 1.

I also tested the script that @lanking520 posted and I get the following error:

venv/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/transformer_inference.py", line 415, in selfAttention_fp
    qkv_out = qkv_func(
RuntimeError: Fail to create cublas handle.

I double-checked by reverting the DeepSpeed installation to master, and the test script still gives that error, so it's possible it's something in my environment, although other models seem to work.
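One way to check whether cuBLAS handle creation works at all in this environment, independently of DeepSpeed, is a small standalone matmul on each visible GPU (a minimal sketch, not part of the original report):

# Hypothetical environment check, independent of DeepSpeed: a small matmul
# on each visible GPU forces torch to create a cuBLAS handle on that device.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    a = torch.randn(64, 64, device=f"cuda:{i}")
    b = torch.randn(64, 64, device=f"cuda:{i}")
    # The matmul goes through cuBLAS; a failure here points at the
    # driver/toolkit rather than at DeepSpeed itself.
    c = a @ b
    print(f"cuda:{i} ({torch.cuda.get_device_name(i)}): matmul OK, sum={c.sum().item():.3f}")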

RezaYazdaniAminabadi (Contributor) commented:
@zcrypt0 I think this must be related to an issue with your CUDA driver/libraries, since you did not even get past the first step of creating a cuBLAS handle. Could you please try reinstalling them?
Thanks,
Reza

zcrypt0 commented Aug 10, 2022

@RezaYazdaniAminabadi
I just saw #2194 and am thinking my issue may be related to that, since I ran on 1080 Tis.

I am going to test out this script on a set of ampere gpus and see how it goes.

EDIT: I installed from master and ran the script on 2x A100s. This was the output.

[{'generated_text': 'DeepSpeed is a software house that makes software that solves very hard problems\n\n"Why we do what we do"\n\nIn most cases, Fastest.fm\'s original business plan was to monetize the\ncontent its users provided. This'}]

mallorbc commented:
I was also getting junk output following the tutorial. I can confirm that after building DeepSpeed from master, the issue seems resolved for GPT-Neo 2.7B.

I am, however, having another issue with regard to memory usage. Even when I specify torch.half (or torch.float16), the model seems to use the full VRAM on both GPUs. For example, running GPT-J on dual 3090s leads to OOM issues, with usage over 24GB on each.

Also, and perhaps I am misunderstanding the use of this tool, but isn't the VRAM usage supposed to be split over the multiple GPUs? I would expect roughly 6-7GB per GPU rather than 24GB on each.

I give more details in #2227.
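For reference, a minimal sketch of the half-precision variant being described (hypothetical: the tutorial script above with GPT-J swapped in and dtype changed to torch.half; not a confirmed fix for the OOM):

# Hypothetical half-precision variant of the tutorial script.
# Assumes the same DeepSpeed 0.6.x init_inference arguments used above;
# dtype=torch.half is what is expected to let the checkpoint be sharded
# across the two GPUs via mp_size.
import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

generator = pipeline('text-generation', model='EleutherAI/gpt-j-6B',
                     device=local_rank)

# mp_size splits the transformer layers across ranks; dtype=torch.half
# converts the injected kernels and weights to fp16.
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.half,
                                           replace_method='auto',
                                           replace_with_kernel_inject=True)

output = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(output)

Note that the pipeline() call itself still places the full-precision checkpoint on each rank before init_inference converts and shards it, which may explain the high per-GPU peak usage reported above.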

jeffra (Collaborator) commented Dec 2, 2022

Closing; the original issue is resolved and the new issue has moved to #2227.

jeffra closed this as completed Dec 2, 2022