[Deepspeed Inference] HF Integration #14426
base: main
Conversation
@@ -33,6 +33,54 @@
logger = logging.get_logger(__name__)


inference_custom_map = dict(
    electra=dict(ElectraLayer=("output.dense")),
@stas00 This will only parallelize the output.dense layer, and the other parts will be duplicated on all GPUs, resulting in memory inefficiency.
To parallelize all parts, information about every layer has to be provided. That would be similar to the policy of the existing DeepSpeed Inference, and not very different from the policy I used in Parallelformers.
@RezaYazdaniAminabadi Am I right? Or do you have a different opinion?
As the PR says, this is very early. Basically all I did was convert an example that Reza gave me so that it is integrated into the HF Trainer. So I am treating it as a black box for now and waiting for Reza to complete the project before trying to understand how it works.
But I trust Reza will be happy to answer your question.
@hyunwoongko, this only shows which linear layers would require an all_reduce. So this is not going to use the same policy as when injecting the kernels. You can find more detail on how the other layers are partitioned in the replace_module function in DeepSpeed. But basically, this policy here just shows which parts need to be partitioned horizontally, whereas the rest are partitioned vertically. Does it make sense?
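To make the partitioning direction concrete, here is a minimal single-process sketch (not code from this PR; the shapes and layer names are illustrative assumptions) of the scheme described above: the first linear of a block is split along its output features and needs no communication, while the second is split along its input features, so each rank ends up with a partial sum that has to be combined with an all_reduce. That is why only layers like output.dense need to appear in the map.

```python
import torch

# Single-process simulation of splitting an FFN block across 2 "GPUs".
# Shapes and names are illustrative, not taken from the PR.
torch.manual_seed(0)
hidden, intermediate, world_size = 8, 32, 2

x = torch.randn(1, hidden)              # input, replicated on every rank
w1 = torch.randn(intermediate, hidden)  # first linear (e.g. intermediate.dense)
w2 = torch.randn(hidden, intermediate)  # second linear (e.g. output.dense)

reference = x @ w1.t() @ w2.t()         # unpartitioned result

# "Vertical" split of w1 along its output dimension: each rank holds a slice of
# the intermediate activation, no communication needed.
w1_shards = w1.chunk(world_size, dim=0)
# "Horizontal" split of w2 along its input dimension: each rank produces only a
# partial sum of the final output, so the results must be reduced across ranks.
w2_shards = w2.chunk(world_size, dim=1)

partials = [(x @ w1_shards[r].t()) @ w2_shards[r].t() for r in range(world_size)]
combined = sum(partials)                # stands in for torch.distributed.all_reduce

assert torch.allclose(reference, combined, atol=1e-5)
```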
Thank you for the explanatory notes, @RezaYazdaniAminabadi - I have added them to the file, so this is covered.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This PR works on an integration of Deepspeed Inference, which implements Tensor Parallelism. This is different from Deepspeed ZeRO inference.
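For context, a rough sketch of what DeepSpeed Inference with tensor parallelism looks like when driven directly rather than through the Trainer (this is not the PR's code; the model name, injection policy and settings below are illustrative assumptions, and it needs GPUs and the deepspeed launcher to actually run):

```python
# Run with the DeepSpeed launcher, e.g.: deepspeed --num_gpus 2 this_script.py
import torch
import deepspeed
from transformers import AutoModel, AutoTokenizer
from transformers.models.electra.modeling_electra import ElectraLayer

model_name = "google/electra-small-discriminator"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# mp_size is the tensor-parallel degree; injection_policy tells DeepSpeed which
# linear layers' outputs need an all_reduce (compare inference_custom_map above).
model = deepspeed.init_inference(
    model,
    mp_size=2,
    dtype=torch.half,
    injection_policy={ElectraLayer: ("output.dense",)},
)

device = torch.device("cuda", torch.cuda.current_device())
inputs = tokenizer("DeepSpeed Inference test", return_tensors="pt").to(device)
with torch.no_grad():
    print(model(**inputs).last_hidden_state.shape)
```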
This is a very early draft.
To try:
It currently hangs with `--num_gpus > 1`: one GPU finishes processing while the other is stuck preparing inputs, so the synchronization between the GPUs still needs to be figured out.