
Commit

Update deepspeed.md
Provide more clarity on the need for sequence parallelism.
samadejacobs authored Oct 2, 2024
1 parent 9ea2571 commit 8766b91
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/en/deepspeed.md
@@ -1142,7 +1142,7 @@ Using multiple GPUs with ZeRO-3 for generation requires synchronizing the GPUs b
For Transformers>=4.28, `synced_gpus` is automatically set to `True` if multiple GPUs are detected during generation.

### Non-Trainer Sequence Parallelism
DeepSpeed sequence parallelism, also known as [DeepSpeed Ulysses](https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ulysses/README.md), is compatible with HuggingFace Transformers by adding 'sequence_parallel_size' and 'data_parallel_size' to the DeepSpeed configuration. Additionally, it's required that the user’s script correctly shard the input data along the sequence dimension.
DeepSpeed sequence parallelism, also known as [DeepSpeed Ulysses](https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ulysses/README.md), is a distributed training technique targeting long-context LLM problems. Sequence parallelism allows sequence length and model size to grow virtually indefinitely as more GPUs are added, rather than being limited by the memory of a single GPU. DeepSpeed sequence parallelism is compatible with HuggingFace Transformers by adding `sequence_parallel_size` and `data_parallel_size` to the DeepSpeed configuration. Additionally, the user's script must correctly shard the input data along the sequence dimension.

```py
ds_config = {
    ...
}
```
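
To make the collapsed configuration above concrete, here is a minimal sketch of what the two keys and the input sharding might look like. Everything beyond `sequence_parallel_size` and `data_parallel_size` (the chosen parallel degrees, the ZeRO stage, and the `shard_along_sequence` helper) is an illustrative assumption rather than the documented API.

```py
import torch

# A minimal sketch, not the upstream example: the two keys below come from the
# prose above; every other value here is an illustrative assumption.
sp_size = 4  # assumption: 4-way sequence parallelism
dp_size = 2  # assumption: 2-way data parallelism (sp_size * dp_size == total GPUs)

ds_config = {
    "sequence_parallel_size": sp_size,
    "data_parallel_size": dp_size,
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

def shard_along_sequence(input_ids: torch.Tensor, sp_rank: int, sp_size: int) -> torch.Tensor:
    """Give each rank in a sequence-parallel group its own contiguous slice of tokens.

    `sp_rank` is the rank within the sequence-parallel group; how it is obtained
    depends on your setup and is left out of this sketch.
    """
    seq_len = input_ids.shape[1]
    chunk = seq_len // sp_size  # assumes seq_len is divisible by sp_size
    return input_ids[:, sp_rank * chunk : (sp_rank + 1) * chunk]
```

Splitting the batch along the sequence dimension, rather than the batch dimension, is what lets each GPU hold only a fraction of a very long context.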
