Improve performance docs #17750

Merged
merged 11 commits into from
Jun 23, 2022
34 changes: 23 additions & 11 deletions docs/source/en/_toctree.yml
@@ -59,7 +59,29 @@
title: Converting TensorFlow Checkpoints
- local: serialization
title: Export 🤗 Transformers models
- local: performance
- sections:
- local: performance
title: Overview
- local: perf_train_gpu_one
title: Training on one GPU
- local: perf_train_gpu_many
title: Training on many GPUs
- local: perf_train_cpu
title: Training on CPU
- local: perf_train_tpu
title: Training on TPUs
- local: perf_train_special
title: Training on Specialized Hardware
- local: perf_infer_cpu
title: Inference on CPU
- local: perf_infer_gpu_one
title: Inference on one GPU
- local: perf_infer_gpu_many
title: Inference on many GPUs
- local: perf_infer_special
title: Inference on Specialized Hardware
- local: perf_hardware
title: Custom hardware for training
title: Performance and scalability
- local: big_models
title: Instantiating a big model
@@ -81,16 +103,6 @@
title: "How to add a model to 🤗 Transformers?"
- local: add_new_pipeline
title: "How to add a pipeline to 🤗 Transformers?"
- local: perf_train_gpu_one
title: Training on one GPU
- local: perf_train_gpu_many
title: Training on many GPUs
- local: perf_train_cpu
title: Training on CPU
- local: perf_infer_cpu
title: Inference on CPU
- local: perf_hardware
title: Custom hardware for training
- local: testing
title: Testing
- local: pr_checks
14 changes: 14 additions & 0 deletions docs/source/en/perf_infer_gpu_many.mdx
@@ -0,0 +1,14 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->

# Efficient Inference on Multiple GPUs

This document will be completed soon with information on how to run inference on multiple GPUs. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
14 changes: 14 additions & 0 deletions docs/source/en/perf_infer_gpu_one.mdx
@@ -0,0 +1,14 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->

# Efficient Inference on a Single GPU

This document will be completed soon with information on how to run inference on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
14 changes: 14 additions & 0 deletions docs/source/en/perf_infer_special.mdx
@@ -0,0 +1,14 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->

# Inference on Specialized Hardware

This document will be completed soon with information on how to run inference on specialized hardware. In the meantime you can check out [the guide for inference on CPUs](perf_infer_cpu).
6 changes: 6 additions & 0 deletions docs/source/en/perf_train_gpu_many.mdx
@@ -13,6 +13,12 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o

When training on a single GPU is too slow or the model weights don't fit in a single GPU's memory, we use a multi-GPU setup. Switching from a single GPU to multiple GPUs requires some form of parallelism, as the work needs to be distributed. There are several techniques to achieve parallelism, such as data, tensor, or pipeline parallelism. However, there is no one solution to fit them all, and which settings work best depends on the hardware you are running on. While the main concepts will most likely apply to any other framework, this article is focused on PyTorch-based implementations.
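
To make the idea of data parallelism concrete, here is a minimal sketch in plain Python. It is not taken from this guide or from any library: the toy one-parameter model, the helper names `grad_mse` and `data_parallel_grad`, and the list of "workers" are all hypothetical stand-ins. Each worker computes a gradient on its own shard of the batch, and averaging the local gradients (the role an all-reduce plays on real hardware) reproduces the full-batch gradient when the shards have equal size.

```python
# Illustrative only: a toy linear model y = w * x with mean squared error.
# Real multi-GPU training would use a framework's distributed utilities.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Shard the batch, compute one local gradient per worker, then average.

    The final average stands in for an all-reduce across devices.
    """
    assert len(xs) % num_workers == 0, "sketch assumes equal-sized shards"
    shard = len(xs) // num_workers
    local_grads = [
        grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(num_workers)
    ]
    return sum(local_grads) / num_workers


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = 1.0

full_batch_grad = grad_mse(w, xs, ys)
sharded_grad = data_parallel_grad(w, xs, ys, num_workers=2)
print(full_batch_grad, sharded_grad)
```

Tensor and pipeline parallelism split the model itself rather than the batch, so they do not reduce to a simple average like this one.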

<Tip>

Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) are generic and apply to training models in general so make sure to have a look at it before diving into the following sections such as multi-GPU or CPU training.

</Tip>

We will first discuss in depth various 1D parallelism techniques and their pros and cons, and then look at how they can be combined into 2D and 3D parallelism to enable even faster training and to support even bigger models. Various other powerful alternative approaches will also be presented.

## Concepts
20 changes: 20 additions & 0 deletions docs/source/en/perf_train_special.mdx
@@ -0,0 +1,20 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->

# Training on Specialized Hardware

<Tip>

Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) and [multi-GPU section](perf_train_gpu_many) are generic and apply to training models in general, so make sure to have a look at them before diving into this section.

</Tip>

This document will be completed soon with information on how to train on specialized hardware.
20 changes: 20 additions & 0 deletions docs/source/en/perf_train_tpu.mdx
@@ -0,0 +1,20 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
-->

# Training on TPUs

<Tip>

Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) and [multi-GPU section](perf_train_gpu_many) are generic and apply to training models in general, so make sure to have a look at them before diving into this section.

</Tip>

This document will be completed soon with information on how to train on TPUs.
20 changes: 13 additions & 7 deletions docs/source/en/performance.mdx
@@ -24,7 +24,13 @@ This document serves as an overview and entry point for the methods that could b

## Training

Training transformer models efficiently requires an accelerator such as a GPU or TPU. The most common case is where you only have a single GPU.
Training transformer models efficiently requires an accelerator such as a GPU or TPU. The most common case is where you only have a single GPU, but there are also sections on multi-GPU and CPU training (with more coming soon).

<Tip>

Note: Most of the strategies introduced in the single GPU sections (such as mixed precision training or gradient accumulation) are generic and apply to training models in general so make sure to have a look at it before diving into the following sections such as multi-GPU or CPU training.

</Tip>
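
As a rough illustration of why gradient accumulation is generic, the following sketch uses plain Python and a toy one-parameter model. The helper names `grad_mse` and `accumulated_grad` are hypothetical and are not part of 🤗 Transformers or PyTorch. It accumulates scaled gradients over small micro-batches and shows that the result matches the gradient of one large batch, which is what lets a memory-limited device emulate a larger batch size.

```python
# Illustrative only: a toy linear model y = w * x with mean squared error.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_micro_batches so that
    the running sum equals the gradient of the full batch.
    """
    assert len(xs) % micro_batch_size == 0, "sketch assumes equal-sized micro-batches"
    num_micro_batches = len(xs) // micro_batch_size
    total = 0.0
    for start in range(0, len(xs), micro_batch_size):
        end = start + micro_batch_size
        total += grad_mse(w, xs[start:end], ys[start:end]) / num_micro_batches
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)
accumulated = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accumulated)
```

In a real training loop the optimizer step would be taken only after the last micro-batch, once the accumulated gradient is complete.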

### Single GPU

@@ -46,31 +52,31 @@ In some cases training on a single GPU is still too slow or won't fit the large

### TPU

_Coming soon_
[_Coming soon_](perf_train_tpu)

### Specialized Hardware

_Coming soon_
[_Coming soon_](perf_train_special)

## Inference

Efficient inference with large models in a production environment can be as challenging as training them. In the following sections we go through the steps to run inference on CPU and single/multi-GPU setups.

### CPU

[Go to CPU inference section](perf_infer_cpu.mdx)
[Go to CPU inference section](perf_infer_cpu)

### Single GPU

_Coming soon_
[Go to single GPU inference section](perf_infer_gpu_one)

### Multi-GPU

_Coming soon_
[Go to multi-GPU inference section](perf_infer_gpu_many)

### Specialized Hardware

_Coming soon_
[_Coming soon_](perf_infer_special)

## Hardware
