huggingface · lvwerra · Jun 23, 2022 · Jun 17, 2022 · Jun 17, 2022 · Jun 17, 2022
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -59,7 +59,29 @@
     title: Converting TensorFlow Checkpoints
   - local: serialization
     title: Export 🤗 Transformers models
-  - local: performance
+  - sections:
+    - local: performance
+      title: Overview
+    - local: perf_train_gpu_one
+      title: Training on one GPU
+    - local: perf_train_gpu_many
+      title: Training on many GPUs
+    - local: perf_train_cpu
+      title: Training on CPU
+    - local: perf_train_tpu
+      title: Training on TPUs
+    - local: perf_train_special
+      title: Training on Specialized Hardware
+    - local: perf_infer_cpu
+      title: Inference on CPU
+    - local: perf_infer_gpu_one
+      title: Inference on one GPU
+    - local: perf_infer_gpu_many
+      title: Inference on many GPUs
+    - local: perf_infer_special
+      title: Inference on Specialized Hardware
+    - local: perf_hardware
+      title: Custom hardware for training
     title: Performance and scalability
   - local: big_models
     title: Instantiating a big model
@@ -81,16 +103,6 @@
     title: "How to add a model to 🤗 Transformers?"
   - local: add_new_pipeline
     title: "How to add a pipeline to 🤗 Transformers?"
-  - local: perf_train_gpu_one
-    title: Training on one GPU
-  - local: perf_train_gpu_many
-    title: Training on many GPUs
-  - local: perf_train_cpu
-    title: Training on CPU
-  - local: perf_infer_cpu
-    title: Inference on CPU
-  - local: perf_hardware
-    title: Custom hardware for training
   - local: testing
     title: Testing
   - local: pr_checks
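Since this change moves several `local:` entries under a new `sections:` block and adds new pages, a quick consistency check can be useful. The following is a minimal sketch, assuming a simplified line-based reading of the toctree; the helper `check_toctree` and the hard-coded page set are hypothetical and are not part of the Transformers doc tooling.

```python
import re

# Hypothetical helper for illustration only. It matches lines such as
# "  - local: performance" and captures the page name.
LOCAL_ENTRY = re.compile(r"^\s*(?:-\s*)?local:\s*(\S+)\s*$")


def check_toctree(toctree_text, available_docs):
    """Return (missing, duplicated) page names found in a toctree snippet."""
    seen = []
    for line in toctree_text.splitlines():
        match = LOCAL_ENTRY.match(line)
        if match:
            seen.append(match.group(1))
    # Pages referenced in the toctree but absent from the docs folder.
    missing = sorted(set(seen) - set(available_docs))
    # Pages listed more than once, e.g. moved without deleting the old entry.
    duplicated = sorted({name for name in seen if seen.count(name) > 1})
    return missing, duplicated


# The new "Performance and scalability" block as it appears in this diff.
TOCTREE_AFTER = """\
- sections:
  - local: performance
    title: Overview
  - local: perf_train_gpu_one
    title: Training on one GPU
  - local: perf_train_gpu_many
    title: Training on many GPUs
  - local: perf_train_cpu
    title: Training on CPU
  - local: perf_train_tpu
    title: Training on TPUs
  - local: perf_train_special
    title: Training on Specialized Hardware
  - local: perf_infer_cpu
    title: Inference on CPU
  - local: perf_infer_gpu_one
    title: Inference on one GPU
  - local: perf_infer_gpu_many
    title: Inference on many GPUs
  - local: perf_infer_special
    title: Inference on Specialized Hardware
  - local: perf_hardware
    title: Custom hardware for training
  title: Performance and scalability
"""

# Assumed set of page names: the pages referenced before this change plus
# the new files added in this diff.
AVAILABLE_DOCS = {
    "performance", "perf_train_gpu_one", "perf_train_gpu_many",
    "perf_train_cpu", "perf_infer_cpu", "perf_hardware",
    "perf_train_tpu", "perf_train_special", "perf_infer_gpu_one",
    "perf_infer_gpu_many", "perf_infer_special",
}

missing, duplicated = check_toctree(TOCTREE_AFTER, AVAILABLE_DOCS)
print("missing:", missing)
print("duplicated:", duplicated)
```

In a real checkout the page set would be derived from the files under `docs/source/en/` rather than hard-coded, and a YAML parser would be more robust than the regular expression used here.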

diff --git a/docs/source/en/perf_infer_gpu_many.mdx b/docs/source/en/perf_infer_gpu_many.mdx
@@ -0,0 +1,14 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+-->
+
+# Efficient Inference on Multiple GPUs
+
+This document will be completed soon with information on how to run inference on multiple GPUs. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
diff --git a/docs/source/en/perf_infer_gpu_one.mdx b/docs/source/en/perf_infer_gpu_one.mdx
@@ -0,0 +1,14 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+-->
+
+# Efficient Inference on a Single GPU
+
+This document will be completed soon with information on how to run inference on a single GPU. In the meantime you can check out [the guide for training on a single GPU](perf_train_gpu_one) and [the guide for inference on CPUs](perf_infer_cpu).
diff --git a/docs/source/en/perf_infer_special.mdx b/docs/source/en/perf_infer_special.mdx
@@ -0,0 +1,14 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+-->
+
+# Inference on Specialized Hardware
+
+This document will be completed soon with information on how to run inference on specialized hardware. In the meantime you can check out [the guide for inference on CPUs](perf_infer_cpu).
diff --git a/docs/source/en/perf_train_gpu_many.mdx b/docs/source/en/perf_train_gpu_many.mdx
@@ -13,6 +13,12 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
 
 When training on a single GPU is too slow or the model weights don't fit in a single GPU's memory, we use a multi-GPU setup. Switching from a single GPU to multiple GPUs requires some form of parallelism, as the work needs to be distributed. There are several techniques to achieve parallelism, such as data, tensor, or pipeline parallelism. However, there is no one-size-fits-all solution, and which settings work best depends on the hardware you are running on. While the main concepts will most likely apply to any other framework, this article focuses on PyTorch-based implementations.
 
+<Tip>
+
+ Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) are generic and apply to training models in general, so make sure to have a look at it before diving into the following sections such as multi-GPU or CPU training.
+
+</Tip>
+
 We will first discuss various 1D parallelism techniques and their pros and cons in depth, and then look at how they can be combined into 2D and 3D parallelism to enable even faster training and to support even bigger models. Various other powerful alternative approaches will also be presented.
 
 ## Concepts
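The Tip added in this hunk names gradient accumulation as a generic strategy. As a rough illustration of the idea (a plain-Python sketch under simplifying assumptions, not code from the linked guide), the snippet below checks that averaging gradients over equally sized micro-batches reproduces the full-batch gradient for a mean-reduced loss. The toy model `y ≈ w * x`, the data, and the function names are all hypothetical.

```python
import math


def mse_grad(w, batch):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate gradients over equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / num_steps so that the sum
    matches the gradient of the mean loss over the full batch.
    """
    assert len(batch) % micro_batch_size == 0, "micro-batches must be equal in size"
    num_steps = len(batch) // micro_batch_size
    grad = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro_batch = batch[start:start + micro_batch_size]
        grad += mse_grad(w, micro_batch) / num_steps
    return grad


# Toy (x, y) pairs, made up for this example.
data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]

full = mse_grad(0.5, data)
accum = accumulated_grad(0.5, data, micro_batch_size=2)
print("gradients match:", math.isclose(full, accum))
```

The equivalence relies on the micro-batches having equal size and on the loss being averaged over examples; with unequal sizes the scaling factor would need to be weighted by micro-batch length.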

diff --git a/docs/source/en/perf_train_special.mdx b/docs/source/en/perf_train_special.mdx
@@ -0,0 +1,20 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+-->
+
+# Training on Specialized Hardware
+
+<Tip>
+
+ Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) and the [multi-GPU section](perf_train_gpu_many) are generic and apply to training models in general, so make sure to have a look at them before diving into this section.
+
+</Tip>
+
+This document will be completed soon with information on how to train on specialized hardware.
diff --git a/docs/source/en/perf_train_tpu.mdx b/docs/source/en/perf_train_tpu.mdx
@@ -0,0 +1,20 @@
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+-->
+
+# Training on TPUs
+
+<Tip>
+
+ Note: Most of the strategies introduced in the [single GPU section](perf_train_gpu_one) (such as mixed precision training or gradient accumulation) and the [multi-GPU section](perf_train_gpu_many) are generic and apply to training models in general, so make sure to have a look at them before diving into this section.
+
+</Tip>
+
+This document will be completed soon with information on how to train on TPUs.
diff --git a/docs/source/en/performance.mdx b/docs/source/en/performance.mdx
@@ -24,7 +24,13 @@ This document serves as an overview and entry point for the methods that could b
 
 ## Training
 
-Training transformer models efficiently requires an accelerator such as a GPU or TPU. The most common case is where you only have a single GPU.
+Training transformer models efficiently requires an accelerator such as a GPU or TPU. The most common case is where you only have a single GPU, but there are also sections on multi-GPU and CPU training (with more coming soon).
+
+<Tip>
+
+ Note: Most of the strategies introduced in the single GPU section (such as mixed precision training or gradient accumulation) are generic and apply to training models in general, so make sure to have a look at it before diving into the following sections such as multi-GPU or CPU training.
+
+</Tip>
 
 ### Single GPU
 
@@ -46,31 +52,31 @@ In some cases training on a single GPU is still too slow or won't fit the large
 
 ### TPU
 
-_Coming soon_
+[_Coming soon_](perf_train_tpu)
 
 ### Specialized Hardware
 
-_Coming soon_
+[_Coming soon_](perf_train_special)
 
 ## Inference
 
 Efficient inference with large models in a production environment can be as challenging as training them. In the following sections we go through the steps to run inference on CPU and single/multi-GPU setups.
 
 ### CPU
 
-[Go to CPU inference section](perf_infer_cpu.mdx)
+[Go to CPU inference section](perf_infer_cpu)
 
 ### Single GPU
 
-_Coming soon_
+[Go to single GPU inference section](perf_infer_gpu_one)
 
 ### Multi-GPU
 
-_Coming soon_
+[Go to multi-GPU inference section](perf_infer_gpu_many)
 
 ### Specialized Hardware
 
-_Coming soon_
+[_Coming soon_](perf_infer_special)
 
 ## Hardware