Update Plugins doc #12440

Merged
merged 16 commits into from
Mar 29, 2022
10 changes: 2 additions & 8 deletions docs/source/advanced/model_parallel.rst
@@ -296,7 +296,6 @@ Below we show an example of running `ZeRO-Offload <https://www.deepspeed.ai/tuto
.. code-block:: python

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy

model = MyModel()
trainer = Trainer(accelerator="gpu", devices=4, strategy="deepspeed_stage_2_offload", precision=16)
@@ -341,7 +340,6 @@ For even more speed benefit, DeepSpeed offers an optimized CPU version of ADAM c

import pytorch_lightning
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy
from deepspeed.ops.adam import DeepSpeedCPUAdam


@@ -385,7 +383,6 @@ Also please have a look at our :ref:`deepspeed-zero-stage-3-tips` which contains
.. code-block:: python

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy
from deepspeed.ops.adam import FusedAdam


@@ -409,7 +406,6 @@ You can also use the Lightning Trainer to run predict or evaluate with DeepSpeed
.. code-block:: python

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy


class MyModel(pl.LightningModule):
@@ -435,7 +431,6 @@ This reduces the time taken to initialize very large models, as well as ensure w

import torch.nn as nn
from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy
from deepspeed.ops.adam import FusedAdam


@@ -549,7 +544,6 @@ This saves memory when training larger models, however requires using a checkpoi
.. code-block:: python

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DeepSpeedStrategy
import deepspeed


@@ -686,7 +680,7 @@ In some cases you may want to define your own DeepSpeed Config, to access all pa
}

model = MyModel()
trainer = Trainer(accelerator="gpu", devices=4, strategy=DeepSpeedStrategy(deepspeed_config), precision=16)
trainer = Trainer(accelerator="gpu", devices=4, strategy=DeepSpeedStrategy(config=deepspeed_config), precision=16)
trainer.fit(model)


@@ -699,7 +693,7 @@ We support taking the config as a json formatted file:

model = MyModel()
trainer = Trainer(
accelerator="gpu", devices=4, strategy=DeepSpeedStrategy("/path/to/deepspeed_config.json"), precision=16
accelerator="gpu", devices=4, strategy=DeepSpeedStrategy(config="/path/to/deepspeed_config.json"), precision=16
)
trainer.fit(model)
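
The JSON file referenced in this hunk has to exist before the ``Trainer`` is built. A minimal sketch of producing one with the standard library follows. The helper name ``write_config`` and the temporary directory are assumptions made for illustration, and the config values are placeholders rather than recommended settings:

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; the DeepSpeed configuration reference lists the full key set.
deepspeed_config = {
    "zero_allow_untested_optimizer": True,
    "zero_optimization": {
        "stage": 2,  # ZeRO stage 2: partition optimizer state and gradients
        "contiguous_gradients": True,
        "overlap_comm": True,
    },
}


def write_config(config: dict, path: Path) -> Path:
    """Hypothetical helper: serialize the config dict to a JSON file and return its path."""
    path.write_text(json.dumps(config, indent=2))
    return path


# A temporary directory stands in for wherever the real config would live.
config_path = write_config(deepspeed_config, Path(tempfile.mkdtemp()) / "deepspeed_config.json")

# Round-trip check: the file parses back to the same dictionary.
loaded = json.loads(config_path.read_text())
```

The resulting ``str(config_path)`` could then be passed as ``DeepSpeedStrategy(config=...)``, matching the keyword form this hunk switches to.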

3 changes: 2 additions & 1 deletion docs/source/common/checkpointing.rst
@@ -315,6 +315,7 @@ and the Lightning Team will be happy to integrate/help integrate it.

-----------

.. _customize_checkpointing:

***********************
Customize Checkpointing
@@ -392,7 +393,7 @@ Custom Checkpoint IO Plugin

.. note::

Some ``TrainingTypePlugins`` like ``DeepSpeedStrategy`` do not support custom ``CheckpointIO`` as checkpointing logic is not modifiable.
Some strategies like :class:`~pytorch_lightning.strategies.deepspeed.DeepSpeedStrategy` do not support custom :class:`~pytorch_lightning.plugins.io.checkpoint_plugin.CheckpointIO` as checkpointing logic is not modifiable.

-----------

2 changes: 1 addition & 1 deletion docs/source/common/lightning_module.rst
@@ -1056,7 +1056,7 @@ automatic_optimization
When set to ``False``, Lightning does not automate the optimization process. This means you are responsible for handling
your optimizers. However, we do take care of precision and any accelerators used.

See :ref:`manual optimization<common/optimization:Manual optimization>` for details.
See :ref:`manual optimization <common/optimization:Manual optimization>` for details.

.. code-block:: python

90 changes: 49 additions & 41 deletions docs/source/extensions/plugins.rst
@@ -6,54 +6,32 @@ Plugins

.. include:: ../links.rst

Plugins allow custom integrations to the internals of the Trainer such as a custom precision or
distributed implementation.
Plugins allow custom integrations into the internals of the Trainer, such as custom precision, checkpointing, or
cluster environment implementations.

Under the hood, the Lightning Trainer is using plugins in the training routine, added automatically
depending on the provided Trainer arguments. For example:
depending on the provided Trainer arguments.

.. code-block:: python

# accelerator: GPUAccelerator
# training strategy: DDPStrategy
# precision: NativeMixedPrecisionPlugin
trainer = Trainer(accelerator="gpu", devices=4, precision=16)


We expose Accelerators and Plugins mainly for expert users that want to extend Lightning for:

- New hardware (like TPU plugin)
- Distributed backends (e.g. a backend not yet supported by
`PyTorch <https://pytorch.org/docs/stable/distributed.html#backends>`_ itself)
- Clusters (e.g. customized access to the cluster's environment interface)

There are two types of Plugins in Lightning with different responsibilities:

Strategy
--------

- Launching and teardown of training processes (if applicable)
- Setup communication between processes (NCCL, GLOO, MPI, ...)
- Provide a unified communication interface for reduction, broadcast, etc.
- Provide access to the wrapped LightningModule
There are three types of Plugins in Lightning with different responsibilities:

- Precision Plugins
- CheckpointIO Plugins
- Cluster Environments

Furthermore, for multi-node training Lightning provides cluster environment plugins that allow the advanced user
to configure Lightning to integrate with a :ref:`custom-cluster`.

*****************
Precision Plugins
*****************

.. image:: ../_static/images/accelerator/overview.svg


The full list of built-in plugins is listed below.

We provide precision plugins for you to benefit from numerical representations with lower precision than
32-bit floating-point or higher precision, such as 64-bit floating-point.

.. warning:: The Plugin API is in beta and subject to change.
For help setting up custom plugins/accelerators, please reach out to us at **[email protected]**
.. code-block:: python

# Training with 16-bit precision
trainer = Trainer(precision=16)

Precision Plugins
-----------------
The built-in precision plugins are listed below.

.. currentmodule:: pytorch_lightning.plugins.precision

@@ -74,9 +52,39 @@ Precision Plugins
TPUBf16PrecisionPlugin
TPUPrecisionPlugin

More information regarding precision with Lightning can be found :doc:`here <../advanced/precision>`.

-----------

********************
CheckpointIO Plugins
********************

As part of our commitment to extensibility, we have abstracted Lightning's checkpointing logic into the :class:`~pytorch_lightning.plugins.io.CheckpointIO` plugin.
With this, you can customize the checkpointing logic to match the needs of your infrastructure.

Below is a list of built-in plugins for checkpointing.

.. currentmodule:: pytorch_lightning.plugins.io

.. autosummary::
:nosignatures:
:template: classtemplate.rst

CheckpointIO
HPUCheckpointIO
TorchCheckpointIO
XLACheckpointIO

You can learn more about custom checkpointing with Lightning :ref:`here <customize_checkpointing>`.

-----------

********************
Cluster Environments
--------------------
********************

You can define the interface of your own cluster environment based on the requirements of your infrastructure.

.. currentmodule:: pytorch_lightning.plugins.environments

@@ -85,8 +93,8 @@ Cluster Environments
:template: classtemplate.rst

ClusterEnvironment
KubeflowEnvironment
LightningEnvironment
LSFEnvironment
TorchElasticEnvironment
KubeflowEnvironment
SLURMEnvironment
TorchElasticEnvironment
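
To make the cluster environment idea concrete, here is a dependency-free sketch of the kind of information such an environment exposes. It deliberately does not subclass ``ClusterEnvironment``. The class name ``EnvVarClusterSketch`` and the ``MYCLUSTER_*`` variables are hypothetical, and the method names are only assumed to mirror the Lightning interface, so check them against the API reference before adapting this:

```python
import os
from typing import Mapping, Optional


class EnvVarClusterSketch:
    """Hypothetical stand-in that reads rank information from MYCLUSTER_* variables.

    A real integration would subclass Lightning's ClusterEnvironment; this plain
    class only illustrates the lookups a scheduler-specific environment performs.
    """

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        # Accept an explicit mapping so the sketch can be exercised without a real cluster.
        self._env = dict(os.environ if environ is None else environ)

    @property
    def main_address(self) -> str:
        # Address of the rank-0 node; falls back to localhost for single-node runs.
        return self._env.get("MYCLUSTER_MAIN_ADDR", "127.0.0.1")

    @property
    def main_port(self) -> int:
        # The default port here is an arbitrary placeholder.
        return int(self._env.get("MYCLUSTER_MAIN_PORT", "12910"))

    def world_size(self) -> int:
        return int(self._env.get("MYCLUSTER_WORLD_SIZE", "1"))

    def global_rank(self) -> int:
        return int(self._env.get("MYCLUSTER_RANK", "0"))

    def local_rank(self) -> int:
        return int(self._env.get("MYCLUSTER_LOCAL_RANK", "0"))

    def node_rank(self) -> int:
        return int(self._env.get("MYCLUSTER_NODE_RANK", "0"))


# Simulate process 5 of 8, running as the second process on the second node.
env = EnvVarClusterSketch(
    {
        "MYCLUSTER_WORLD_SIZE": "8",
        "MYCLUSTER_RANK": "5",
        "MYCLUSTER_LOCAL_RANK": "1",
        "MYCLUSTER_NODE_RANK": "1",
    }
)
```

Taking the environment mapping as a constructor argument keeps the lookups testable outside the cluster, which is a design choice of this sketch rather than something the PR prescribes.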