Deprecate the HorovodStrategy (#16141)
carmocca authored Dec 20, 2022
1 parent d0b620f commit bf8e568
Showing 19 changed files with 80 additions and 101 deletions.
4 changes: 1 addition & 3 deletions docs/source-pytorch/accelerators/gpu_faq.rst
@@ -20,21 +20,19 @@ Let's say you have a batch size of 7 in your dataloader.
def train_dataloader(self):
return Dataset(..., batch_size=7)

In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * devices * num_nodes.
In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED your effective batch size will be 7 * devices * num_nodes.

.. code-block:: python
# effective batch size = 7 * 8
Trainer(accelerator="gpu", devices=8, strategy="ddp")
Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, strategy="horovod")
# effective batch size = 7 * 8 * 10
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")
.. note:: Huge batch sizes are actually really bad for convergence. Check out:
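
The effective batch size rule stated in this hunk is plain multiplication, so it can be checked with a short sketch. `effective_batch_size` is a hypothetical helper written for illustration and is not part of the Lightning API.

```python
def effective_batch_size(per_device_batch_size: int, devices: int, num_nodes: int = 1) -> int:
    """Hypothetical helper: global batch size when every process loads its own batch."""
    return per_device_batch_size * devices * num_nodes


# The two cases shown in the documentation snippet above.
print(effective_batch_size(7, devices=8))                # 56
print(effective_batch_size(7, devices=8, num_nodes=10))  # 560
```

The learning rate and other batch-size-dependent hyperparameters usually need revisiting when this product grows, which is the concern the note in the docs raises.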
41 changes: 1 addition & 40 deletions docs/source-pytorch/accelerators/gpu_intermediate.rst
@@ -25,7 +25,6 @@ Lightning supports multiple ways of doing distributed training.
- Regular (``strategy='ddp'``)
- Spawn (``strategy='ddp_spawn'``)
- Notebook/Fork (``strategy='ddp_notebook'``)
- Horovod (``strategy='horovod'``) (multi-machine, multi-gpu, configured at runtime)
- Bagua (``strategy='bagua'``) (multiple-gpus across many machines with advanced training algorithms)

.. note::
@@ -236,44 +235,6 @@ Comparison of DDP variants and tradeoffs
- Fast


Horovod
^^^^^^^
`Horovod <http://horovod.ai>`_ allows the same training script to be used for single-GPU,
multi-GPU, and multi-node training.

Like Distributed Data Parallel, every process in Horovod operates on a single GPU with a fixed
subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass,
then synchronously applied before beginning the next step.

The number of worker processes is configured by a driver application (`horovodrun` or `mpirun`). In
the training script, Horovod will detect the number of workers from the environment, and automatically
scale the learning rate to compensate for the increased total batch size.

Horovod can be configured in the training script to run with any number of GPUs / processes as follows:

.. code-block:: python
# train Horovod on GPU (number of GPUs / machines provided on command-line)
trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)
# train Horovod on CPU (number of processes / machines provided on command-line)
trainer = Trainer(strategy="horovod")
When starting the training job, the driver application will then be used to specify the total
number of worker processes:

.. code-block:: bash
# run training with 4 GPUs on a single machine
horovodrun -np 4 python train.py
# run training with 8 GPUs on two machines (4 GPUs each)
horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py
See the official `Horovod documentation <https://horovod.readthedocs.io/en/stable>`_ for details
on installation and performance tuning.


Bagua
^^^^^
`Bagua <https://github.com/BaguaSys/bagua>`_ is a deep learning training acceleration framework which supports
@@ -284,7 +245,7 @@ multiple advanced distributed training algorithms including:
- `ByteGrad <https://tutorials.baguasys.com/algorithms/bytegrad>`_ and `QAdam <https://tutorials.baguasys.com/algorithms/q-adam>`_ for low precision communication, where data is compressed into low precision before communication.
- `Asynchronous Model Average <https://tutorials.baguasys.com/algorithms/async-model-average>`_ for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.

By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in Distributed Data Parallel and Horovod,
By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in DDP,
but Bagua can usually produce a higher training throughput due to its backend written in Rust.

.. code-block:: python
1 change: 0 additions & 1 deletion docs/source-pytorch/api_references.rst
@@ -295,7 +295,6 @@ strategies
DataParallelStrategy
DeepSpeedStrategy
HivemindStrategy
HorovodStrategy
HPUParallelStrategy
IPUStrategy
ParallelStrategy
1 change: 0 additions & 1 deletion docs/source-pytorch/common/trainer.rst
@@ -424,7 +424,6 @@ deterministic
This flag sets the ``torch.backends.cudnn.deterministic`` flag.
Might make your system slower, but ensures reproducibility.
Also sets ``$HOROVOD_FUSION_THRESHOLD=0``.

For more info check `PyTorch docs <https://pytorch.org/docs/stable/notes/randomness.html>`_.

3 changes: 0 additions & 3 deletions docs/source-pytorch/extensions/strategy.rst
@@ -102,9 +102,6 @@ The below table lists all relevant strategies available in Lightning with their
* - deepspeed
- :class:`~pytorch_lightning.strategies.DeepSpeedStrategy`
- Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models. :ref:`Learn more. <advanced/model_parallel:deepspeed>`
* - horovod
- :class:`~pytorch_lightning.strategies.HorovodStrategy`
- Strategy for Horovod distributed training integration. :ref:`Learn more. <accelerators/gpu_intermediate:Horovod>`
* - hpu_parallel
- :class:`~pytorch_lightning.strategies.HPUParallelStrategy`
- Strategy for distributed training on multiple HPU devices. :doc:`Learn more. <../accelerators/hpu>`
2 changes: 1 addition & 1 deletion docs/source-pytorch/starter/lightning_lite.rst
@@ -276,7 +276,7 @@ Additionally, you can pass in your custom strategy by configuring additional par
lite = Lite(strategy=DeepSpeedStrategy(stage=2), accelerator="gpu", devices=2)
Support for Horovod and Fully Sharded training strategies are coming soon.
Support for Fully Sharded training strategies are coming soon.


devices
4 changes: 4 additions & 0 deletions src/pytorch_lightning/CHANGELOG.md
@@ -86,6 +86,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
* Deprecates the `pytorch_lightning.utilities.enum.sAMPType` enum
* Deprecates the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments

- `horovod` deprecation ([#16141](https://github.com/PyTorchLightning/pytorch-lightning/pull/16141))
* Deprecated `Trainer(strategy="horovod")`
* Deprecated the `HorovodStrategy` class


### Removed

14 changes: 12 additions & 2 deletions src/pytorch_lightning/strategies/horovod.py
@@ -16,6 +16,7 @@

import torch
import torch.nn as nn
from lightning_utilities.core.imports import module_available
from torch import Tensor
from torch.optim import Optimizer

@@ -29,9 +30,9 @@
from pytorch_lightning.strategies.parallel import ParallelStrategy
from pytorch_lightning.strategies.strategy import TBroadcast
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_only
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_only

_HOROVOD_AVAILABLE = module_available("horovod.torch")
if _HOROVOD_AVAILABLE:
import horovod.torch as hvd

@@ -48,6 +49,15 @@ def __init__(
checkpoint_io: Optional[CheckpointIO] = None,
precision_plugin: Optional[PrecisionPlugin] = None,
):
rank_zero_deprecation(
"`The `HorovodStrategy`: `Trainer(strategy='horovod')` has been deprecated in v1.9.0 and will be removed"
" in v1.10.0. You can try using the `Trainer(strategy='ddp')` instead."
)
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)
super().__init__(
accelerator=accelerator,
parallel_devices=parallel_devices,
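
The constructor change in this hunk follows a common order: emit the deprecation warning first, then fail fast if the optional dependency is missing. Below is a minimal sketch of that order using only the standard library. `module_is_available` and `LegacyStrategy` are hypothetical names; the real code uses `module_available`, `rank_zero_deprecation`, and `MisconfigurationException`, for which `importlib`, `warnings.warn`, and `RuntimeError` are swapped in here.

```python
import importlib.util
import warnings


def module_is_available(name: str) -> bool:
    """Stdlib stand-in (assumption) for the `module_available` helper imported in the diff."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ModuleNotFoundError, ValueError):
        # Raised when a parent package of a dotted name is missing or has no spec.
        return False


class LegacyStrategy:
    """Hypothetical strategy that warns first, then validates its optional dependency."""

    def __init__(self, backend_module: str = "horovod.torch") -> None:
        warnings.warn(
            "`LegacyStrategy` is deprecated and will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
        if not module_is_available(backend_module):
            raise RuntimeError(f"Requested backend `{backend_module}` is not installed.")
        self.backend_module = backend_module
```

Warning before the availability check means users see the deprecation notice even when the backend is not installed, which is the behavior the ordering in the diff suggests.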
Original file line number Diff line number Diff line change
@@ -78,9 +78,10 @@
TPUSpawnStrategy,
)
from pytorch_lightning.strategies.ddp_spawn import _DDP_FORK_ALIASES
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.tuner.auto_gpu_select import pick_multiple_gpus
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE, _IPU_AVAILABLE
from pytorch_lightning.utilities.imports import _IPU_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn

log = logging.getLogger(__name__)
@@ -653,7 +654,7 @@ def _handle_horovod(self) -> None:
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
"Install with \n $HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]"
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)

hvd.init()
1 change: 0 additions & 1 deletion src/pytorch_lightning/utilities/__init__.py
@@ -23,7 +23,6 @@
from pytorch_lightning.utilities.grads import grad_norm # noqa: F401
from pytorch_lightning.utilities.imports import ( # noqa: F401
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,
1 change: 0 additions & 1 deletion src/pytorch_lightning/utilities/imports.py
@@ -27,7 +27,6 @@
_DALI_AVAILABLE = module_available("nvidia.dali")
_HABANA_FRAMEWORK_AVAILABLE = package_available("habana_frameworks")
_HIVEMIND_AVAILABLE = package_available("hivemind")
_HOROVOD_AVAILABLE = module_available("horovod.torch")
_KINETO_AVAILABLE = torch.profiler.kineto_available()
_OMEGACONF_AVAILABLE = package_available("omegaconf")
_POPTORCH_AVAILABLE = package_available("poptorch")
3 changes: 1 addition & 2 deletions tests/README.md
@@ -57,7 +57,6 @@ To test models that require GPU make sure to run the above command on a GPU mach
The GPU machine must have at least 2 GPUs to run distributed tests.

Note that this setup will not run tests that require specific packages installed
such as Horovod, FairScale, NVIDIA/apex, NVIDIA/DALI, etc.
You can rely on our CI to make sure all these tests pass.

### Standalone Tests
@@ -72,7 +71,7 @@ There are certain standalone tests, which you can run using:

## Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.
Make sure to run coverage on a GPU machine with at least 2 GPUs.

```bash
cd pytorch-lightning
1 change: 0 additions & 1 deletion tests/tests_lite/conftest.py
@@ -51,7 +51,6 @@ def restore_env_variables():
"MASTER_PORT",
"PL_GLOBAL_SEED",
"PL_SEED_WORKERS",
"HOROVOD_FUSION_THRESHOLD",
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13
2 changes: 1 addition & 1 deletion tests/tests_pytorch/conftest.py
@@ -69,7 +69,7 @@ def restore_env_variables():
"WANDB_MODE",
"WANDB_REQUIRE_SERVICE",
"WANDB_SERVICE",
"HOROVOD_FUSION_THRESHOLD",
"HOROVOD_FUSION_THRESHOLD", # set by HorovodStrategy # TODO: remove in v1.10.0
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13
6 changes: 6 additions & 0 deletions tests/tests_pytorch/deprecated_api/test_remove_1-10.py
@@ -403,3 +403,9 @@ def optimizer_step(
trainer = Trainer()
with pytest.deprecated_call(match="amp_backend` will not be supported"):
trainer.amp_backend


@RunIf(horovod=True)
def test_horovod_deprecation_warnings(*_):
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
Trainer(strategy="horovod")
10 changes: 5 additions & 5 deletions tests/tests_pytorch/helpers/runif.py
@@ -29,9 +29,9 @@
from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
from pytorch_lightning.strategies.colossalai import _COLOSSALAI_AVAILABLE
from pytorch_lightning.strategies.deepspeed import _DEEPSPEED_AVAILABLE
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.imports import (
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,
@@ -42,12 +42,12 @@

_HOROVOD_NCCL_AVAILABLE = False
if _HOROVOD_AVAILABLE:
import horovod
import horovod.torch as hvd

try:

# `nccl_built` returns an integer
_HOROVOD_NCCL_AVAILABLE = bool(horovod.torch.nccl_built())
_HOROVOD_NCCL_AVAILABLE = bool(hvd.nccl_built())
except AttributeError:
# AttributeError can be raised if MPI is not available:
# https://github.com/horovod/horovod/blob/v0.23.0/horovod/torch/__init__.py#L33-L34
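
The NCCL probe in this hunk is a defensive feature check: call the function, coerce its integer result to a bool, and treat a missing attribute as "not available". A minimal sketch of that pattern follows, using stand-in objects rather than a real `horovod.torch` import. `nccl_available` is a hypothetical name introduced for illustration.

```python
from types import SimpleNamespace


def nccl_available(backend) -> bool:
    """Hypothetical probe: True only if `backend.nccl_built()` exists and is truthy."""
    try:
        # `nccl_built` returns an integer, so normalize it to a bool.
        return bool(backend.nccl_built())
    except AttributeError:
        # The attribute can be missing, e.g. if MPI is not available.
        return False


# Stand-in objects instead of a real `horovod.torch` module.
print(nccl_available(SimpleNamespace(nccl_built=lambda: 1)))  # True
print(nccl_available(SimpleNamespace()))                      # False
```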
@@ -77,8 +77,8 @@ def __new__(
ipu: bool = False,
hpu: bool = False,
mps: Optional[bool] = None,
horovod: bool = False,
horovod_nccl: bool = False,
horovod: bool = False, # TODO: remove in v1.10.0
horovod_nccl: bool = False, # TODO: remove in v1.10.0
skip_windows: bool = False,
standalone: bool = False,
fairscale: bool = False,
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@

from pytorch_lightning import Trainer # noqa: E402
from pytorch_lightning.callbacks import ModelCheckpoint # noqa: E402
from pytorch_lightning.utilities import _HOROVOD_AVAILABLE # noqa: E402
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE # noqa: E402

if _HOROVOD_AVAILABLE:
import horovod.torch as hvd
Expand Down