Deprecate the HorovodStrategy #16141

Merged
merged 6 commits on Dec 20, 2022
Changes from all commits
4 changes: 1 addition & 3 deletions docs/source-pytorch/accelerators/gpu_faq.rst
@@ -20,21 +20,19 @@ Let's say you have a batch size of 7 in your dataloader.
def train_dataloader(self):
return Dataset(..., batch_size=7)

In DDP, DDP_SPAWN, Deepspeed, DDP_SHARDED, or Horovod your effective batch size will be 7 * devices * num_nodes.
In DDP, DDP_SPAWN, Deepspeed, or DDP_SHARDED, your effective batch size will be 7 * devices * num_nodes.

.. code-block:: python

# effective batch size = 7 * 8
Trainer(accelerator="gpu", devices=8, strategy="ddp")
Trainer(accelerator="gpu", devices=8, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, strategy="horovod")

# effective batch size = 7 * 8 * 10
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_spawn")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="ddp_sharded")
Trainer(accelerator="gpu", devices=8, num_nodes=10, strategy="horovod")


.. note:: Huge batch sizes are actually really bad for convergence. Check out:
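As a quick illustration of the effective-batch-size formula above (not part of the PR; values taken from the doc's own example):

```python
# effective batch size = dataloader batch size * devices per node * number of nodes
batch_size, devices, num_nodes = 7, 8, 10
effective_batch_size = batch_size * devices * num_nodes  # 560
assert effective_batch_size == 7 * 8 * 10
```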
41 changes: 1 addition & 40 deletions docs/source-pytorch/accelerators/gpu_intermediate.rst
@@ -25,7 +25,6 @@ Lightning supports multiple ways of doing distributed training.
- Regular (``strategy='ddp'``)
- Spawn (``strategy='ddp_spawn'``)
- Notebook/Fork (``strategy='ddp_notebook'``)
- Horovod (``strategy='horovod'``) (multi-machine, multi-gpu, configured at runtime)
- Bagua (``strategy='bagua'``) (multiple-gpus across many machines with advanced training algorithms)

.. note::
@@ -236,44 +235,6 @@ Comparison of DDP variants and tradeoffs
- Fast


Horovod
^^^^^^^
`Horovod <http://horovod.ai>`_ allows the same training script to be used for single-GPU,
multi-GPU, and multi-node training.

Like Distributed Data Parallel, every process in Horovod operates on a single GPU with a fixed
subset of the data. Gradients are averaged across all GPUs in parallel during the backward pass,
then synchronously applied before beginning the next step.

The number of worker processes is configured by a driver application (`horovodrun` or `mpirun`). In
the training script, Horovod will detect the number of workers from the environment, and automatically
scale the learning rate to compensate for the increased total batch size.

Horovod can be configured in the training script to run with any number of GPUs / processes as follows:

.. code-block:: python

# train Horovod on GPU (number of GPUs / machines provided on command-line)
trainer = Trainer(strategy="horovod", accelerator="gpu", devices=1)

# train Horovod on CPU (number of processes / machines provided on command-line)
trainer = Trainer(strategy="horovod")

When starting the training job, the driver application will then be used to specify the total
number of worker processes:

.. code-block:: bash

# run training with 4 GPUs on a single machine
horovodrun -np 4 python train.py

# run training with 8 GPUs on two machines (4 GPUs each)
horovodrun -np 8 -H hostname1:4,hostname2:4 python train.py

See the official `Horovod documentation <https://horovod.readthedocs.io/en/stable>`_ for details
on installation and performance tuning.
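Since the deprecation message points users at DDP, a rough equivalent of the Horovod setup above (a sketch, not taken from this PR) lets Lightning's launcher replace `horovodrun`:

```python
from pytorch_lightning import Trainer

# Horovod (deprecated): the launcher fixed the world size, e.g.
#   horovodrun -np 4 python train.py
# with Trainer(strategy="horovod", accelerator="gpu", devices=1) in the script.

# DDP: Lightning spawns the worker processes itself, so the per-node device
# count (and the node count) is passed to the Trainer directly.
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=4, num_nodes=1)
```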


Bagua
^^^^^
`Bagua <https://github.com/BaguaSys/bagua>`_ is a deep learning training acceleration framework which supports
@@ -284,7 +245,7 @@ multiple advanced distributed training algorithms including:
- `ByteGrad <https://tutorials.baguasys.com/algorithms/bytegrad>`_ and `QAdam <https://tutorials.baguasys.com/algorithms/q-adam>`_ for low precision communication, where data is compressed into low precision before communication.
- `Asynchronous Model Average <https://tutorials.baguasys.com/algorithms/async-model-average>`_ for asynchronous communication, where workers are not required to be synchronized in the same iteration in a lock-step style.

By default, Bagua uses *Gradient AllReduce* algorithm, which is also the algorithm implemented in Distributed Data Parallel and Horovod,
By default, Bagua uses the *Gradient AllReduce* algorithm, which is also the algorithm implemented in DDP,
but Bagua can usually produce a higher training throughput due to its backend written in Rust.

.. code-block:: python
1 change: 0 additions & 1 deletion docs/source-pytorch/api_references.rst
@@ -295,7 +295,6 @@ strategies
DataParallelStrategy
DeepSpeedStrategy
HivemindStrategy
HorovodStrategy
HPUParallelStrategy
IPUStrategy
ParallelStrategy
1 change: 0 additions & 1 deletion docs/source-pytorch/common/trainer.rst
@@ -424,7 +424,6 @@ deterministic

This flag sets the ``torch.backends.cudnn.deterministic`` flag.
Might make your system slower, but ensures reproducibility.
Also sets ``$HOROVOD_FUSION_THRESHOLD=0``.

For more info check `PyTorch docs <https://pytorch.org/docs/stable/notes/randomness.html>`_.

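As documented above, the flag now maps only to cuDNN determinism; a sketch of the equivalent manual setting (illustrative, not the exact internal code path):

```python
import torch
from pytorch_lightning import Trainer

# Manual equivalent of what the docs describe for deterministic=True.
torch.backends.cudnn.deterministic = True

# The Trainer flag, now without the Horovod-specific environment variable.
trainer = Trainer(deterministic=True)
```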
3 changes: 0 additions & 3 deletions docs/source-pytorch/extensions/strategy.rst
@@ -102,9 +102,6 @@ The below table lists all relevant strategies available in Lightning with their
* - deepspeed
- :class:`~pytorch_lightning.strategies.DeepSpeedStrategy`
- Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models. :ref:`Learn more. <advanced/model_parallel:deepspeed>`
* - horovod
- :class:`~pytorch_lightning.strategies.HorovodStrategy`
- Strategy for Horovod distributed training integration. :ref:`Learn more. <accelerators/gpu_intermediate:Horovod>`
* - hpu_parallel
- :class:`~pytorch_lightning.strategies.HPUParallelStrategy`
- Strategy for distributed training on multiple HPU devices. :doc:`Learn more. <../accelerators/hpu>`
2 changes: 1 addition & 1 deletion docs/source-pytorch/starter/lightning_lite.rst
@@ -276,7 +276,7 @@ Additionally, you can pass in your custom strategy by configuring additional parameters
lite = Lite(strategy=DeepSpeedStrategy(stage=2), accelerator="gpu", devices=2)


Support for Horovod and Fully Sharded training strategies are coming soon.
Support for Fully Sharded training strategies is coming soon.


devices
4 changes: 4 additions & 0 deletions src/pytorch_lightning/CHANGELOG.md
@@ -86,6 +86,10 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
* Deprecates the `pytorch_lightning.utilities.enums.AMPType` enum
* Deprecates the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments

- `horovod` deprecation ([#16141](https://github.com/PyTorchLightning/pytorch-lightning/pull/16141))
* Deprecated `Trainer(strategy="horovod")`
* Deprecated the `HorovodStrategy` class


### Removed

14 changes: 12 additions & 2 deletions src/pytorch_lightning/strategies/horovod.py
@@ -16,6 +16,7 @@

import torch
import torch.nn as nn
from lightning_utilities.core.imports import module_available
from torch import Tensor
from torch.optim import Optimizer

@@ -29,9 +30,9 @@
from pytorch_lightning.strategies.parallel import ParallelStrategy
from pytorch_lightning.strategies.strategy import TBroadcast
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_only
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_only

_HOROVOD_AVAILABLE = module_available("horovod.torch")
if _HOROVOD_AVAILABLE:
import horovod.torch as hvd

@@ -48,6 +49,15 @@ def __init__(
checkpoint_io: Optional[CheckpointIO] = None,
precision_plugin: Optional[PrecisionPlugin] = None,
):
rank_zero_deprecation(
"The `HorovodStrategy`: `Trainer(strategy='horovod')` has been deprecated in v1.9.0 and will be removed"
" in v1.10.0. You can try using `Trainer(strategy='ddp')` instead."
)
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)
super().__init__(
accelerator=accelerator,
parallel_devices=parallel_devices,
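The stub above follows a common pattern: warn at construction time, then fail fast with an actionable error if the optional backend is missing. A generic standard-library sketch of that pattern (class and package names are illustrative, not Lightning APIs):

```python
import importlib.util
import warnings


class LegacyStrategy:  # hypothetical example, not part of this PR
    """Deprecated strategy that wraps an optional third-party backend."""

    def __init__(self) -> None:
        # Warn at construction time that the class is scheduled for removal.
        warnings.warn(
            "`LegacyStrategy` has been deprecated and will be removed in a future release."
            " Use `NewStrategy` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Fail fast with an actionable message if the optional backend is missing.
        if importlib.util.find_spec("some_backend") is None:
            raise RuntimeError(
                "Requested `LegacyStrategy`, but `some_backend` is not installed."
                " Install it with `pip install some_backend`."
            )
```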
@@ -78,9 +78,10 @@
TPUSpawnStrategy,
)
from pytorch_lightning.strategies.ddp_spawn import _DDP_FORK_ALIASES
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.tuner.auto_gpu_select import pick_multiple_gpus
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from pytorch_lightning.utilities.imports import _HOROVOD_AVAILABLE, _IPU_AVAILABLE
from pytorch_lightning.utilities.imports import _IPU_AVAILABLE
from pytorch_lightning.utilities.rank_zero import rank_zero_deprecation, rank_zero_info, rank_zero_warn

log = logging.getLogger(__name__)
@@ -653,7 +654,7 @@ def _handle_horovod(self) -> None:
if not _HOROVOD_AVAILABLE:
raise MisconfigurationException(
'Requested `strategy="horovod"`, but Horovod is not installed.'
"Install with \n $HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]"
" Install with `HOROVOD_WITH_PYTORCH=1 pip install horovod[pytorch]`"
)

hvd.init()
1 change: 0 additions & 1 deletion src/pytorch_lightning/utilities/__init__.py
@@ -23,7 +23,6 @@
from pytorch_lightning.utilities.grads import grad_norm # noqa: F401
from pytorch_lightning.utilities.imports import ( # noqa: F401
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,
1 change: 0 additions & 1 deletion src/pytorch_lightning/utilities/imports.py
@@ -27,7 +27,6 @@
_DALI_AVAILABLE = module_available("nvidia.dali")
_HABANA_FRAMEWORK_AVAILABLE = package_available("habana_frameworks")
_HIVEMIND_AVAILABLE = package_available("hivemind")
_HOROVOD_AVAILABLE = module_available("horovod.torch")
_KINETO_AVAILABLE = torch.profiler.kineto_available()
_OMEGACONF_AVAILABLE = package_available("omegaconf")
_POPTORCH_AVAILABLE = package_available("poptorch")
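`module_available("horovod.torch")` from `lightning_utilities` checks whether a dotted module path can be imported, without importing it eagerly at module scope. A rough standard-library equivalent, shown purely for illustration:

```python
import importlib.util


def _module_available(dotted_path: str) -> bool:
    """Return True if ``dotted_path`` (e.g. ``"horovod.torch"``) resolves to an importable module."""
    try:
        return importlib.util.find_spec(dotted_path) is not None
    except ModuleNotFoundError:
        # Raised when a parent package along the dotted path is missing.
        return False


_HOROVOD_AVAILABLE = _module_available("horovod.torch")
```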
3 changes: 1 addition & 2 deletions tests/README.md
@@ -57,7 +57,6 @@ To test models that require GPU make sure to run the above command on a GPU machine
The GPU machine must have at least 2 GPUs to run distributed tests.

Note that this setup will not run tests that require specific packages installed
such as Horovod, FairScale, NVIDIA/apex, NVIDIA/DALI, etc.
You can rely on our CI to make sure all these tests pass.

### Standalone Tests
@@ -72,7 +71,7 @@ There are certain standalone tests, which you can run using:

## Running Coverage

Make sure to run coverage on a GPU machine with at least 2 GPUs and NVIDIA apex installed.
Make sure to run coverage on a GPU machine with at least 2 GPUs.

```bash
cd pytorch-lightning
1 change: 0 additions & 1 deletion tests/tests_lite/conftest.py
@@ -51,7 +51,6 @@ def restore_env_variables():
"MASTER_PORT",
"PL_GLOBAL_SEED",
"PL_SEED_WORKERS",
"HOROVOD_FUSION_THRESHOLD",
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13
2 changes: 1 addition & 1 deletion tests/tests_pytorch/conftest.py
@@ -69,7 +69,7 @@ def restore_env_variables():
"WANDB_MODE",
"WANDB_REQUIRE_SERVICE",
"WANDB_SERVICE",
"HOROVOD_FUSION_THRESHOLD",
"HOROVOD_FUSION_THRESHOLD", # set by HorovodStrategy # TODO: remove in v1.10.0
"RANK", # set by DeepSpeed
"POPLAR_ENGINE_OPTIONS", # set by IPUStrategy
"CUDA_MODULE_LOADING", # leaked since PyTorch 1.13
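Both `restore_env_variables` fixtures snapshot `os.environ` and assert that a test leaks no variables beyond an allow-list of known offenders. A minimal self-contained sketch of the same idea (the allow-list here is illustrative):

```python
import os

import pytest


@pytest.fixture(autouse=True)
def restore_env_variables():
    """Restore os.environ after each test and flag unexpected leaked variables."""
    env_backup = os.environ.copy()
    yield
    # Illustrative allow-list; the real fixtures list keys set by strategies and loggers.
    leaked = set(os.environ) - set(env_backup) - {"RANK", "CUDA_MODULE_LOADING"}
    os.environ.clear()
    os.environ.update(env_backup)
    assert not leaked, f"test leaked environment variables: {leaked}"
```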
6 changes: 6 additions & 0 deletions tests/tests_pytorch/deprecated_api/test_remove_1-10.py
@@ -403,3 +403,9 @@ def optimizer_step(
trainer = Trainer()
with pytest.deprecated_call(match="amp_backend` will not be supported"):
trainer.amp_backend


@RunIf(horovod=True)
def test_horovod_deprecation_warnings(*_):
with pytest.deprecated_call(match=r"horovod'\)` has been deprecated in v1.9"):
Trainer(strategy="horovod")
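For reference, `pytest.deprecated_call` asserts that the enclosed code emits a `DeprecationWarning` (or `PendingDeprecationWarning`) whose message matches the given regex. A standalone sketch with a hypothetical function:

```python
import warnings

import pytest


def legacy_api() -> None:  # hypothetical stand-in for Trainer(strategy="horovod")
    warnings.warn("legacy_api has been deprecated in v1.9", DeprecationWarning, stacklevel=2)


def test_legacy_api_warns():
    with pytest.deprecated_call(match=r"deprecated in v1\.9"):
        legacy_api()
```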
10 changes: 5 additions & 5 deletions tests/tests_pytorch/helpers/runif.py
@@ -29,9 +29,9 @@
from pytorch_lightning.strategies.bagua import _BAGUA_AVAILABLE
from pytorch_lightning.strategies.colossalai import _COLOSSALAI_AVAILABLE
from pytorch_lightning.strategies.deepspeed import _DEEPSPEED_AVAILABLE
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE
from pytorch_lightning.utilities.imports import (
_HIVEMIND_AVAILABLE,
_HOROVOD_AVAILABLE,
_HPU_AVAILABLE,
_IPU_AVAILABLE,
_OMEGACONF_AVAILABLE,
@@ -42,12 +42,12 @@

_HOROVOD_NCCL_AVAILABLE = False
if _HOROVOD_AVAILABLE:
import horovod
import horovod.torch as hvd

try:

# `nccl_built` returns an integer
_HOROVOD_NCCL_AVAILABLE = bool(horovod.torch.nccl_built())
_HOROVOD_NCCL_AVAILABLE = bool(hvd.nccl_built())
except AttributeError:
# AttributeError can be raised if MPI is not available:
# https://github.com/horovod/horovod/blob/v0.23.0/horovod/torch/__init__.py#L33-L34
@@ -77,8 +77,8 @@ def __new__(
ipu: bool = False,
hpu: bool = False,
mps: Optional[bool] = None,
horovod: bool = False,
horovod_nccl: bool = False,
horovod: bool = False, # TODO: remove in v1.10.0
horovod_nccl: bool = False, # TODO: remove in v1.10.0
skip_windows: bool = False,
standalone: bool = False,
fairscale: bool = False,
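`RunIf` builds a `pytest.mark.skipif` marker from requirement flags such as `horovod` and `horovod_nccl`. A condensed sketch of the mechanism (simplified, not the full Lightning implementation):

```python
import importlib.util

import pytest


def run_if(horovod: bool = False, min_gpus: int = 0):
    """Return a skipif marker that skips the test unless all requested requirements are met."""
    conditions, reasons = [], []
    if horovod:
        conditions.append(importlib.util.find_spec("horovod") is None)
        reasons.append("Horovod")
    if min_gpus:
        import torch

        conditions.append(torch.cuda.device_count() < min_gpus)
        reasons.append(f">={min_gpus} GPUs")
    return pytest.mark.skipif(any(conditions), reason="Requires: " + ", ".join(reasons))


@run_if(horovod=True)
def test_needs_horovod():
    ...
```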
@@ -29,7 +29,7 @@

from pytorch_lightning import Trainer # noqa: E402
from pytorch_lightning.callbacks import ModelCheckpoint # noqa: E402
from pytorch_lightning.utilities import _HOROVOD_AVAILABLE # noqa: E402
from pytorch_lightning.strategies.horovod import _HOROVOD_AVAILABLE # noqa: E402

if _HOROVOD_AVAILABLE:
import horovod.torch as hvd