# Distributed deep learning

Examples of how to distribute deep learning on a High Performance Computer (HPC).

## Contents

These examples use Ray Train in a static job on an HPC. Ray handles most of the complexity of distributing the work, requiring only minimal changes to your TensorFlow or PyTorch code.

First, install the Python environments for your HPC: see [install_python_environments.md](install_python_environments.md).

It's preferable to use a static job on the HPC. To do this, test out different ideas locally in a Jupyter Notebook, then, when ready, convert the notebook to an executable script (.py) and copy it to the HPC. Alternatively, you can use Jupyter Notebooks interactively on the HPC by following the instructions in [jupyter_notebook_to_hpc.md](jupyter_notebook_to_hpc.md).
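As a sketch of what submitting such a static job might look like, here is a hypothetical SLURM batch script (the scheduler, partition names, environment name, and `train_script.py` are all placeholders; adapt them to your HPC and to the environment created via install_python_environments.md):

```shell
#!/bin/bash
# Hypothetical SLURM batch script -- all values below are placeholders.
#SBATCH --job-name=ray-train
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --time=01:00:00

# Activate the Python environment installed earlier (placeholder name).
source activate my_ray_env

# Run the executable script converted from your notebook (placeholder name).
python train_script.py
```

Submitting with `sbatch` (or your scheduler's equivalent) runs the script non-interactively, which is what makes the job "static".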