Document Open-CE, update TensorFlow, PyTorch and deprecate WMLCE
+ Adds Open-CE documentation page
  + Marks as successor to WMLCE
  + Lists the key features no longer available from WMLCE
  + Describes why to use Open-CE
  + provides instructions for installing Open-CE packages into conda environments
+ Updates TensorFlow page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates PyTorch page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates WMLCE page
  + Refer to Open-CE as successor, emphasising that WMLCE is deprecated / no longer supported
  + Update/Tweak tensorflow-benchmarks resnet50 usage+description.
+ Expands Conda documentation
  + Includes upgrading installation instructions to source the preferred etc/profile.d/conda.sh
    + https://github.com/conda/conda/blob/master/CHANGELOG.md#recommended-change-to-enable-conda-in-your-shell
  + conda python version selection should only use a single '='
+ Updates usage page emphasising ddlrun is not supported on RHEL 8

This does not include benchmarking of Open-CE or RHEL 7/8 comparisons of WMLCE benchmarking, due to ddlrun errors on RHEL 8.

Closes #63
Closes #72
ptheywood committed Mar 7, 2022
1 parent 7c47c23 commit bc10fae
Showing 7 changed files with 421 additions and 184 deletions.
44 changes: 41 additions & 3 deletions software/applications/conda.rst
@@ -5,6 +5,9 @@ Conda

`Conda <https://docs.conda.io/>`__ is an open source package management system and environment management system that runs on Windows, macOS and Linux. Conda quickly installs, runs and updates packages and their dependencies.


.. _software-applications-conda-installing:

Installing Miniconda
~~~~~~~~~~~~~~~~~~~~

@@ -26,15 +29,15 @@ The simplest way to install Conda for use on Bede is through the `miniconda <htt
sha256sum Miniconda3-latest-Linux-ppc64le.sh
sh Miniconda3-latest-Linux-ppc64le.sh -b -p ./miniconda
source miniconda/bin/activate
source miniconda/etc/profile.d/conda.sh
conda update conda -y
In subsequent sessions, or in job scripts, you may need to re-source miniconda. Alternatively, you could add this to your bash environment. For example:

.. code-block:: bash
export CONDADIR=/nobackup/projects/<project>/$USER # Update this with your <project> code.
source $CONDADIR/miniconda/bin/activate
source $CONDADIR/miniconda/etc/profile.d/conda.sh
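In a job script it can help to fail early if the conda installation cannot be found, rather than continuing with no environment loaded. The following is a minimal sketch rather than part of the instructions above: the function name ``load_conda`` is hypothetical, and it assumes miniconda was installed to a ``miniconda`` directory beneath the path passed in, as in the example above.

```shell
# Hypothetical helper: source conda.sh from a miniconda install under $1.
# Returns non-zero (with a message on stderr) if conda.sh cannot be found.
load_conda() {
    conda_sh="$1/miniconda/etc/profile.d/conda.sh"
    if [ -f "$conda_sh" ]; then
        # '.' is the POSIX spelling of bash's 'source'
        . "$conda_sh"
    else
        echo "conda.sh not found at $conda_sh" >&2
        return 1
    fi
}

# Example usage (replace <project> with your project code):
#   load_conda "/nobackup/projects/<project>/$USER" || exit 1
```

Wrapping the check in a function keeps the job script readable and lets the caller decide whether a missing installation should abort the job.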
Creating a new Conda Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -45,14 +48,28 @@ I.e. to create a new conda environment named `example`, with `python 3.9` you ca

.. code-block:: bash
conda create -y --name example python==3.9
conda create -y --name example python=3.9
Once created, the environment can be activated using ``conda activate``.

.. code-block:: bash
conda activate example
Alternatively, Conda environments can be created outside of the conda/miniconda install, using the ``-p`` / ``--prefix`` option of ``conda create``.

For example, if you have installed miniconda to your home directory but wish to create a conda environment named ``example`` within the ``/project/<PROJECT>/$USER/`` directory, you can use:

.. code-block:: bash
conda create -y --prefix /project/<PROJECT>/$USER/example python=3.9
This can subsequently be loaded via:

.. code-block:: bash
conda activate /project/<PROJECT>/$USER/example
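Because a prefix environment is referred to by its path, a mistyped path is only discovered when activation fails. As a rough sketch of a pre-check, the hypothetical helper ``is_conda_env`` below relies on the assumption that a conda environment directory contains a ``conda-meta`` subdirectory, which conda creates inside each environment.

```shell
# Hypothetical helper: succeed only if $1 looks like a conda environment,
# i.e. it is a directory containing a conda-meta subdirectory.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}

# Example usage:
#   env_prefix="/project/<PROJECT>/$USER/example"
#   is_conda_env "$env_prefix" && conda activate "$env_prefix"
```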
Listing and Activating existing Conda Environments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -64,6 +81,27 @@ Existing conda environments can be listed via:
``conda activate`` can then be used to activate one of the listed environments.

Adding Conda Channels to an Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The default conda channel does not contain every package, and may not provide the versions of packages you wish to use.

In this case, third-party conda channels can be added to conda environments to provide access to these packages, such as the :ref:`Open-CE <software-applications-open-ce>` Conda channel hosted by Oregon State University.

It is recommended to add channels to specific conda environments, rather than your global conda configuration.

For example, to add the `OSU Open-CE Conda channel <https://osuosl.org/services/powerdev/opence/>`__ to the currently loaded conda environment:

.. code-block:: bash
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/current/
You may also wish to enable `strict channel priority <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-channels.html#strict-channel-priority>`__, which speeds up conda operations and reduces package incompatibility issues. Strict channel priority will become the default from Conda 5.0. Enabling it may break old environment files.

.. code-block:: bash
conda config --env --set channel_priority strict
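The ``--env`` option writes these settings to a ``.condarc`` file inside the active environment's directory (``$CONDA_PREFIX``) rather than to ``~/.condarc``. To check what has been set for an environment, that file can be printed directly. The helper below is a sketch only; the name ``show_env_condarc`` is hypothetical.

```shell
# Hypothetical helper: print the environment-specific .condarc for the
# environment at $1 (defaults to the active environment, $CONDA_PREFIX).
show_env_condarc() {
    env_prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -n "$env_prefix" ] && [ -f "$env_prefix/.condarc" ]; then
        cat "$env_prefix/.condarc"
    else
        echo "no environment-specific .condarc under: $env_prefix" >&2
        return 1
    fi
}

# Example usage, with the environment activated:
#   show_env_condarc
```

After running the two ``conda config --env`` commands above, the printed file should list the Open-CE channel and the ``channel_priority`` setting.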
Installing Conda Packages
~~~~~~~~~~~~~~~~~~~~~~~~~

117 changes: 117 additions & 0 deletions software/applications/open-ce.rst
@@ -0,0 +1,117 @@
.. _software-applications-open-ce:

Open-CE
=======

The `Open Cognitive Environment (Open-CE) <https://osuosl.org/services/powerdev/opence/>`__ is a community driven software distribution for machine learning and deep learning frameworks.

Open-CE software is distributed via :ref:`Conda<software-applications-conda>`, with all included packages for a given Open-CE release being installable into the same conda environment.

Open-CE conda channels suitable for use on Bede's IBM Power architecture systems are hosted by `Oregon State University <https://osuosl.org/services/powerdev/opence/>`__ and `MIT <https://opence.mit.edu/>`__.

It is the successor to :ref:`IBM WMLCE <software-applications-wmlce>` which was archived on 2020-11-10, with IBM WMLCE 1.7.0 being the final release.

Open-CE includes the following software packages, amongst others:

* :ref:`TensorFlow <software-applications-tensorflow>`
* :ref:`PyTorch <software-applications-pytorch>`
* `Horovod <https://horovod.ai/>`__
* `ONNX <https://onnx.ai/>`__

.. note::

Open-CE does not include all features from WMLCE, such as Large Model Support or Distributed Deep Learning (DDL).

Using Open-CE
-------------

Open-CE provides software packages via :ref:`Conda<software-applications-conda>`, which you must first :ref:`install<software-applications-conda-installing>`.
Conda installations of the packages provided by Open-CE can become quite large (multiple GBs), so you may wish to use a conda installation in ``/nobackup/projects/<project>`` or ``/projects/<project>`` as described in the :ref:`Installing Conda section <software-applications-conda-installing>`.

With a working Conda install, Open-CE packages can be installed from either the OSU or MIT Conda channels for PPC64LE systems such as Bede.

* OSU: ``https://ftp.osuosl.org/pub/open-ce/current/``
* MIT: ``https://opence.mit.edu/``

Using Conda environments is recommended when working with Open-CE.

For example, to install ``tensorflow`` and ``pytorch`` from the OSU Open-CE conda channel into a conda environment named ``open-ce``:

.. code-block:: bash
# Create a new conda environment named open-ce within your conda installation
conda create -y --name open-ce python=3.9 # Older Open-CE may require older Python versions
# Activate the conda environment
conda activate open-ce
# Add the OSU Open-CE conda channel to the current environment config
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/current/
# Also use strict channel priority
conda config --env --set channel_priority strict
# Install the required conda packages, using the channels set within the conda env. This may take some time.
conda install -y tensorflow
conda install -y pytorch
Once installed into a conda environment, the Open-CE provided software packages can be used interactively on login nodes or within batch jobs by activating the named conda environment.

.. code-block:: bash
# Activate the conda environment
conda activate open-ce
# Run a python command or script which makes use of the installed packages
# For example, to output the version of tensorflow:
python3 -c "import tensorflow;print(tensorflow.__version__)"
# Or to output the version of pytorch:
python3 -c "import torch;print(torch.__version__)"
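In a batch job, the activation step and the command that depends on it can be kept together so that the command never runs if activation fails. The following is a sketch under assumptions: the function name ``run_in_env`` is hypothetical, and it assumes ``conda.sh`` has already been sourced in the job script so that ``conda activate`` is available.

```shell
# Hypothetical helper: activate the named conda environment, then run the
# remaining arguments as a command inside it. If activation fails, the
# command is not run and a non-zero status is returned.
run_in_env() {
    env_name="$1"
    shift
    conda activate "$env_name" || return 1
    "$@"
}

# Example usage:
#   run_in_env open-ce python3 -c "import torch;print(torch.__version__)"
```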
Using older versions of Open-CE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The OSU conda distribution provides an archive of older Open-CE releases, beginning at version ``1.0.0``.

The available versions are listed at https://ftp.osuosl.org/pub/open-ce/.

To use a version other than ``current``, replace ``current`` in the channel URI with the desired version number when adding the channel to the current conda environment.

For example, to explicitly use Open-CE ``1.4.1`` the command to add the conda channel to the current environment would be:

.. code-block:: bash
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/1.4.1/
Using older Open-CE versions may require older python versions.
See the `OSU Open-CE page <https://osuosl.org/services/powerdev/opence/>`__ for further version information.
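Since only the version component of the OSU channel URI changes between releases, the URI can be built from a version string. This is an illustrative sketch; the function name ``opence_channel_url`` is hypothetical, and it does not check that the requested version actually exists in the OSU archive.

```shell
# Hypothetical helper: print the OSU Open-CE channel URL for a release.
# With no argument it falls back to the "current" release.
opence_channel_url() {
    version="${1:-current}"
    echo "https://ftp.osuosl.org/pub/open-ce/${version}/"
}

# Example usage, inside an activated conda environment:
#   conda config --env --prepend channels "$(opence_channel_url 1.4.1)"
```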

The MIT Open-CE channel provides multiple versions of Open-CE in the same Conda channel. If using the MIT Open-CE distribution, older versions of packages can be requested by specifying the specific version of the desired package.
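With the MIT channel added to the environment, a specific package version can be requested using conda's ``name=version`` form. The sketch below wraps this in a hypothetical helper named ``install_pinned``; the version number in the usage comment is illustrative only, so check the channel for the versions actually available.

```shell
# Hypothetical helper: install package $1 at version $2 into the active
# conda environment, using conda's "name=version" package specification.
install_pinned() {
    conda install -y "$1=$2"
}

# Example usage (illustrative version number):
#   install_pinned pytorch 1.7.1
```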

Why use Open-CE
---------------

Modern machine learning packages like TensorFlow and PyTorch have large dependency trees which can conflict with one another due to their independent release schedules.
This has made it difficult to use multiple competing packages within the same environment.

Open-CE solves this issue by ensuring that packages included in a given Open-CE distribution are compatible with one another, and can be installed at the same time, simplifying the distribution of these packages.

It also provides pre-compiled distributions of these packages for PPC64LE architecture machines, which are not always available from upstream sources, reducing the time required to install these packages.

For more information on the potential benefits of using Open-CE see `this blog post from the OpenPOWER foundation <https://openpowerfoundation.org/blog/open-cognitive-environment-open-ce-a-valuable-tool-for-ai-researchers/>`__.

Differences from WMLCE
----------------------

:ref:`IBM WMLCE<software-applications-wmlce>` includes several features not available in upstream TensorFlow and PyTorch distributions, such as Large Model Support (LMS).

Unfortunately, LMS is not available in the TensorFlow or PyTorch packages provided by Open-CE.

Features and packages which were included in WMLCE but are absent from Open-CE include:

* Large Model Support (LMS)
* IBM DDL
* Caffe (IBM-enhanced)
* IBM SnapML
* NVIDIA Rapids

63 changes: 38 additions & 25 deletions software/applications/pytorch.rst
@@ -6,44 +6,57 @@ PyTorch
`PyTorch <https://pytorch.org/>`__ is an end-to-end machine learning framework.
PyTorch enables fast, flexible experimentation and efficient production through a user-friendly front-end, distributed training, and ecosystem of tools and libraries.

The main method of distribution for PyTorch is via :ref:`Conda <software-applications-conda>`.
The main method of distribution for PyTorch is via :ref:`Conda <software-applications-conda>`, with :ref:`Open-CE<software-applications-open-ce>` providing a simple method for installing multiple machine learning frameworks into a single conda environment.

For more information on the usage of PyTorch, see the `Online Documentation <https://pytorch.org/docs/>`__.
The upstream Conda and pip distributions do not provide ppc64le PyTorch packages at this time.

PyTorch Quickstart
~~~~~~~~~~~~~~~~~~
Installing via Conda
~~~~~~~~~~~~~~~~~~~~

With a working Conda installation (see :ref:`Installing Miniconda<software-applications-conda-installing>`) the following instructions can be used to create a Python 3.8 conda environment named ``torch-env`` with the latest Open-CE provided PyTorch:

.. note::

PyTorch installations via conda can be relatively large. Consider installing your miniconda (and therefore your conda environments) to the ``/nobackup`` file store.

The following should get you set up with a working conda environment (replacing <project> with your project code):

.. code-block:: bash
export DIR=/nobackup/projects/<project>/$USER
# rm -rf ~/.conda ~/.condarc $DIR/miniconda # Uncomment if you want to remove old env
mkdir $DIR
pushd $DIR
# Create a new conda environment named torch-env within your conda installation
conda create -y --name torch-env python=3.8
# Activate the conda environment
conda activate torch-env
# Add the OSU Open-CE conda channel to the current environment config
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/current/
# Download the latest miniconda installer for ppc64le
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-ppc64le.sh
# Validate that the file checksum matches the one listed on https://docs.conda.io/en/latest/miniconda_hashes.html.
sha256sum Miniconda3-latest-Linux-ppc64le.sh
# Also use strict channel priority
conda config --env --set channel_priority strict
# Install the latest available version of PyTorch
conda install -y pytorch
In subsequent interactive sessions, and when submitting batch jobs which use PyTorch, you will then need to re-activate the conda environment.

For example, to verify that PyTorch is available and print the version:

.. code-block:: bash
sh Miniconda3-latest-Linux-ppc64le.sh -b -p $DIR/miniconda
source miniconda/bin/activate
conda update conda -y
conda config --set channel_priority strict
# Activate the conda environment
conda activate torch-env
conda config --prepend channels \
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
# Invoke python
python3 -c "import torch;print(torch.__version__)"
conda config --prepend channels \
https://opence.mit.edu
conda create --name opence pytorch=1.7.1 -y
conda activate opence
Installation via the upstream Conda channel is not currently possible, due to the lack of ``ppc64le`` or ``noarch`` distributions.


This has some limitations such as not supporting large model support.
If you require LMS, please see the :ref:`WMLCE <software-applications-wmlce>` page.
.. note::

The :ref:`Open-CE<software-applications-open-ce>` distribution of PyTorch does not include IBM technologies such as DDL or LMS, which were previously available via :ref:`WMLCE<software-applications-wmlce>`.
WMLCE is not supported on RHEL 8.


Further Information
61 changes: 38 additions & 23 deletions software/applications/tensorflow.rst
@@ -1,43 +1,58 @@
.. _software-python-tensorflow:
.. _software-applications-tensorflow:

TensorFlow
----------

`TensorFlow <https://www.tensorflow.org/>`__ is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications.

TensorFlow Quickstart
~~~~~~~~~~~~~~~~~~~~~
TensorFlow can be installed through a number of python package managers such as :ref:`Conda<software-applications-conda>` or ``pip``.

For use on Bede, the simplest method is to install TensorFlow using the :ref:`Open-CE Conda distribution<software-applications-open-ce>`.


Installing via Conda (Open-CE)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With a working Conda installation (see :ref:`Installing Miniconda<software-applications-conda-installing>`) the following instructions can be used to create a Python 3.8 conda environment named ``tf-env`` with the latest Open-CE provided TensorFlow:

.. note::

TensorFlow installations via conda can be relatively large. Consider installing your miniconda (and therefore your conda environments) to the ``/nobackup`` file store.

The following should get you set up with a working conda environment (replacing ``<project>`` with your project code):

.. code-block:: bash
export DIR=/nobackup/projects/<project>/$USER
# rm -rf ~/.conda ~/.condarc $DIR/miniconda # Uncomment if you want to remove old env
mkdir $DIR
pushd $DIR
# Create a new conda environment named tf-env within your conda installation
conda create -y --name tf-env python=3.8
# Download the latest miniconda installer for ppc64le
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-ppc64le.sh
# Validate that the file checksum matches the one listed on https://docs.conda.io/en/latest/miniconda_hashes.html.
sha256sum Miniconda3-latest-Linux-ppc64le.sh
# Activate the conda environment
conda activate tf-env
sh Miniconda3-latest-Linux-ppc64le.sh -b -p $DIR/miniconda
source miniconda/bin/activate
conda update conda -y
# Add the OSU Open-CE conda channel to the current environment config
conda config --env --prepend channels https://ftp.osuosl.org/pub/open-ce/current/
conda config --prepend channels \
https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/
# Also use strict channel priority
conda config --env --set channel_priority strict
# Install the latest available version of TensorFlow
conda install -y tensorflow
In subsequent interactive sessions, and when submitting batch jobs which use TensorFlow, you will then need to re-activate the conda environment.

For example, to verify that TensorFlow is available and print the version:

.. code-block:: bash
conda config --prepend channels \
https://opence.mit.edu
# Activate the conda environment
conda activate tf-env
conda create --name opence tensorflow -y
conda activate opence
# Invoke python
python3 -c "import tensorflow;print(tensorflow.__version__)"
.. note::

This conflicts with the :ref:`PyTorch <software-applications-pytorch>` instructions as they set the conda channel_priority to be strict which seems to cause issues when installing TensorFlow.

The :ref:`Open-CE<software-applications-open-ce>` distribution of TensorFlow does not include IBM technologies such as DDL or LMS, which were previously available via :ref:`WMLCE<software-applications-wmlce>`.
WMLCE is not supported on RHEL 8.

Further Information
~~~~~~~~~~~~~~~~~~~