Resnet50 benchmark scripts no longer usable #72

Closed
ptheywood opened this issue Sep 21, 2021 · 1 comment · Fixed by #102

ptheywood commented Sep 21, 2021

The ResNet50 benchmark job scripts are no longer usable on Bede, as /opt/software/apps/anaconda3 does not exist.

Additionally, with the move to RHEL 8, where WMLCE is not supported (it is replaced by OpenCE), it is unclear whether ddlrun, and therefore bede-ddlrun, will be usable.
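
A minimal sketch of a preflight check for the two problems above, assuming only the path quoted in this issue and the launcher names `ddlrun` and `bede-ddlrun`. The function name `preflight` and its parameters are hypothetical, not part of any existing Bede tooling.

```python
import os
import shutil


def preflight(conda_root="/opt/software/apps/anaconda3",
              launchers=("ddlrun", "bede-ddlrun")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The job scripts assume a system-wide conda install at this location.
    if not os.path.isdir(conda_root):
        problems.append(f"missing conda install: {conda_root}")
    # shutil.which only confirms the launcher is on PATH, not that it runs correctly.
    for name in launchers:
        if shutil.which(name) is None:
            problems.append(f"launcher not on PATH: {name}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Finding a launcher on PATH does not show that it works on a given OS release, so this only catches the "file or command is missing" class of failure.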

It may be worth re-benchmarking ResNet50 prior to the RHEL 8 switch, so we know the performance impact of WMLCE vs OpenCE.

I.e. run ResNet50 at a number of scales (1, 2, 4, 8, 12?, 16? GPUs; current docs say there is no need to go larger) with:

  • RHEL 7, WMLCE
  • RHEL 7, OpenCE, ideally matching WMLCE TF version
  • RHEL 7, OpenCE latest
  • RHEL 8, OpenCE older
  • RHEL 8, OpenCE latest?

The current RHEL 8 testing partition only contains 2 nodes, so only up to 8 GPUs are currently usable for RHEL 8.
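
As a rough sketch, the proposed set of runs could be enumerated like this. The 4 GPUs per node figure is an assumption inferred from "2 nodes, up to 8 GPUs", and the names `CONFIGS`, `MAX_GPUS` and `benchmark_matrix` are hypothetical. The tentative 12 and 16 GPU scales are included.

```python
from itertools import product

GPUS_PER_NODE = 4  # assumption: inferred from "2 nodes ... up to 8 GPUs"
GPU_COUNTS = [1, 2, 4, 8, 12, 16]  # 12 and 16 are tentative in the proposal

# (operating system, software stack) pairs from the list above
CONFIGS = [
    ("RHEL 7", "WMLCE"),
    ("RHEL 7", "OpenCE (TF version matching WMLCE)"),
    ("RHEL 7", "OpenCE (latest)"),
    ("RHEL 8", "OpenCE (older)"),
    ("RHEL 8", "OpenCE (latest)"),
]

# The RHEL 8 testing partition has 2 nodes, which caps the usable GPU count.
MAX_GPUS = {"RHEL 8": 2 * GPUS_PER_NODE}


def benchmark_matrix():
    """Return every (os, stack, gpus) combination that fits the available hardware."""
    runs = []
    for (os_name, stack), gpus in product(CONFIGS, GPU_COUNTS):
        limit = MAX_GPUS.get(os_name)
        if limit is not None and gpus > limit:
            continue  # skip scales the partition cannot provide
        runs.append((os_name, stack, gpus))
    return runs


if __name__ == "__main__":
    for os_name, stack, gpus in benchmark_matrix():
        print(f"{os_name:7} {stack:36} {gpus:2d} GPUs")
```

With these assumptions the RHEL 8 configurations stop at 8 GPUs, so any 12 or 16 GPU comparison would be RHEL 7 only.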

This is closely related to #63

ptheywood added a commit that referenced this issue Mar 7, 2022
+ Adds Open-CE documentation page
  + Marks as successor to WMLCE
  + Lists the key features no longer available from WMLCE
  + Describes why to use Open-CE
  + provides instructions for installing Open-CE packages into conda environments
+ Updates TensorFlow page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates PyTorch page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates WMLCE page
  + Refer to Open-CE as successor, emphasising that WMLCE is deprecated / no longer supported
  + Update/Tweak tensorflow-benchmarks resnet50 usage+description.
+ Expands Conda documentation
  + Includes upgrading installation instructions to source the preferred etc/profile.d/conda.sh
    + https://github.com/conda/conda/blob/master/CHANGELOG.md#recommended-change-to-enable-conda-in-your-shell
  + conda python version selection should only use a single '='
+ Updates usage page emphasising ddlrun is not supported on RHEL 8

This does not include benchmarking of Open-CE, or RHEL 7/8 comparisons of WMLCE benchmarks, due to ddlrun errors on RHEL 8.

Closes #63
Closes #72

ptheywood (Member, Author) commented

Not adding ResNet50 benchmark results to the WMLCE / OpenCE documentation:

  • ddlrun errors on RHEL 8 installations of WMLCE, so the benchmark is not useful for comparing RHEL 7 / RHEL 8 TensorFlow performance
  • WMLCE tensorflow-benchmarks has an IBM licence that does not appear to allow redistribution outside of WMLCE, so it is not useful for direct porting to Open-CE based multi-node TensorFlow. It would be better to find a more open benchmark if wishing to compare, but users have no option but to migrate for security purposes anyway.

It may be nice to add some general DL benchmarking to compare against x86+V100 systems to support encouraging users onto Bede, but that can become a future issue rather than blocking the WMLCE/OpenCE clarification.

@ptheywood ptheywood moved this to New in Documentation Mar 24, 2022
@ptheywood ptheywood moved this from New to In Progress in Documentation Mar 24, 2022
Repository owner moved this from In Progress to Done in Documentation Apr 19, 2022