Resnet50 benchmark scripts no longer usable #72
ptheywood added a commit that referenced this issue Mar 7, 2022
+ Adds Open-CE documentation page
  + Marks as successor to WMLCE
  + Lists the key features no longer available from WMLCE
  + Describes why to use Open-CE
  + Provides instructions for installing Open-CE packages into conda environments
+ Updates TensorFlow page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates PyTorch page to refer to/use Open-CE not WMLCE
  + Replaces quickstart with installation via conda section
+ Updates WMLCE page
  + Refers to Open-CE as successor, emphasising that WMLCE is deprecated / no longer supported
  + Updates/tweaks tensorflow-benchmarks resnet50 usage+description
+ Expands Conda documentation
  + Includes upgrading installation instructions to source the preferred etc/profile.d/conda.sh
    + https://github.com/conda/conda/blob/master/CHANGELOG.md#recommended-change-to-enable-conda-in-your-shell
  + conda python version selection should only use a single '='
+ Updates usage page emphasising ddlrun is not supported on RHEL 8

This does not include benchmarking of Open-CE or RHEL 7/8 comparisons of WMLCE benchmarking due to ddlrun errors on RHEL 8.

Closes #63
Closes #72
Not adding ResNet benchmark results to the WMLCE / OpenCE documentation:
It may be nice to add some general DL benchmarking to compare against x86+V100 systems, to help encourage users onto Bede, but that can become a future issue rather than blocking the WMLCE/OpenCE clarification.
Repository owner moved this from In Progress to Done in Documentation Apr 19, 2022
The ResNet50 benchmark job scripts are no longer usable on Bede, as `/opt/software/apps/anaconda3` does not exist.

Additionally, with the move to RHEL8, where WMLCE is not supported (it is replaced by OpenCE), it is unclear whether `ddlrun`, and therefore `bede-ddlrun`, will be usable.

It may be worth re-benchmarking ResNet50 prior to the RHEL8 switch, so we know the performance impact of WMLCE vs OpenCE?
I.e. run ResNet50 at a number of scales (1, 2, 4, 8, 12?, 16? GPUs; the current docs say there is no need to go larger) with:
The current RHEL8 testing partition only contains 2 nodes, so only up to 8 GPUs will currently be usable for RHEL8.
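As a rough illustration of the scaling plan above, the sketch below works out how many nodes each proposed GPU count would need and which runs fit on the 2-node RHEL8 testing partition. It assumes 4 GPUs per node, which is inferred from "2 nodes, so only up to 8 GPUs" rather than stated directly; the function and constant names are hypothetical and are not part of any Bede tooling.

```python
import math

# Assumption: 4 GPUs per node, inferred from "2 nodes ... up to 8 GPUs".
GPUS_PER_NODE = 4
# Size of the RHEL8 testing partition, as stated in the issue.
RHEL8_TEST_NODES = 2
# GPU counts proposed in the issue (12 and 16 were marked as uncertain).
CANDIDATE_GPU_COUNTS = [1, 2, 4, 8, 12, 16]


def nodes_needed(gpus, gpus_per_node=GPUS_PER_NODE):
    """Return the minimum number of whole nodes required for `gpus` GPUs."""
    return math.ceil(gpus / gpus_per_node)


def benchmark_plan(gpu_counts=CANDIDATE_GPU_COUNTS,
                   available_nodes=RHEL8_TEST_NODES):
    """Return (gpus, nodes, fits) tuples for each candidate GPU count."""
    plan = []
    for gpus in gpu_counts:
        nodes = nodes_needed(gpus)
        plan.append((gpus, nodes, nodes <= available_nodes))
    return plan


if __name__ == "__main__":
    for gpus, nodes, fits in benchmark_plan():
        status = "runnable on RHEL8 test partition" if fits else "needs more nodes"
        print(f"{gpus:>2} GPUs -> {nodes} node(s): {status}")
```

Under that assumption, only the 1, 2, 4 and 8 GPU runs can be compared between RHEL7 and RHEL8 until the testing partition grows.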
This is closely related to #63