
Update ml-slurm examples to use recent copies of pytorch and tensorflow #3226

Merged
merged 3 commits into GoogleCloudPlatform:develop on Nov 6, 2024

Conversation

tpdownes (Member) commented on Nov 6, 2024

Adopt recent versions of pytorch and tensorflow from pip, which have improved predictability of CUDA adoption.
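The "Using device: cpu" / "Using device: cuda" lines in the outputs below suggest the example script selects its device based on CUDA availability. A minimal sketch of that pattern is shown here; the exact contents of torch_test.py are not part of this PR description, so this is an assumption, and the sketch falls back gracefully when torch is not installed so it stays runnable anywhere.

```python
# Hedged sketch of the device-selection logic implied by the example output.
# Assumes torch is installed; falls back to CPU-only reporting if it is not.
try:
    import torch

    # Prefer CUDA when the runtime and a GPU are both present.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    if device == "cuda":
        # Matches the "NVIDIA L4" / "NVIDIA A100-SXM4-40GB" lines below.
        print(torch.cuda.get_device_name(0))
except ImportError:
    device = "cpu"
    print("Using device: cpu (torch not installed; assuming CPU)")
```

On the login node (no GPU) this prints the CPU branch; on the g2 and a2 nodes it would report the attached accelerator.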

Example outputs

Running example on login node (CPU)

ext_tpdownes_google_com@mlexamplev-slurm-login-001:~$ conda activate pytorch
(pytorch) ext_tpdownes_google_com@mlexamplev-slurm-login-001:~$ python3 torch_test.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Using device: cpu
<torch.utils.benchmark.utils.common.Measurement object at 0x7f6df96daa90>
batched_dot_mul_sum(x, x)
setup: from __main__ import batched_dot_mul_sum
  350.64 us
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7f6dfadd88d0>
batched_dot_bmm(x, x)
setup: from __main__ import batched_dot_bmm
  659.48 us

Running example on g2 node

Using device: cuda
NVIDIA L4
<torch.utils.benchmark.utils.common.Measurement object at 0x7fe93b7f5310>
batched_dot_mul_sum(x, x)
setup: from __main__ import batched_dot_mul_sum
  420.50 us
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7fe93bdc2990>
batched_dot_bmm(x, x)
setup: from __main__ import batched_dot_bmm
  863.33 us
  1 measurement, 100 runs , 1 thread

Running example on a2 node

Using device: cuda
NVIDIA A100-SXM4-40GB
<torch.utils.benchmark.utils.common.Measurement object at 0x7f18773cbd50>
batched_dot_mul_sum(x, x)
setup: from __main__ import batched_dot_mul_sum
  419.16 us
  1 measurement, 100 runs , 1 thread
<torch.utils.benchmark.utils.common.Measurement object at 0x7f1877b00550>
batched_dot_bmm(x, x)
setup: from __main__ import batched_dot_bmm
  866.42 us
  1 measurement, 100 runs , 1 thread
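The timings above can be compared directly. The short pure-Python sketch below (numbers copied from the outputs in this PR) computes each GPU run's speedup relative to the login-node CPU run; ratios below 1.0 mean the GPU run was slower, which is plausible for a workload this small, where kernel-launch overhead dominates. The interpretation that this example is a smoke test of CUDA device selection rather than a performance benchmark is my reading, not a claim made in the PR.

```python
# Timings (microseconds per call) copied from the example outputs above.
timings_us = {
    "cpu":  {"batched_dot_mul_sum": 350.64, "batched_dot_bmm": 659.48},
    "l4":   {"batched_dot_mul_sum": 420.50, "batched_dot_bmm": 863.33},
    "a100": {"batched_dot_mul_sum": 419.16, "batched_dot_bmm": 866.42},
}

def speedup_vs_cpu(device: str, kernel: str) -> float:
    """CPU time divided by device time; values > 1.0 mean faster than CPU."""
    return timings_us["cpu"][kernel] / timings_us[device][kernel]

for device in ("l4", "a100"):
    for kernel in ("batched_dot_mul_sum", "batched_dot_bmm"):
        print(f"{device:5s} {kernel}: {speedup_vs_cpu(device, kernel):.2f}x")
```

All four ratios come out below 1.0 for these inputs, consistent with the GPU runs confirming correct CUDA adoption rather than demonstrating acceleration.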

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

Adopt recent versions of pytorch and tensorflow from pip which have
improved predictability of CUDA adoption.
@tpdownes added the release-version-updates label (shown in release notes under the "Version Updates" heading) on Nov 6, 2024
harshthakkar01 (Contributor) commented:

g2g after test passes.

@tpdownes merged commit c06fa10 into GoogleCloudPlatform:develop on Nov 6, 2024
11 of 57 checks passed
@tpdownes deleted the fix_ml_slurm branch on November 6, 2024 at 19:44
@rohitramu mentioned this pull request on Nov 20, 2024