
Update nccl-plugin and rxdm versions and add --install-nccl to prolog #235

Merged: 3 commits merged into GoogleCloudPlatform:master on Jan 7, 2025


akiki-liang0 (Contributor)

  • Update the NCCL plugin and RxDM images to the latest recommended versions (v1.0.7 and v1.0.13, respectively).
  • Add the --install-nccl flag to the libnccl installation command in the prolog. This installs a newer libnccl (2.21.5) that is compatible with the updated NCCL plugin and RxDM images.
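The compatibility requirement in the second bullet can be checked on a node before a job starts. The following is a minimal sketch and is not part of this PR: it assumes libnccl is resolvable by the dynamic loader as libnccl.so.2, and the helper names `decode_nccl_version`, `loaded_nccl_version`, and `meets_minimum` are hypothetical. It calls NCCL's `ncclGetVersion` through ctypes and compares the result against 2.21.5, the libnccl version this PR installs.

```python
import ctypes
from typing import Optional, Tuple

Version = Tuple[int, int, int]

# Minimum libnccl version taken from this PR (installed via --install-nccl).
MIN_NCCL: Version = (2, 21, 5)


def decode_nccl_version(code: int) -> Version:
    """Decode the integer written by ncclGetVersion.

    NCCL 2.9 and later encode versions as major*10000 + minor*100 + patch;
    releases up to 2.8 used major*1000 + minor*100 + patch.
    """
    if code >= 10000:
        return code // 10000, (code % 10000) // 100, code % 100
    return code // 1000, (code % 1000) // 100, code % 100


def loaded_nccl_version(lib_name: str = "libnccl.so.2") -> Optional[Version]:
    """Return the version of the libnccl the loader resolves, or None.

    Assumption (hypothetical default): libnccl is findable as libnccl.so.2
    on the loader search path; pass a full path otherwise.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None  # library not found on this host
    version = ctypes.c_int()
    # C signature: ncclResult_t ncclGetVersion(int *version); 0 is ncclSuccess.
    if lib.ncclGetVersion(ctypes.byref(version)) != 0:
        return None
    return decode_nccl_version(version.value)


def meets_minimum(version: Version, minimum: Version = MIN_NCCL) -> bool:
    """Tuple comparison orders (major, minor, patch) correctly."""
    return version >= minimum
```

For example, a value of 22105 from `ncclGetVersion` decodes to (2, 21, 5), which satisfies `meets_minimum`. Decoding is kept separate from loading so the version logic can be checked on hosts that have no GPU libraries installed.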

Tests performed:

  • Ran Llama 3 training on A3-Mega with NCCL plugin v1.0.7 and RxDM v1.0.13 without the --install-nccl flag in the prolog script: the run failed with segmentation faults.
  • Ran the same Llama 3 training on A3-Mega with NCCL plugin v1.0.7 and RxDM v1.0.13 with the --install-nccl flag in the prolog script: the run worked.

akiki-liang0 changed the title from "Update nccl-plugin and rxdm versions" to "Update nccl-plugin and rxdm versions and add --install-nccl to prolog" on Dec 4, 2024
mr0re1 merged commit 3bfc3d5 into GoogleCloudPlatform:master on Jan 7, 2025
2 checks passed

3 participants