Update hpc slurm daos example and its references to use Slurm V6 #2144

Conversation

harshthakkar01
Contributor

Submission Checklist

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cloud HPC Toolkit Contribution guidelines

@harshthakkar01 harshthakkar01 added the release-improvements Added to release notes under the "Improvements" heading. label Jan 17, 2024
@mark-olson
Contributor

@harshthakkar01 I was going to submit a PR today that updates this blueprint.

See https://github.com/mark-olson/hpc-toolkit/blob/DAOSGCP-182/community/examples/intel/hpc-slurm-daos.yaml

It also uses the DAOS modules in google-cloud-daos v0.5.0, which install DAOS v2.4.

The problem I'm having today is that when I deploy the blueprint, all dnf commands on the Slurm instances fail because there is no Lustre el8.8 repo at https://downloads.whamcloud.com/public/lustre/latest-release/.

I see that https://github.com/SchedMD/slurm-gcp/releases/tag/6.3.1 includes a fix for that, but the Slurm v6 modules in the Toolkit seem to be using slurm-6-1.

Maybe @mr0re1 can help with that?

Because of these errors, I'm not able to submit my PR to update this blueprint.

@mr0re1
Collaborator

mr0re1 commented Jan 18, 2024

Hi @mark-olson, we are planning to update to SlurmGCP 6.3.1 in the next few days. I will post in this PR's comments once it's done.

@mark-olson
Contributor

mark-olson commented Jan 19, 2024

@harshthakkar01

I submitted PR #2147, which includes the Slurm v6 changes from this PR as well as a version bump to google-cloud-daos v0.5.0, which installs the current version of DAOS (v2.4).

As a result, this PR does not need to be merged.

@harshthakkar01 harshthakkar01 self-assigned this Jan 22, 2024