Ensure local SSD filesystem is assembled into a RAID even upon power off/on cycles #3129
The solution in #2720 has a known shortcoming when a node in a Slurm cluster is powered off and powered back on without retaining the contents of its local SSD disks. In this case, the persistent disk is retained and the Slurm startup scripts do not re-run the Ansible playbook that idempotently creates and assembles the RAID before formatting it.
This PR closes that gap by replacing the relevant Ansible tasks with a systemd unit that performs the same function idempotently. The unit is guaranteed to run after local filesystems are mounted and does not act if the local SSD volume was already successfully mounted. It is also guaranteed to complete before slurmd starts. If it fails, it does not block slurmd.service from starting; this is an intentional choice, because I believe we should explore general designs for blocking slurmd when filesystems required for workflows fail to mount.
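To make the mechanism concrete, below is a minimal sketch of such a unit. The unit name, mountpoint, and helper script are hypothetical; it illustrates the ordering and condition logic described above, not the exact content of this PR:

```shell
# Sketch only: create-localssd-raid.service, /mnt/localssd, and the
# helper script are hypothetical names, not the ones used in this PR
cat > /etc/systemd/system/create-localssd-raid.service <<'EOF'
[Unit]
Description=Idempotently assemble, format, and mount the local SSD RAID
# Do nothing if the local SSD volume is already mounted
ConditionPathIsMountPoint=!/mnt/localssd
# Run after local filesystems are mounted; finish before slurmd starts
After=local-fs.target
Before=slurmd.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper that idempotently creates/assembles the mdadm
# array, formats it if needed, and mounts it at /mnt/localssd
ExecStart=/usr/local/bin/create-localssd-raid.sh

[Install]
WantedBy=slurmd.service
EOF
systemctl daemon-reload
systemctl enable create-localssd-raid.service
```

Because enabling the unit creates a Wants= dependency (via WantedBy=) rather than a Requires= dependency, a failure of this unit is recorded but does not prevent slurmd.service from starting, matching the failure behavior described above.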
Testing
Tests were performed on the community/examples/hpc-slurm-local-ssd.yaml example, which uses Rocky Linux 8. This should be a worst-case scenario because part of the implementation relies upon the systemd directive ConditionPathIsMountPoint, which was not introduced until systemd 244. It appears to have been backported to systemd 239 in Rocky Linux 8. Manual inspection of other Linux distributions shows that all current distributions use systemd 244 or above.
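If in doubt on a given image, the systemd version is easy to confirm directly on a node; this check is generic and not something added by this PR:

```shell
# Print the running systemd version (first line of output)
systemctl --version | head -n1
# On Rocky Linux 8 this reports systemd 239, where the directive appears
# to have been backported per the note above
```

Additional reboot and power off/on testing was performed with this blueprint. Summary: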
For Rocky Linux 9, our Ansible installer affirmatively quits upon seeing release 9:
For Ubuntu 24.04, the problem traces to the new default of Python 3.12. See, e.g., https://stackoverflow.com/a/78464477:
The Ansible installer is in need of modernization now that CentOS is EOL. I will ensure the documentation on this is up to date, but it's unrelated to this PR.
Reboot (local SSD contents retained)
In this scenario, we see the local SSD service skip execution because the mountpoint is already mounted (its ConditionPathIsMountPoint condition is not met).
Power off, Power on (discards local SSD contents, so requires a reformat)
In this scenario, we see the local SSD service execute and succeed before slurmd.service starts.
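Both outcomes can be confirmed on a node; the unit name below is the same hypothetical one used in the sketch above:

```shell
# Show whether the unit ran or was skipped by its condition check
systemctl status create-localssd-raid.service
# Confirm ordering relative to slurmd for the current boot
journalctl -b -u create-localssd-raid.service -u slurmd.service
```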