Commit

Merge pull request GoogleCloudPlatform#3255 from mr0re1/health_out
Comment out plugging of `gpu-test`
mr0re1 authored Nov 15, 2024
2 parents f62f2bc + 684c035 commit c6ee4b1
Showing 1 changed file with 2 additions and 1 deletion.
@@ -226,7 +226,8 @@ deployment_groups:
 chmod 0755 "${SLURM_ROOT}/scripts/rxdm"
 ln -s "${SLURM_ROOT}/scripts/rxdm" "${SLURM_ROOT}/partition-$(vars.a3mega_partition_name)-prolog_slurmd.d/rxdm.prolog_slurmd"
 ln -s "${SLURM_ROOT}/scripts/rxdm" "${SLURM_ROOT}/partition-$(vars.a3mega_partition_name)-epilog_slurmd.d/rxdm.epilog_slurmd"
-ln -s "/slurm/scripts/tools/gpu-test" "${SLURM_ROOT}/partition-$(vars.a3mega_partition_name)-epilog_slurmd.d/gpu-test.epilog_slurmd"
+# Uncomment the line below to enable epilog that will check health of GPUs and drain node if problem is detected.
+# ln -s "/slurm/scripts/tools/gpu-test" "${SLURM_ROOT}/partition-$(vars.a3mega_partition_name)-epilog_slurmd.d/gpu-test.epilog_slurmd"
 - type: shell
 destination: reset_enroot.sh
 content: |
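The hunk above turns the `gpu-test` epilog into an opt-in: the link is only created if the commented line is restored. As a rough sketch of what that line does, the helper below creates the same symlink by hand. The function name `enable_gpu_test_epilog`, its two arguments, and the `mkdir -p` step are assumptions made for illustration and are not part of the commit. Only the link target and the `partition-<name>-epilog_slurmd.d` directory layout come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): creates the symlink that the
# commented-out `ln -s` line in the blueprint would create.
enable_gpu_test_epilog() {
  local slurm_root="$1"   # assumed: the node's SLURM_ROOT value
  local partition="$2"    # assumed: the value of vars.a3mega_partition_name
  local epilog_dir="${slurm_root}/partition-${partition}-epilog_slurmd.d"

  # Assumption: create the directory if it is missing (the blueprint does not
  # show this step in the hunk above).
  mkdir -p "${epilog_dir}"

  # -f replaces an existing link so the helper can be re-run safely;
  # -n keeps ln from following an existing link at the destination.
  ln -sfn "/slurm/scripts/tools/gpu-test" "${epilog_dir}/gpu-test.epilog_slurmd"
}
```

Calling `enable_gpu_test_epilog "${SLURM_ROOT}" "<partition name>"` would then produce the same `gpu-test.epilog_slurmd` link as the original, pre-commit line.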
