[develop] Add custom munge key rotation script #2453

Merged
merged 4 commits into from
Sep 18, 2023
@@ -281,3 +281,17 @@
  retries 5
  retry_delay 2
end unless redhat_on_docker?

template "#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh" do
  source 'slurm/head_node/update_munge_key.sh.erb'
  owner 'root'
  group 'root'
  mode '0700'
  variables(
    munge_key_secret_arn: lazy { node['cluster']['config'].dig(:DevSettings, :SlurmSettings, :MungeKeySecretArn) },
    region: node['cluster']['region'],
    munge_user: node['cluster']['munge']['user'],
    munge_group: node['cluster']['munge']['group'],
    cluster_user: node['cluster']['cluster_user']
  )
end
@@ -201,6 +201,22 @@ def update_nodes_in_queue(strategy, queues)
  only_if { ::File.exist?(node['cluster']['previous_cluster_config_path']) && is_slurm_database_updated? }
end unless on_docker?

# Update rotation script to update secret arn
template "#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh" do
  source 'slurm/head_node/update_munge_key.sh.erb'
  owner 'root'
  group 'root'
  mode '0700'
  variables(
    munge_key_secret_arn: lazy { node['cluster']['config'].dig(:DevSettings, :SlurmSettings, :MungeKeySecretArn) },
    region: node['cluster']['region'],
    munge_user: node['cluster']['munge']['user'],
    munge_group: node['cluster']['munge']['group'],
    cluster_user: node['cluster']['cluster_user']
  )
  only_if { ::File.exist?(node['cluster']['previous_cluster_config_path']) && is_custom_munge_key_updated? }
end

# The previous execute "generate_pcluster_slurm_configs" block resource may have overridden the slurmdbd password in
# slurm_parallelcluster_slurmdbd.conf with a default value, so if it has run and Slurm accounting
# is enabled we must pull the database password from Secrets Manager once again.
@@ -0,0 +1,64 @@
#!/bin/bash
# This script updates the munge key used in the system.
# It fetches the key from AWS Secrets Manager or generates one if it doesn't exist.
# The script does not require any argument.
#
# Usage: ./update_munge_key.sh
# #

set -e

MUNGE_KEY_FILE="/etc/munge/munge.key"
SECRET_ARN="<%= @munge_key_secret_arn %>"
REGION="<%= @region %>"
MUNGE_USER="<%= @munge_user %>"
MUNGE_GROUP="<%= @munge_group %>"
CLUSTER_USER="<%= @cluster_user %>"

# If SECRET_ARN is provided, fetch the munge key from Secrets Manager
if [ -n "${SECRET_ARN}" ]; then
  echo "Fetching munge key from AWS Secrets Manager: ${SECRET_ARN}"
  encoded_key=$(aws secretsmanager get-secret-value --secret-id ${SECRET_ARN} --query 'SecretString' --output text --region ${REGION})

  if [ -z "${encoded_key}" ]; then
    echo "Error fetching munge key from Secrets Manager or the key is empty"
    exit 1
  fi

  # Decode munge key and write to munge.key file
  decoded_key=$(echo $encoded_key | base64 -d)
  if [ $? -ne 0 ]; then
    echo "Error decoding the munge key with base64"
    exit 1
  fi

  # Remove current munge key if exists
Contributor:

See comment in the update PR: 2452/files#r1325705545

I think we can simply overwrite the existing key without explicitly removing it.

Contributor Author:

I have removed this code.
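
As an illustration of the suggestion above, here is a minimal sketch of replacing the key file in place rather than deleting it first. The helper name write_key_file and the scratch directory are assumptions made for this example and do not appear in the PR. The sketch also decodes straight into a file, on the assumption that the key may contain bytes that a shell variable would not preserve.

```shell
#!/bin/bash
# Hypothetical helper for illustration; write_key_file is not part of the PR.
set -euo pipefail

write_key_file() {
  local encoded_key="$1" target="$2"
  local tmp
  # Create the temporary file next to the target so the final mv is a rename.
  tmp="$(mktemp "${target}.XXXXXX")"
  # Decode directly into the file instead of holding the key in a variable.
  if ! printf '%s' "${encoded_key}" | base64 -d > "${tmp}"; then
    rm -f "${tmp}"
    echo "Error decoding the munge key with base64" >&2
    return 1
  fi
  chmod 0600 "${tmp}"
  # Replace any existing key; no explicit rm of the old file is needed.
  mv -f "${tmp}" "${target}"
}

# Demo against a scratch directory instead of /etc/munge.
demo_dir="$(mktemp -d)"
echo "old key" > "${demo_dir}/munge.key"
write_key_file "$(printf 'new key material' | base64)" "${demo_dir}/munge.key"
```

In the real script the target would be ${MUNGE_KEY_FILE}, followed by the existing chown step.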

  if [ -f "${MUNGE_KEY_FILE}" ]; then
    rm -f ${MUNGE_KEY_FILE}
  fi

  echo "${decoded_key}" > ${MUNGE_KEY_FILE}

  # Set ownership on the key
  chown ${MUNGE_USER}:${MUNGE_GROUP} ${MUNGE_KEY_FILE}
  # Enforce correct permission on the key
  chmod 0600 ${MUNGE_KEY_FILE}

else
  echo "MUNGE KEY SECRET ARN isn't provided"
  exit 1
fi

# Enable and restart munge service
systemctl enable munge
echo "Start to Restart munge service"
Contributor:

Minor: echo "Restarting munge service"

Contributor Author:

Done!

systemctl restart munge || { sleep 10; systemctl restart munge; } || { sleep 10; systemctl restart munge; } || { sleep 10; systemctl restart munge; } || { sleep 10; systemctl restart munge; }
Contributor:

Is there a reason for the multiple sleep commands? Are they there to wait for the service to restart? I think systemctl restart is a synchronous operation (unless --no-block is specified).

Did you notice asynchronous behaviour while running the command?

Also, after restarting the munge service we can check that it is running with systemctl --quiet is-active munge.

Contributor Author:

def enable_munge_service
  service "munge" do
    supports restart: true
    action %i(enable start)
    retries 5
    retry_delay 10
  end
end

I added the retry logic because I noticed that enable_munge_service contains retry steps, so I included it in case there is something about the munge service that I don't fully understand.

But yes, I think we can remove it.

I also added the check commands.
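
The outcome of this thread (a single restart followed by an is-active check) could be sketched roughly as below. The function name restart_and_verify and the SYSTEMCTL override variable are assumptions introduced here so the logic can be exercised without a real service manager. They are not taken from the PR.

```shell
#!/bin/bash
# Hypothetical sketch; restart_and_verify and SYSTEMCTL are illustrative names.

# Allow the service manager command to be swapped out, e.g. for a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

restart_and_verify() {
  local service="$1"
  echo "Restarting ${service} service"
  # systemctl restart waits for the job to finish unless --no-block is given.
  if ! "${SYSTEMCTL}" restart "${service}"; then
    echo "Failed to restart ${service}" >&2
    return 1
  fi
  # is-active exits with status 0 only when the unit is active.
  if ! "${SYSTEMCTL}" --quiet is-active "${service}"; then
    echo "${service} is not active after restart" >&2
    return 1
  fi
  echo "Restarted ${service} service"
}
```

The block only defines the function. A script would call restart_and_verify munge and exit non-zero if it fails.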

echo "Restart munge service completed"
Contributor:

Minor: echo "Restarted munge service"

Contributor Author:

Done!

Contributor Author:

Also done for echo "Sharing munge key" and echo "Shared munge key".


# Share munge key
echo "Start to Share munge key"
mkdir -p /home/${CLUSTER_USER}/.munge
cp /etc/munge/munge.key /home/${CLUSTER_USER}/.munge/.munge.key
echo "Share munge key completed"

exit 0
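
One detail of the script above may be worth a closer look: it enables set -e and then checks $? after the base64 assignment. The sketch below uses stand-in one-liners (with false in place of the real decode command) to show how the two patterns behave. The variable names are made up for this example, and this is only an illustration of shell behaviour, not a change proposed in the PR.

```shell
#!/bin/bash
# Illustrative only: both snippets are stand-ins, not lines from the PR.

# Pattern used in the script: inspect $? after a plain assignment.
# Under set -e the failed assignment ends the shell before the check runs.
checked_after='set -e; key=$(false); if [ $? -ne 0 ]; then echo "decode failed"; exit 1; fi'

# Alternative: test the assignment directly; set -e does not abort inside an if condition.
checked_inline='set -e; if ! key=$(false); then echo "decode failed"; exit 1; fi'

# Run each snippet in its own bash process and capture what it prints.
after_output="$(bash -c "${checked_after}" || true)"
inline_output="$(bash -c "${checked_inline}" || true)"

echo "check after assignment printed: '${after_output}'"
echo "inline check printed: '${inline_output}'"
```

Only the inline form reaches its error message. With the first form the shell still exits non-zero, but without printing anything.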