[develop] Reuse the rotation script inside the munge resource #2471

Merged (17 commits) on Oct 12, 2023
@@ -15,6 +15,29 @@
 # OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
 # limitations under the License.

+# Copy pcluster config generator and templates
+remote_directory "#{node['cluster']['scripts_dir']}/slurm" do
+  source 'head_node_slurm/slurm'
+  mode '0755'
+  action :create
+  recursive true
+end
+
+template "#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh" do
+  source 'slurm/head_node/update_munge_key.sh.erb'
+  owner 'root'
+  group 'root'
+  mode '0700'
+  variables(
+    munge_key_secret_arn: lazy { node['cluster']['config'].dig(:DevSettings, :MungeKeySettings, :MungeKeySecretArn) },
+    region: node['cluster']['region'],
+    munge_user: node['cluster']['munge']['user'],
+    munge_group: node['cluster']['munge']['group'],
+    shared_directory_compute: node['cluster']['shared_dir'],
+    shared_directory_login: node['cluster']['shared_dir_login']
+  )
+end
+
 include_recipe 'aws-parallelcluster-slurm::config_munge_key'

 # Export /opt/slurm
@@ -59,14 +82,6 @@
   mode '0644'
 end

-# Copy pcluster config generator and templates
-remote_directory "#{node['cluster']['scripts_dir']}/slurm" do
-  source 'head_node_slurm/slurm'
-  mode '0755'
-  action :create
-  recursive true
-end
-
 unless on_docker?
   # Generate pcluster specific configs
   no_gpu = nvidia_installed? ? "" : "--no-gpu"
@@ -281,18 +296,3 @@
   retries 5
   retry_delay 2
 end unless redhat_on_docker?
-
-template "#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh" do
-  source 'slurm/head_node/update_munge_key.sh.erb'
-  owner 'root'
-  group 'root'
-  mode '0700'
-  variables(
-    munge_key_secret_arn: lazy { node['cluster']['config'].dig(:DevSettings, :MungeKeySettings, :MungeKeySecretArn) },
-    region: node['cluster']['region'],
-    munge_user: node['cluster']['munge']['user'],
-    munge_group: node['cluster']['munge']['group'],
-    shared_directory_compute: node['cluster']['shared_dir'],
-    shared_directory_login: node['cluster']['shared_dir_login']
-  )
-end
cookbooks/aws-parallelcluster-slurm/resources/munge_key_manager.rb: 37 changes (8 additions, 29 deletions)
@@ -23,35 +23,14 @@

 default_action :setup_munge_key

-def fetch_and_decode_munge_key(munge_key_secret_arn)
-  declare_resource(:bash, 'fetch_and_decode_munge_key') do
+def fetch_and_decode_munge_key
+  script_path = "#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh"
+
+  declare_resource(:execute, 'fetch_and_decode_munge_key') do
     user 'root'
     group 'root'
-    cwd '/tmp'
-    code <<-FETCH_AND_DECODE
-      set -e
-      # Get encoded munge key from secrets manager
-      encoded_key=$(aws secretsmanager get-secret-value --secret-id #{munge_key_secret_arn} --query 'SecretString' --output text --region #{node['cluster']['region']})
-      # If encoded_key doesn't have a value, error and exit
-      if [ -z "$encoded_key" ]; then
-        echo "Error fetching munge key from Secrets Manager or the key is empty"
-        exit 1
-      fi
-
-      # Decode munge key and write to /etc/munge/munge.key
-      decoded_key=$(echo $encoded_key | base64 -d)
-      if [ $? -ne 0 ]; then
-        echo "Error decoding the munge key with base64"
-        exit 1
-      fi
-
-      echo "$decoded_key" > /etc/munge/munge.key
-
-      # Set ownership on the key
-      chown #{node['cluster']['munge']['user']}:#{node['cluster']['munge']['group']} /etc/munge/munge.key
-      # Enforce correct permission on the key
-      chmod 0600 /etc/munge/munge.key
-    FETCH_AND_DECODE
+    cwd ::File.dirname(script_path)
+    command "./#{::File.basename(script_path)} -c True"
   end
 end

Contributor

I believe you can simplify this block:

def fetch_and_decode_munge_key
  declare_resource(:execute, 'fetch_and_decode_munge_key') do
    user 'root'
    group 'root'
    command "/#{node['cluster']['scripts_dir']}/slurm/update_munge_key.sh -d"
  end
end

Contributor Author

Great comment! Thank you, Jacopo.

Contributor

I wonder whether we actually need the leading / in the command string. In principle, though, scripts_dir will always be an absolute path, so it shouldn't hurt.
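On the leading `/` question: POSIX leaves the meaning of exactly two leading slashes implementation-defined, and Linux resolves `//path` to the same file as `/path`, so prefixing an absolute `scripts_dir` with `/` is harmless there. The sketch below checks this with a throwaway directory and a stub script, and also runs the stub the way the merged resource does (working directory set to the script's directory, then `./<basename> -c True`). `SCRIPTS_DIR` and the stub are hypothetical stand-ins for `node['cluster']['scripts_dir']` and the real `update_munge_key.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: SCRIPTS_DIR is a hypothetical stand-in for scripts_dir, and the
# stub below replaces the real update_munge_key.sh.
set -euo pipefail

SCRIPTS_DIR="$(mktemp -d)"   # an absolute path, like scripts_dir
mkdir -p "$SCRIPTS_DIR/slurm"
script_path="$SCRIPTS_DIR/slurm/update_munge_key.sh"

# Stub script that only echoes the arguments it receives.
printf '#!/usr/bin/env bash\necho "args: $*"\n' > "$script_path"
chmod 0700 "$script_path"

# Reviewer's form: "/" prepended to an absolute path gives "//tmp/...".
reviewer_path="/$SCRIPTS_DIR/slurm/update_munge_key.sh"

# Merged form: cd into the script's directory, then run ./<basename> -c True.
merged_output="$(cd "$(dirname "$script_path")" && "./$(basename "$script_path")" -c True)"

echo "$reviewer_path"
echo "$merged_output"
```

The `cwd` plus `./basename` form avoids the question entirely, since the command string never contains the directory.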


@@ -71,7 +50,7 @@ def generate_munge_key
 action :setup_munge_key do
   if new_resource.munge_key_secret_arn
     # This block will fetch the munge key from Secrets Manager
-    fetch_and_decode_munge_key(new_resource.munge_key_secret_arn)
+    fetch_and_decode_munge_key
   else
     # This block will randomly generate a munge key
     generate_munge_key
@@ -81,7 +60,7 @@ def generate_munge_key
 action :update_munge_key do
   if new_resource.munge_key_secret_arn
     # This block will fetch the munge key from Secrets Manager and replace the previous munge key
-    fetch_and_decode_munge_key(new_resource.munge_key_secret_arn)
+    fetch_and_decode_munge_key
   else
     # This block will randomly generate a munge key and replace the previous munge key
     generate_munge_key
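The inline script removed above stored the decoded key in a shell variable and checked `$?` afterwards. Two bash details are worth keeping in mind for any similar decode step: command substitution drops NUL bytes and trailing newlines (and `echo` then appends a newline), which can alter a binary key, and under `set -e` a failing `decoded_key=$(... | base64 -d)` exits before the `$?` check runs, so the error message is never printed. The sketch below is an assumption-labelled alternative, not the project's code: `write_decoded_key` is a hypothetical helper that decodes straight into a temporary file and tests the pipeline's status directly. Ownership (`chown` to the munge user) is left out.

```shell
#!/usr/bin/env bash
# Sketch only: write_decoded_key is a hypothetical helper, not part of the
# cookbook. It decodes a base64 string directly into a file so binary bytes
# are never held in a shell variable.
set -euo pipefail

# Usage: write_decoded_key <base64-string> <destination-file>
write_decoded_key() {
  local encoded="$1" dest="$2"
  if [ -z "$encoded" ]; then
    echo "Encoded munge key is empty" >&2
    return 1
  fi

  local tmp
  tmp="$(mktemp)"
  # Test the pipeline's status in the if-condition itself, so a decoding
  # failure is reported instead of aborting the script under set -e.
  if ! printf '%s' "$encoded" | base64 -d > "$tmp"; then
    echo "Error decoding the munge key with base64" >&2
    rm -f "$tmp"
    return 1
  fi

  chmod 0600 "$tmp"
  mv "$tmp" "$dest"
}
```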
@@ -15,12 +15,31 @@ MUNGE_USER="<%= @munge_user %>"
 MUNGE_GROUP="<%= @munge_group %>"
 SHARED_DIRECTORY_COMPUTE="<%= @shared_directory_compute %>"
 SHARED_DIRECTORY_LOGIN="<%= @shared_directory_login %>"
+CONFIG_PROCESS="False"
+
+while getopts "c:" opt; do
+  case $opt in
+    c)
+      if [[ $OPTARG != "True" && $OPTARG != "False" ]]; then
+        echo "Invalid value for -c: $OPTARG. Expected 'True' or 'False'."
+        exit 1
+      fi
+      CONFIG_PROCESS=$OPTARG
+      ;;
+    *)
+      echo "Usage: $0 [-c True/False]" >&2
+      exit 1
+      ;;
+  esac
+done

-# Check compute fleet status
-compute_fleet_status=$(get-compute-fleet-status.sh)
-if ! echo "$compute_fleet_status" | grep -q '"status": "STOPPED"'; then
-  echo "Compute fleet is not stopped. Please stop it before updating the munge key."
-  exit 1
+if [ "$CONFIG_PROCESS" == "False" ]; then
+  # Check compute fleet status
+  compute_fleet_status=$(get-compute-fleet-status.sh)
+  if ! echo "$compute_fleet_status" | grep -q '"status": "STOPPED"'; then
+    echo "Compute fleet is not stopped. Please stop it before updating the munge key."
+    exit 1
+  fi
 fi

 # If SECRET_ARN is provided, fetch the munge key from Secrets Manager
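The pre-flight logic in this hunk can be exercised without a cluster by wrapping it in functions. In the sketch below, `parse_config_flag` and `fleet_is_stopped` are hypothetical names; the real script runs the same `getopts` loop and `grep` at top level and calls `get-compute-fleet-status.sh` for the JSON. `OPTIND` is declared local so the parser can be called repeatedly. Note that the `grep` matches the literal text `"status": "STOPPED"`, so it relies on the status tool printing exactly one space after the colon.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical function wrappers around the logic added above.

# Prints the value given with -c (default "False"); returns 1 on bad input.
parse_config_flag() {
  local opt OPTIND=1 config_process="False"
  while getopts "c:" opt; do
    case $opt in
      c)
        if [[ $OPTARG != "True" && $OPTARG != "False" ]]; then
          echo "Invalid value for -c: $OPTARG. Expected 'True' or 'False'." >&2
          return 1
        fi
        config_process=$OPTARG
        ;;
      *)
        echo "Usage: $0 [-c True/False]" >&2
        return 1
        ;;
    esac
  done
  echo "$config_process"
}

# Succeeds only if the JSON text contains exactly "status": "STOPPED".
fleet_is_stopped() {
  echo "$1" | grep -q '"status": "STOPPED"'
}
```

Because the merged Chef resource passes `-c True`, the fleet-status guard is skipped during node configuration, while a manual run without `-c` still requires a stopped fleet.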
@@ -59,33 +78,35 @@ else
   exit 1
 fi

-# Enable and restart munge service
-systemctl enable munge
-echo "Restarting munge service"
-systemctl restart munge
+if [ "$CONFIG_PROCESS" == "False" ]; then
+  # Enable and restart munge service
+  systemctl enable munge
+  echo "Restarting munge service"
+  systemctl restart munge

-# Wait for a short period
-sleep 5
+  # Wait for a short period
+  sleep 5

-# Check if munge service is running
-if systemctl --quiet is-active munge; then
-  echo "Munge service is active"
-else
-  echo "Failed to restart munge service"
-  exit 1
-fi
+  # Check if munge service is running
+  if systemctl --quiet is-active munge; then
+    echo "Munge service is active"
+  else
+    echo "Failed to restart munge service"
+    exit 1
+  fi

-# Share munge key
-SHARED_DIRECTORIES=(${SHARED_DIRECTORY_COMPUTE} ${SHARED_DIRECTORY_LOGIN})
+  # Share munge key
+  SHARED_DIRECTORIES=(${SHARED_DIRECTORY_COMPUTE} ${SHARED_DIRECTORY_LOGIN})

-for dir in "${SHARED_DIRECTORIES[@]}"; do
-  echo "Sharing munge key to $dir"
-  mkdir -p "$dir/.munge"
-  cp /etc/munge/munge.key "$dir/.munge/.munge.key"
-  chmod 0700 "$dir/.munge"
-  chmod 0600 "$dir/.munge/.munge.key"
-done
+  for dir in "${SHARED_DIRECTORIES[@]}"; do
+    echo "Sharing munge key to $dir"
+    mkdir -p "$dir/.munge"
+    cp /etc/munge/munge.key "$dir/.munge/.munge.key"
+    chmod 0700 "$dir/.munge"
+    chmod 0600 "$dir/.munge/.munge.key"
+  done

-echo "Shared munge key"
+  echo "Shared munge key"
+fi

 exit 0
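The key-sharing loop, which now runs only when `CONFIG_PROCESS` is `"False"`, can be tried against throwaway directories. In the sketch below, `share_munge_key` is a hypothetical wrapper and the key file argument stands in for `/etc/munge/munge.key`. As in the script, the two directories are expanded unquoted into the array, so an empty login directory contributes no element instead of producing a `/.munge` path. The trade-off is that a directory path containing spaces would be split.

```shell
#!/usr/bin/env bash
# Sketch only: share_munge_key is a hypothetical wrapper around the loop above;
# key_file replaces /etc/munge/munge.key.
set -euo pipefail

# Usage: share_munge_key <key-file> <compute-shared-dir> [login-shared-dir]
share_munge_key() {
  local key_file="$1" compute_dir="$2" login_dir="${3:-}"
  # Unquoted, as in the script: an empty login_dir expands to nothing.
  local shared_directories=(${compute_dir} ${login_dir})
  local dir
  for dir in "${shared_directories[@]}"; do
    mkdir -p "$dir/.munge"
    cp "$key_file" "$dir/.munge/.munge.key"
    chmod 0700 "$dir/.munge"
    chmod 0600 "$dir/.munge/.munge.key"
  done
}
```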