Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[develop] Share the munge key from the Parallel Cluster shared folder #2467

Merged
merged 9 commits into from
Sep 26, 2023
35 changes: 24 additions & 11 deletions cookbooks/aws-parallelcluster-slurm/libraries/helpers.rb
Original file line number Diff line number Diff line change
Expand Up @@ -97,38 +97,51 @@ def update_munge_head_node
share_munge_head_node
end

def share_munge_head_node
# Share munge key
def share_munge_key_to_dir(shared_dir)
bash 'share_munge_key' do
user 'root'
group 'root'
code <<-HEAD_SHARE_MUNGE_KEY
code <<-SHARE_MUNGE_KEY
set -e
mkdir -p /home/#{node['cluster']['cluster_user']}/.munge
mkdir -p #{shared_dir}/.munge
# Copy key to shared dir
cp /etc/munge/munge.key /home/#{node['cluster']['cluster_user']}/.munge/.munge.key
HEAD_SHARE_MUNGE_KEY
cp /etc/munge/munge.key #{shared_dir}/.munge/.munge.key
chmod 0600 #{shared_dir}/.munge
jdeamicis marked this conversation as resolved.
Show resolved Hide resolved
chmod 0600 #{shared_dir}/.munge/.munge.key
SHARE_MUNGE_KEY
end
end

def setup_munge_compute_node
# Get munge key
def share_munge_head_node
share_munge_key_to_dir(node['cluster']['shared_dir'])
share_munge_key_to_dir(node['cluster']['shared_dir_login'])
end

def setup_munge_key(shared_dir)
bash 'get_munge_key' do
user 'root'
group 'root'
code <<-COMPUTE_MUNGE_KEY
code <<-MUNGE_KEY
set -e
# Copy munge key from shared dir
cp /home/#{node['cluster']['cluster_user']}/.munge/.munge.key /etc/munge/munge.key
cp #{shared_dir}/.munge/.munge.key /etc/munge/munge.key
# Set ownership on the key
chown #{node['cluster']['munge']['user']}:#{node['cluster']['munge']['group']} /etc/munge/munge.key
# Enforce correct permission on the key
chmod 0600 /etc/munge/munge.key
COMPUTE_MUNGE_KEY
MUNGE_KEY
retries 5
retry_delay 10
end
end

def setup_munge_compute_node
setup_munge_key(node['cluster']['shared_dir'])
enable_munge_service
end
Comment on lines +138 to +141
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NitPick: this is trivial enough to be placed directly in the config_compute recipe. But I guess we need a more general refactoring later on (to move these helper functions inside the munge resource), so we can leave it this way for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Thanks Jacopo!


def setup_munge_login_node
setup_munge_key(node['cluster']['shared_dir_login'])
enable_munge_service
end
Comment on lines +143 to 146
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NitPick: this is trivial enough to be placed directly in the config_compute recipe. But I guess we need a more general refactoring later on (to move these helper functions inside the munge resource), so we can leave it this way for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Thanks Jacopo!


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -292,6 +292,7 @@
region: node['cluster']['region'],
munge_user: node['cluster']['munge']['user'],
munge_group: node['cluster']['munge']['group'],
cluster_user: node['cluster']['cluster_user']
shared_directory: node['cluster']['shared_dir'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: please rename this variable as shared_directory_compute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

shared_directory_login: node['cluster']['shared_dir_login']
)
end
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,6 @@
# limitations under the License.

# TODO: rename, find a better name that include login nodes
setup_munge_compute_node
setup_munge_login_node

include_recipe 'aws-parallelcluster-slurm::mount_slurm_dir'
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
munge_user_dir = "/home/#{node['cluster']['cluster_user']}/.munge"
munge_user_dir = "#{node['cluster']['shared_dir']}/.munge"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure what this file is for (I guess for some kitchen tests that still need to be developed), but is it correct that this uses the compute node shared folder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure too. Previously, both Login Nodes and Compute Nodes fetch munge key from "/home/#{node['cluster']['cluster_user']}/.munge".
But maybe now we should also mock another
munge_user_dir_login = "#{node['cluster']['shared_dir_login']}/.munge"
and

file "#{munge_user_dir_login}/.munge.key" do
  content 'munge-key'
  owner node['cluster']['munge']['user']
  group node['cluster']['munge']['group']
end

directory munge_user_dir do
mode '1777'
jdeamicis marked this conversation as resolved.
Show resolved Hide resolved
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,8 @@ def update_nodes_in_queue(strategy, queues)
region: node['cluster']['region'],
munge_user: node['cluster']['munge']['user'],
munge_group: node['cluster']['munge']['group'],
cluster_user: node['cluster']['cluster_user']
shared_directory: node['cluster']['shared_dir'],
shared_directory_login: node['cluster']['shared_dir_login']
)
only_if { ::File.exist?(node['cluster']['previous_cluster_config_path']) && is_custom_munge_key_updated? }
end
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,8 @@ SECRET_ARN="<%= @munge_key_secret_arn %>"
REGION="<%= @region %>"
MUNGE_USER="<%= @munge_user %>"
MUNGE_GROUP="<%= @munge_group %>"
CLUSTER_USER="<%= @cluster_user %>"
SHARED_DIRECTORY="<%= @shared_directory %>"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here again, we must share it also with the login nodes, which use a different shared folder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have 5 minutes, please rename SHARED_DIRECTORY to SHARED_DIRECTORY_COMPUTE so that it's immediately distinguishable from SHARED_DIRECTORY_LOGIN.

SHARED_DIRECTORY_LOGIN="<%= @shared_directory_login %>"

# If SECRET_ARN is provided, fetch the munge key from Secrets Manager
if [ -n "${SECRET_ARN}" ]; then
Expand Down Expand Up @@ -69,8 +70,15 @@ fi

# Share munge key
echo "Sharing munge key"
mkdir -p /home/${CLUSTER_USER}/.munge
cp /etc/munge/munge.key /home/${CLUSTER_USER}/.munge/.munge.key
mkdir -p ${SHARED_DIRECTORY}/.munge
cp /etc/munge/munge.key ${SHARED_DIRECTORY}/.munge/.munge.key
chmod 0600 ${SHARED_DIRECTORY}/.munge
chmod 0600 ${SHARED_DIRECTORY}/.munge/.munge.key

mkdir -p ${SHARED_DIRECTORY_LOGIN}/.munge
cp /etc/munge/munge.key ${SHARED_DIRECTORY_LOGIN}/.munge/.munge.key
chmod 0600 ${SHARED_DIRECTORY_LOGIN}/.munge
chmod 0600 ${SHARED_DIRECTORY_LOGIN}/.munge/.munge.key
jdeamicis marked this conversation as resolved.
Show resolved Hide resolved
echo "Shared munge key"

exit 0