[execute] sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:ip-172-25-7-120:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused
================================================================================
Error executing action `run` on resource 'execute[wait for slurm database]'
================================================================================
Mixlib::ShellOut::ShellCommandFailed
------------------------------------
Expected process to exit with [0], but received '1'
---- Begin output of /opt/slurm/bin/sacctmgr show clusters -Pn ----
STDOUT:
STDERR: sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:ip-172-25-7-120:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused
---- End output of /opt/slurm/bin/sacctmgr show clusters -Pn ----
Ran /opt/slurm/bin/sacctmgr show clusters -Pn returned 1
Resource Declaration:
---------------------
# In /etc/chef/local-mode-cache/cache/cookbooks/aws-parallelcluster-slurm/recipes/config/config_slurm_accounting.rb
60: execute "wait for slurm database" do
61:   command "#{node['cluster']['slurm']['install_dir']}/bin/sacctmgr show clusters -Pn"
62:   retries node['cluster']['slurmdbd_response_retries']
63:   retry_delay 10
64: end unless on_docker?
65:
66: bash "bootstrap slurm database" do
67:   user 'root'
68:   group 'root'
69:   code <<-BOOTSTRAP
70:     SACCTMGR_CMD=#{node['cluster']['slurm']['install_dir']}/bin/sacctmgr
71:     CLUSTER_NAME=#{node['cluster']['stack_name']}
72:     DEF_ACCOUNT=pcdefault
73:     SLURM_USER=#{node['cluster']['slurm']['user']}
74:     DEF_USER=#{node['cluster']['cluster_user']}
75:
76:     # Add cluster to database if it is not present yet
77:     [[ $($SACCTMGR_CMD show clusters -Pn cluster=$CLUSTER_NAME | grep $CLUSTER_NAME) ]] || \
78:       $SACCTMGR_CMD -iQ add cluster $CLUSTER_NAME
79:
80:     # Add account-cluster association to database if it is not present yet
81:     [[ $($SACCTMGR_CMD list associations -Pn cluster=$CLUSTER_NAME account=$DEF_ACCOUNT format=account | grep $DEF_ACCOUNT) ]] || \
82:       $SACCTMGR_CMD -iQ add account $DEF_ACCOUNT Cluster=$CLUSTER_NAME \
83:       Description="ParallelCluster default account" Organization="none"
84:
85:     # Add user-account associations to database if they are not present yet
86:     [[ $($SACCTMGR_CMD list associations -Pn cluster=$CLUSTER_NAME account=$DEF_ACCOUNT user=$SLURM_USER format=user | grep $SLURM_USER) ]] || \
87:       $SACCTMGR_CMD -iQ add user $SLURM_USER Account=$DEF_ACCOUNT AdminLevel=Admin
88:     [[ $($SACCTMGR_CMD list associations -Pn cluster=$CLUSTER_NAME account=$DEF_ACCOUNT user=$DEF_USER format=user | grep $DEF_USER) ]] || \
89:       $SACCTMGR_CMD -iQ add user $DEF_USER Account=$DEF_ACCOUNT AdminLevel=Admin
90:
91:     # sacctmgr might throw errors if the DEF_ACCOUNT is not associated to a cluster already defined on the database.
92:     # This is not important for the scope of this script, so we return 0.
93:     exit 0
94:   BOOTSTRAP
95: end unless on_docker?
Compiled Resource:
------------------
# Declared in /etc/chef/local-mode-cache/cache/cookbooks/aws-parallelcluster-slurm/recipes/config/config_slurm_accounting.rb:60:in `from_file'
execute("wait for slurm database") do
  action [:run]
  default_guard_interpreter :execute
  command "/opt/slurm/bin/sacctmgr show clusters -Pn"
  declared_type :execute
  cookbook_name "aws-parallelcluster-slurm"
  recipe_name "config_slurm_accounting"
  retries 30
  retry_delay 10
end
System Info:
------------
chef_version=18.2.7
platform=amazon
platform_version=2
ruby=ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
program_name=/bin/cinc-client
executable=/opt/cinc/bin/cinc-client
Hi, I was following the guide at https://aws.amazon.com/blogs/hpc/leveraging-slurm-accounting-in-aws-parallelcluster/ and encountered the error above.
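The log shows `sacctmgr` being refused a TCP connection to `ip-172-25-7-120:6819`, which typically means nothing was listening on that port (for example, `slurmdbd` was not running or had not finished starting) during any of the recipe's retries. As a rough diagnostic sketch, not part of ParallelCluster, the Ruby below polls that port with the same shape as the `execute` resource: one initial attempt plus `retries` retries, `retry_delay` seconds apart. The helper names `port_open?` and `wait_for_port` are hypothetical, and the sketch only checks TCP reachability, not whether `slurmdbd` can reach its backing database.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port succeeds within
# `timeout` seconds. A "Connection refused" like the one in the log surfaces
# as Errno::ECONNREFUSED (a SystemCallError) and is reported as false.
def port_open?(host, port, timeout: 2)
  # With a block, Socket.tcp yields the socket, closes it afterwards,
  # and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Hypothetical helper mirroring the resource's retry shape (an assumption,
# not ParallelCluster code): one initial attempt plus `retries` retries,
# sleeping `retry_delay` seconds between attempts. Returns the number of
# attempts used on success, or nil if the port never became reachable.
# `sleeper` is injectable so the loop can be exercised without real sleeps.
def wait_for_port(host, port, retries: 30, retry_delay: 10, sleeper: ->(s) { sleep(s) })
  (retries + 1).times do |attempt|
    return attempt + 1 if port_open?(host, port)
    # Only sleep if another attempt will follow.
    sleeper.call(retry_delay) if attempt < retries
  end
  nil
end
```

With the compiled values from the log (`retries 30`, `retry_delay 10`), a call such as `wait_for_port("ip-172-25-7-120", 6819)` makes up to 31 attempts and sleeps at most 30 × 10 = 300 seconds; a `nil` result means the port was never reachable in that window, which points at `slurmdbd` itself rather than at the bootstrap script.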