-
Notifications
You must be signed in to change notification settings - Fork 312
(2.11.2 and earlier) Custom AMI creation (pcluster createami) fails when building SGE
Custom AMI creation with command pcluster createami
or cluster creation using custom AMI with no pre-baked ParallelCluster software stack are failing with the following error:
======================================================================..
error executing action `run` on resource 'execute[sge_preinstall]'
======================================================================..
mixlib::shellout::shellcommandfailed
------------------------------------
expected process to exit with [0], but received '28'
---- begin output of sh /tmp/sge_preinstall.sh ----
stdout: downloading and extracting source packages for sge-8.1.9
stderr: % total % received % xferd average speed time time ..
dload upload total spent left..
0 0 0 0 0 0 0 0 --:--:-- 0:02:10 --:--:-- ..
curl: (28) failed to connect to arc.liv.ac.uk port 443: connection tim..
warning: problem : timeout. will retry in 5 seconds. 3 retries left.
0 0 0 0 0 0 0 0 --:--:-- 0:02:10 --:--:-- ..
curl: (28) failed to connect to arc.liv.ac.uk port 443: connection tim..
warning: problem : timeout. will retry in 5 seconds. 2 retries left.
0 0 0 0 0 0 0 0 --:--:-- 0:02:10 --:--:-- ..
curl: (28) failed to connect to arc.liv.ac.uk port 443: connection tim..
warning: problem : timeout. will retry in 5 seconds. 1 retries left.
0 0 0 0 0 0 0 0 --:--:-- 0:02:10 --:--:-- ..
curl: (28) failed to connect to arc.liv.ac.uk port 443: connection tim..
---- end output of sh /tmp/sge_preinstall.sh ----
ran sh /tmp/sge_preinstall.sh returned 28
Root cause is because arc.liv.ac.uk
is down and it’s not possible to download SGE sources from https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge-8.1.9.tar.gz
2.11.2 and older are affected, OS affected according to the following table:
version\os | AL1 | AL2 | CentOS6 | CentOS7 | CentOS8 | Ubuntu16 | Ubuntu18 | Ubuntu20 |
---|---|---|---|---|---|---|---|---|
2.11.2 | not supported | AFFECTED | not supported | AFFECTED | AFFECTED | not supported | OK | OK |
2.10.4 | AFFECTED | OK | not supported | AFFECTED | AFFECTED | AFFECTED | OK | not supported |
2.9.1 | AFFECTED | OK | AFFECTED | AFFECTED | not supported | AFFECTED | OK | not supported |
Bug affects the following functionalities
- create custom AMI using command
pcluster createami
- SGE cluster creation with runtime bake (HeadNode cannot be created)
- existing SGE clusters with version older than 2.3.0 (included) using runtime bake
Existing clusters since 2.4.0 (included) aren't affected because SGE isn't installed on compute nodes, but it's shared from HeadNode
For createami
-
For custom AMI creation, instead of using the
pcluster createami
command, follow the guide "Building a Custom AWS ParallelCluster AMI -> Modify an AWS ParallelCluster AMI" from this doc -
As alternative, If you aren't using SGE and still want to use the
pcluster createami
command, do the following- Create a script named
patch_skip_sge_installation.sh
with the following contents. Replace ParallelCluster version with the version you want to patch:
#!/bin/bash VERSION="<parallelcluster-version>" git clone https://github.com/aws/aws-parallelcluster-cookbook.git cd ./aws-parallelcluster-cookbook git fetch origin git checkout v$VERSION sed -i "s/^include_recipe 'aws-parallelcluster::sge_install'/#&/g" recipes/default.rb # if you are using "BSD sed" replace the above sed line with the following one # sed -i "" "s/^include_recipe 'aws-parallelcluster::sge_install'/#&/g" recipes/default.rb cd .. tar zcvf "$(pwd)/aws-parallelcluster-cookbook-${VERSION}.tgz" ./aws-parallelcluster-cookbook
- Execute the script
bash patch_skip_sge_installation.sh
- A tar file with name
aws-parallelcluster-cookbook-$VERSION.tgz
will be generated. Upload the generated cookbook to your S3 bucket. - Use the custom cookbook as parameter for the
createami
command, e.g.
pcluster createami -ai $BASE_AMI_ID -os $BASE_AMI_OS -cc <custom-cookbook-s3-url>
- Create a script named
-
If, instead, you are required to use SGE with one of the affected OS, do the following:
- Spin up an instance from an official ParalleCLuster AMI, OS doesn't matter, version not greater than 2.11.2, e.g.
ami-05c34afa8785834ae
inus-east-1
- From the instance, retrieve the file
/opt/parallelcluster/sources/sge-8.1.9.tar.gz
and upload it into a S3 bucket - Create a script named
patch_sge_url.sh
with the following contents. Replace ParallelCluster version with the version you want to patch and SGE source with the S3 URL where the SGE sources were uploaded:
#!/bin/bash VERSION="<parallelcluster-version>" git clone https://github.com/aws/aws-parallelcluster-cookbook.git cd ./aws-parallelcluster-cookbook git fetch origin git checkout v$VERSION sed -i "s/default\['cfncluster'\]\['sge'\]\['url'\]/#&/g" attributes/default.rb # if you are using "BSD sed" replace the above sed line with the following one # sed -i "" "s/default\['cfncluster'\]\['sge'\]\['url'\]/#&/g" attributes/default.rb echo "default['cfncluster']['sge']['url'] = '<sge-sources-s3-url>'" >> attributes/default.rb cd .. tar zcvf "$(pwd)/aws-parallelcluster-cookbook-${VERSION}.tgz" ./aws-parallelcluster-cookbook
- Execute the script
bash patch_sge_url.sh
- A tar file with name
aws-parallelcluster-cookbook-$VERSION.tgz
will be generated. Upload the generated cookbook to your S3 bucket. - Use the custom cookbook as parameter for the
createami
command, e.g.
pcluster createami -ai $BASE_AMI_ID -os $BASE_AMI_OS -cc <custom-cookbook-s3-url>
- Spin up an instance from an official ParalleCLuster AMI, OS doesn't matter, version not greater than 2.11.2, e.g.
For cluster creation For SGE cluster creation, avoid using an AMI not backed with ParallelCluster software stack and build an AMI in advance following one of the above workaround.