wiki instructions for Option 2 fix of (3.8.0 ‐ 3.9.3) ParallelCluster Build Image Failing during Installation of Minitar Ruby Gem Dependency aren't quite right.
#6530
Closed
gwolski opened this issue
Nov 3, 2024
· 3 comments
We did test the changes internally for these versions, and they were successful. This was a documentation mistake with respect to code from 3.10.0, which is the version in which we added the --local flag for gem installation, where we install some gems just before installing Minitar and Berkshelf.
I tried following the Option 2 instructions on the wiki page https://github.com/aws/aws-parallelcluster/wiki/(3.8.0-%E2%80%90-3.9.3)-ParallelCluster-Build-Image-Failing-during-Installation-of-Minitar-Ruby-Gem-Dependency, but the build failed. Below is my result, along with what I did that seems to work. It needs validating by someone more familiar with this code section, and then the wiki should be updated:
The code snippet for Option 2 shows:
However when I downloaded the source for parallelcluster 3.9.3, the original line that I'm replacing is missing the --local:
Nonetheless, I replaced that one line with the exact code provided.
I then activated my pcluster 3.9.3 virtual env and ran the command shown:
pip install ./cli
My build-image failed.
I then built a new virtual environment and ran the same pip install command after activating it, but the build still failed.
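A rough sketch of that clean reinstall, in case it helps anyone reproduce it. The directory names `aws-parallelcluster` and `pcluster-3.9.3-venv` and the `run` helper are hypothetical placeholders, not anything from the wiki. The script defaults to a dry run that only prints the commands, and it calls the virtual environment's own `pip` and `pcluster` so the patched CLI cannot be confused with another install.

```shell
#!/bin/sh
# Hypothetical sketch of the reinstall steps; all paths and names are placeholders.
SRC_DIR="${SRC_DIR:-aws-parallelcluster}"     # assumed checkout of the patched 3.9.3 source
VENV_DIR="${VENV_DIR:-pcluster-3.9.3-venv}"   # assumed name for the fresh virtual env
DRY_RUN="${DRY_RUN:-1}"                       # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

reinstall_cli() {
  # Create a fresh virtual environment.
  run python3 -m venv "$VENV_DIR"
  # Same as `pip install ./cli` run from the source root, using the venv's pip.
  run "$VENV_DIR/bin/pip" install "$SRC_DIR/cli"
  # Check which pcluster version the new environment actually provides.
  run "$VENV_DIR/bin/pcluster" version
}

reinstall_cli
```

Setting `DRY_RUN=0` would execute the commands instead of printing them.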
The error message is of the form as shown on the wiki:
"Workflow Execution ID: 'wf-0988f3c0-6389-4138-bf1a-ceffc6b10876' failed with reason: Document arn:aws:imagebuilder:us-west-2:326469498578:component/parallelclusterimage-694714c0-99ae-11ef-8f75-060511802875/3.9.3/1 failed!"
This suggested the error was still there, but I know I've got the right minitar now...
I know nothing about "gem install", but after talking with ChatGPT a bit, it seems that adding --local to the berkshelf install does not make sense. It wasn't there originally, and as far as I can tell we haven't downloaded berkshelf locally, so I tried removing --local from the berkshelf install. ChatGPT tells me that berkshelf will use the minitar:0.12 we installed locally unless berkshelf specifies something that precludes it, and since berkshelf isn't local, it will be fetched from the proper remote repo.
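To illustrate the reasoning, here is a minimal sketch of the two installs as separate gem invocations. It is not the wiki's snippet or the actual ParallelCluster build step: the `GEM` variable and the `run` helper are hypothetical, and the real component may pass other options. The script defaults to a dry run that prints the commands. The relevant point is that `--local` restricts RubyGems to the local domain, so it cannot fetch a berkshelf gem that was never downloaded.

```shell
#!/bin/sh
# Hypothetical sketch: not the wiki's snippet or ParallelCluster's real build step.
GEM="${GEM:-gem}"          # assumed path to the gem executable
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_gems() {
  # Pin minitar first, so a known version is already installed.
  run "$GEM" install minitar:0.12
  # No --local here, so berkshelf can be fetched from the remote source.
  run "$GEM" install berkshelf
}

install_gems
```

Whether berkshelf then resolves against the pinned minitar depends on its own version constraints, which this sketch does not check.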
So I changed the code to be:
I redid the install, and now my pcluster build-image actually created an AMI, and I can boot it standalone. Now to deploy it.
Was this posted code actually tested? Did I do this right?
As for moving forward with 3.10.x or 3.11.x, I'm having problems with both of those as well.
The 3.11.1 issue has been filed as #6529.
The 3.10.1 issue might be a problem with the aws-eda-slurm-cluster CloudFormation code, but I link it here for your review in case you have seen this as well: aws-samples/aws-eda-slurm-cluster#280