wiki instructions for Option 2 fix of (3.8.0 ‐ 3.9.3) ParallelCluster Build Image Failing during Installation of Minitar Ruby Gem Dependency aren't quite right. #6530

Closed
gwolski opened this issue Nov 3, 2024 · 3 comments

gwolski commented Nov 3, 2024

I tried following the Option 2 instructions on the wiki page https://github.com/aws/aws-parallelcluster/wiki/(3.8.0-%E2%80%90-3.9.3)-ParallelCluster-Build-Image-Failing-during-Installation-of-Minitar-Ruby-Gem-Dependency but the build failed. Below is what happened and the change that seems to work. It needs validating by someone more familiar with this section of the code, after which the wiki should be updated.

The code snippet for Option 2 shows:

## Replace below Line in InstallCinc Step
/opt/cinc/embedded/bin/gem install --local --no-document berkshelf:{{ BerkshelfVersion }}

## Replace with below line, effectively Pin the Minitar version to 0.12 
/opt/cinc/embedded/bin/gem install --no-document minitar:0.12
/opt/cinc/embedded/bin/gem install --local --no-document berkshelf:{{ BerkshelfVersion }}

However, when I downloaded the source for ParallelCluster 3.9.3, the original line to be replaced did not have the --local flag:

/opt/cinc/embedded/bin/gem install --no-document berkshelf:{{ BerkshelfVersion }}

Nonetheless, I replaced that one line with the exact code provided.

I then activated my pcluster 3.9.3 virtual env and ran the shown
pip install ./cli

My build-image failed.

I then built a fresh virtual environment, activated it, and ran the same pip install command. The build still failed.

The error message is of the form as shown on the wiki:

"Workflow Execution ID: 'wf-0988f3c0-6389-4138-bf1a-ceffc6b10876' failed with reason: Document arn:aws:imagebuilder:us-west-2:326469498578:component/parallelclusterimage-694714c0-99ae-11ef-8f75-060511802875/3.9.3/1 failed!"

That suggests the same error is still there, but I know I've got the right minitar now...

I know nothing about "gem install", but after talking with ChatGPT a bit, it seems that adding --local to the berkshelf install does not make sense. The flag wasn't there originally, and as far as I can tell we never download berkshelf locally, so I tried removing --local from the berkshelf install. ChatGPT tells me that berkshelf will use the minitar:0.12 we installed locally unless berkshelf requires a version that excludes it, and since berkshelf isn't available locally, gem will fetch it from the proper remote repo.

So I changed the code to be:


              /opt/cinc/embedded/bin/gem install --no-document minitar:0.12
              /opt/cinc/embedded/bin/gem install --no-document berkshelf:{{ BerkshelfVersion }}

did the

pip install ./cli

and now my pcluster build-image actually created an AMI and I can boot it standalone. Now to deploy it.
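
The working change described above can also be applied with a small script instead of a hand edit. The following is a minimal sketch, not something taken from the wiki: `pin_minitar` is a hypothetical helper name, and the path to the file containing the InstallCinc step is an assumption you must supply yourself, since it is not named in this thread. It inserts the minitar pin immediately before the berkshelf install line, keeps the same indentation, and leaves the file unchanged if the pin is already present or if the berkshelf line is not in the 3.9.3 form shown above.

```shell
# pin_minitar FILE
# Hypothetical helper: insert "gem install --no-document minitar:0.12"
# directly before the berkshelf install line in FILE.
pin_minitar() {
  file="$1"

  # Do nothing if the pin is already there, so the helper is safe to rerun.
  if grep -qF 'gem install --no-document minitar:0.12' "$file"; then
    return 0
  fi

  # For each berkshelf install line WITHOUT --local (the 3.9.3 form),
  # print a minitar pin line with the same leading whitespace first.
  # Lines in any other form are passed through untouched.
  awk '
    /^[[:space:]]*\/opt\/cinc\/embedded\/bin\/gem install --no-document berkshelf:/ {
      indent = $0
      sub(/\/opt.*/, "", indent)   # keep only the leading whitespace
      print indent "/opt/cinc/embedded/bin/gem install --no-document minitar:0.12"
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Call it with the path to the file you would otherwise edit by hand, then rerun `pip install ./cli` as described above.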

Was this posted code actually tested? Did I do this right?
As for moving forward with 3.10.x or 3.11.x, I'm having problems with both of those as well.
The 3.11.1 issue has been filed as #6529.
The 3.10.1 issue might be a problem with the aws-eda-slurm-cluster CloudFormation code, but I link it here for your review in case you have seen it as well: aws-samples/aws-eda-slurm-cluster#280

gwolski added the 3.x label Nov 3, 2024
himani2411 (Contributor) commented Nov 5, 2024

Hi @gwolski

Thank you for catching this documentation mistake.
I have updated the wiki page: https://github.com/aws/aws-parallelcluster/wiki/(3.8.0-%E2%80%90-3.9.3)-ParallelCluster-Build-Image-Failing-during-Installation-of-Minitar-Ruby-Gem-Dependency

We did test the changes internally for these versions, and they were successful. The wiki snippet was mistakenly based on code from 3.10.0, the version in which we added the --local flag to the gem installation, where we install some gems just before we install Minitar and Berkshelf.
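
Since the wiki snippet and the 3.9.3 source differed only in the --local flag, checking your own checkout before copying a snippet can avoid this mismatch. A minimal sketch, assuming a hypothetical helper name `berkshelf_install_uses_local` and a file path that you pass in (the file is not named in this thread):

```shell
# berkshelf_install_uses_local FILE
# Hypothetical helper: exit status 0 if FILE contains a berkshelf
# "gem install" line that carries --local, non-zero otherwise.
berkshelf_install_uses_local() {
  grep -E 'gem install .*--local.* berkshelf:' "$1" > /dev/null
}

# Example use (the path is a placeholder to fill in):
#   if berkshelf_install_uses_local path/to/file; then
#     echo "berkshelf line uses --local"
#   else
#     echo "berkshelf line has no --local"
#   fi
```

The check only reports which form of the line is present. It does not decide which fix applies to a given version.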

himani2411 (Contributor) commented

Will take a look at the other issues.

gwolski (Author) commented Nov 6, 2024

Thank you for confirming and for root-causing how the error came to be. I have now been able to deploy it.

gwolski closed this as completed Nov 6, 2024