bug: vllm bundle has invalid kubernetes spec and won't properly deploy on internal infrastructure #965
Comments
PR #937 is a WIP that will solve the manifest warning issues. The data injection issue is a known issue that only occurs in environments that don't use [...]. A recommended fix for the issue you are actually encountering, separate from PR #937, is either to mount the volume to a directory whose permissions have been changed in the Docker manifest or to give the data injection job's container root access. These possible fixes are also WIP, I think? cc: @YrrepNoj @CollectiveUnicorn
Thanks for the clarification! It sounds like there are no immediate hot fixes we could apply for 0.11.0? If so, would a rollback to 0.9.2 be suggested? It looks like the zarf data injection was introduced in 0.10.0.
For now, please roll back to 0.9.2. The team needs to discuss the solution at stand-up later today, as the best place to test it is our staging environment that uses EFS (which I don't currently have access to). I've also been out for a little bit, so they may already have thought about or completed the fix! The ultimate fix may come in the form of another minor version, due to the nature of this breaking change and some other changes we may wrap in.
I'd also be happy to get you and anyone else set up in this environment.
related: zarf-dev/zarf#2263 I'm not sure there is a known workaround for using zarf data injection as a nonroot user :/
Environment
Steps to reproduce
Expected result
vllm zarf package should deploy successfully
Actual Result
zarf deployment hangs and is unable to create the data injection target
Visual Proof (screenshots, videos, text, etc)
Additional Context
There is actually an invalid Kubernetes manifest spec that may be causing this. Defining fsGroup under initContainers or containers is not valid, so that may be why the data injection target directory cannot be created. fsGroup should instead be defined at the spec.template.spec.securityContext level, not within the containers/initContainers array.
Improper definitions can be found here, here and here.
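To make the fsGroup placement concrete, here is a minimal sketch of a check that flags container-level fsGroup entries in a Deployment-style manifest loaded as a plain Python dict. The function name, container names, and the fsGroup value 65532 are hypothetical and not taken from the vllm bundle; the only assumption about Kubernetes is that fsGroup belongs to the pod-level securityContext.

```python
from typing import Any, Dict, List


def find_misplaced_fsgroup(manifest: Dict[str, Any]) -> List[str]:
    """Return the paths of container-level securityContext blocks that set fsGroup.

    fsGroup is a pod-level field, so any hit under containers or
    initContainers is a misplaced definition.
    """
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    problems = []
    for key in ("initContainers", "containers"):
        for index, container in enumerate(pod_spec.get(key) or []):
            if "fsGroup" in (container.get("securityContext") or {}):
                problems.append(
                    f"spec.template.spec.{key}[{index}].securityContext.fsGroup"
                )
    return problems


# Hypothetical manifest with fsGroup in the wrong place (container level).
bad_manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {
        "initContainers": [
            {"name": "data-loader", "securityContext": {"fsGroup": 65532}},
        ],
        "containers": [
            {"name": "vllm", "securityContext": {"fsGroup": 65532}},
        ],
    }}},
}

# Same hypothetical manifest with fsGroup moved to the pod-level securityContext.
good_manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {
        "securityContext": {"fsGroup": 65532},
        "initContainers": [{"name": "data-loader"}],
        "containers": [{"name": "vllm"}],
    }}},
}

if __name__ == "__main__":
    for path in find_misplaced_fsgroup(bad_manifest):
        print("misplaced:", path)
```

Running the same check against the corrected manifest returns an empty list, since the only fsGroup left sits at spec.template.spec.securityContext.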