Stuck allocation on dead job #806
Comments
@far-blue I could reproduce this, and thanks for reporting.
I ran into this exact issue when trying out 0.3.0-rc2. As far as I can tell, the only way to clear out the orphaned allocation is to clobber the Nomad servers and remove all existing state :/
@diptanu, is there someone actively working on this? If not, I would be willing to take a crack at it.
@dgshep Yes! We might be able to tackle this in the next release.
Very cool. BTW, congrats on the C1M project! Stellar stuff...
I am seeing this on Nomad v0.5.4. I had a job that no longer exists, with an allocation stuck on a node, trying to pull a container image that no longer exists and receiving a 400 from the registry. Is this a regression, or have I triggered something completely new for some reason?
I'm new to all this, so maybe I've just missed something, but I appear to have an orphaned allocation from a dead job that failed to start completely.
Context: running v0.3.0-rc1 in the dev environment created by the included Vagrantfile, with the agent in -dev mode (a single agent acting as both server and client).
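For reference, the environment was brought up roughly like this (a sketch; it assumes the stock Vagrantfile from the Nomad repo, and sudo may not be needed on every machine):

```sh
# Bring up the bundled dev VM and open a shell in it.
vagrant up
vagrant ssh

# Inside the VM: start a single agent acting as both server and client.
sudo nomad agent -dev
```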
I started with a modified version of the example.nomad file created by `nomad init`: I changed the task to run a mysql container and added a second task to run an apache container. I started the job with `nomad run`, but it failed to complete because I'd typo'd the apache container image name. At this point I had a mysql container running but no apache container.
So I edited the job to correct my typo and called `nomad run` again. My understanding was that it would evaluate the difference and just start the apache container (because the mysql container was already running). However, it actually re-evaluated the entire job and started both the apache container and a second mysql container, while leaving the original container running. Note that I have not changed the name of the job or the task group (I left them as example and cache, as per the original job config).
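To recap, the command sequence was roughly the following (a sketch; the editor and the exact image names are illustrative):

```sh
# Generate the stock example job, then edit it: change the existing task to
# run a mysql container and add a second task for an apache container
# (the apache image name contained a typo at this point).
nomad init
vi example.nomad

# First run: the apache task fails because of the typo'd image name,
# but the mysql container comes up.
nomad run example.nomad

# Fix the typo and resubmit. Expected: only the missing apache container
# starts. Observed: a second mysql container and an apache container start,
# and the original mysql container is left running.
vi example.nomad
nomad run example.nomad
```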
So I called `nomad stop`, thinking it would clean everything up, but it only stopped the new containers, leaving the original mysql container. I thought maybe Nomad had 'forgotten' about it, so I killed it with Docker directly, but Nomad put it back. So now I have a mysql container that Nomad is keeping alive but no job to control it with.
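In case it's useful, this is roughly how the leftover allocation still shows up (a sketch; command names may vary between Nomad versions, and <alloc-id> is a placeholder):

```sh
# Stopping the job removed the new containers but left the original mysql
# container running.
nomad stop example

# The job is dead, yet an allocation is still listed for it.
nomad status example

# Inspect the leftover allocation (replace <alloc-id> with the ID from above).
nomad alloc-status <alloc-id>

# The container Nomad keeps restarting is still visible to Docker.
docker ps
```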
So I'm not quite sure what to do next and I'm pretty certain this is not expected behaviour.
Any thoughts, anyone?