
pending allocations stuck in pending state after adoption by a new deployment #23305

Open
lattwood opened this issue Jun 11, 2024 · 20 comments · Fixed by hashicorp/yamux#127

@lattwood
Contributor

lattwood commented Jun 11, 2024

Nomad version

Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

Operating system and Environment details

All Linux

Issue

  • We have a large Nomad cluster (> 2k agents).
  • We run health checks on each agent and do dynamic metadata updates.
  • We have a large job that gets scheduled on 99% of the healthy agents.
  • We use a shell script to query Nomad for node counts matching a filter on the relevant dynamic metadata, and run nomad job scale JOB TG COUNT when the count changes (a sketch of this loop follows the list below).
  • We have a way to identify whether the currently running deployment is due to a new Docker image or to a scaling activity.
  • If it's due to a scaling activity on another task group, we cancel that deployment and issue a new nomad job scale command immediately afterwards.
    • Due to how frequently the dynamic metadata is changing, we are constantly doing deployments.
    • We kicked off 250 deployments in the last hour.
  • Existing allocations usually all get adopted by the new deployment.
  • The issues:
    • Most of the time, but not always, the version on the existing allocs that are still "Pending" at deployment cancellation gets updated, but we end up with allocations whose version never gets updated.
    • Sometimes the allocation exists but isn't assigned to a client in the UI; if you inspect the alloc with the Nomad CLI, it tells you which client it has been assigned to.
    • Allocations get stuck in the pending state. We've found that sending a SIGQUIT to the Nomad agent fixes this, but that doesn't scale to > 2k nodes.
    • Allocations will get rescheduled before they ever leave the pending state. YEAH.
    • Possibly related, but might not be: if a node misses a heartbeat and re-registers, we can end up with stuck pending jobs on that node as well. This is also fixed with a SIGQUIT.
      • The stuck-jobs aspect is brutal, since as far as I can tell it doesn't start the health/deployment timers.
      • The fact that restarting the Nomad agent fixes this is a little unnerving. I vaguely recall seeing something in the logs about "allocations out of date" or similar when this issue has manifested in the past.
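
For illustration, here is a minimal Go sketch of the kind of scaling loop described above. It is not our production script (which is shell), and the filter expression, job name, and group name are placeholders; it just shows the two Nomad HTTP API calls involved: listing nodes with a filter and scaling a task group.

```go
// Illustrative sketch only: count nodes matching a filter via Nomad's HTTP API,
// then scale a task group to that count. Job name, group name, and the filter
// expression are placeholders (the real script filters on dynamic node metadata).
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func main() {
	nomadAddr := "http://127.0.0.1:4646"

	// GET /v1/nodes?filter=... lists node stubs matching a filter expression.
	filter := url.QueryEscape(`Status == "ready"`)
	resp, err := http.Get(fmt.Sprintf("%s/v1/nodes?filter=%s", nomadAddr, filter))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var nodes []map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		log.Fatal(err)
	}
	count := len(nodes)

	// POST /v1/job/:job_id/scale adjusts one task group's count and kicks off a
	// new deployment; this is the HTTP equivalent of `nomad job scale JOB TG COUNT`.
	payload, _ := json.Marshal(map[string]interface{}{
		"Count":   count,
		"Target":  map[string]string{"Group": "example-group"},
		"Message": "scale to match healthy node count",
	})
	scaleResp, err := http.Post(nomadAddr+"/v1/job/example-job/scale",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer scaleResp.Body.Close()
	fmt.Println("scale request status:", scaleResp.Status)
}
```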

Reproduction steps

  • run a big cluster of nodes with intermittent connectivity issues to the Nomad servers
  • run a big long running job on said cluster
  • constantly be deploying/scaling the jobspec
  • wait

Expected Result

  • nomad schedules and runs jobs

Actual Result

  • nomad doesn't

Screenshots of Absurdity

In this screenshot, all the allocs are part of the same job but different task groups. The colours correspond to the actual task groups. Note the varying versions, and that some allocs have a client assigned and some do not.

Additionally, you can see the Modified time is the same for the allocs that are staying up to date but isn't changing on the others, and the Created times are all over the place.

Screenshot 2024-06-11 at 3 35 54 PM

In this screenshot, you can see we have a pending allocation that has been rescheduled, and the rescheduled allocation is marked pending as well. Neither of the allocations has been assigned to a client as far as the Nomad WebUI is concerned.

Screenshot 2024-06-11 at 4 52 00 PM

Reporter's speculation

Maybe it has something to do with how the allocations are being adopted by the rapid deployments? This definitely reeks of a race condition.

Nomad logs

Useless, unfortunately, due to this bug: #22431

@lattwood
Contributor Author

lattwood commented Jun 11, 2024

Screenshot 2024-06-11 at 5 44 59 PM

In trying to kick the broken allocs, I've managed to get Nomad to try sending an interrupt to a pending allocation!

edit: had to killall -9 nomad on ca11.

@lattwood
Contributor Author

I figured out how to make it worse! If I drain the node and mark it as ineligible, then re-enable eligibility, all of the system jobs end up with additional allocations. The only solution here is restarting/killing the Nomad agent.

Screenshot 2024-06-11 at 6 09 06 PM

@lattwood
Contributor Author

Just adding this for additional flavour (I found this hilarious):

image

@lattwood
Contributor Author

lattwood commented Jun 12, 2024

Pending alloc example with logs from the Nomad agent. The timestamps are all consistent with each other.

Screenshot 2024-06-12 at 10 54 36 AM
root@HOSTNAME:~# TZ=America/Halifax journalctl --unit nomad --since '14 hours ago' | grep -v '\(runner\)'
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"received EOF, stopping recv loop","@module":"client.driver_mgr.docker.docker_logger.stdio","@timestamp":"2024-06-12T02:28:35.213106Z","driver":"docker","err":"rpc error: code = Unavailable desc = error reading from server: EOF"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"info","@message":"plugin process exited","@module":"client.driver_mgr.docker.docker_logger","@timestamp":"2024-06-12T02:28:35.224019Z","driver":"docker","id":"2362096","plugin":"/usr/local/bin/nomad"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"plugin exited","@module":"client.driver_mgr.docker.docker_logger","@timestamp":"2024-06-12T02:28:35.225535Z","driver":"docker"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"image id reference count decremented","@module":"client.driver_mgr.docker","@timestamp":"2024-06-12T02:28:35.430987Z","driver":"docker","image_id":"sha256:31d1924d3c7e119a450ce9d0d2fbd417a1aba168781625acbc7bf77b398cee33","references":0}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"(watcher) stopping all views","@module":"agent","@timestamp":"2024-06-12T02:28:35.472850Z"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"info","@message":"stopped container","@module":"client.driver_mgr.docker","@timestamp":"2024-06-12T02:28:35.610787Z","container_id":"02ae65871d36198b058a7b98387c5b40bc857e7dfc8eb6bd6de60407e6f28305","driver":"docker"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"received EOF, stopping recv loop","@module":"client.driver_mgr.docker.docker_logger.stdio","@timestamp":"2024-06-12T02:28:35.626094Z","driver":"docker","err":"rpc error: code = Unavailable desc = error reading from server: EOF"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"error","@message":"error encountered while scanning stdout","@module":"client.driver_mgr.docker.docker_logger","@timestamp":"2024-06-12T02:28:35.631515Z","driver":"docker","error":"read |0: file already closed"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"info","@message":"plugin process exited","@module":"client.driver_mgr.docker.docker_logger","@timestamp":"2024-06-12T02:28:35.631600Z","driver":"docker","id":"2361837","plugin":"/usr/local/bin/nomad"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"plugin exited","@module":"client.driver_mgr.docker.docker_logger","@timestamp":"2024-06-12T02:28:35.632337Z","driver":"docker"}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"debug","@message":"image id reference count decremented","@module":"client.driver_mgr.docker","@timestamp":"2024-06-12T02:28:35.654011Z","driver":"docker","image_id":"sha256:6331aa0368327f0829316ca10b45844eb7303caf5f63cc979063df0ef7ffca01","references":0}
Jun 11 23:28:35 HOSTNAME nomad[3215]: {"@level":"info","@message":"marking allocation for GC","@module":"client.gc","@timestamp":"2024-06-12T02:28:35.669645Z","alloc_id":"221a2ab1-43ea-22bb-1616-24d9ce047609"}
Jun 11 23:31:40 HOSTNAME nomad[3215]: {"@level":"debug","@message":"cleanup removed downloaded image","@module":"client.driver_mgr.docker","@timestamp":"2024-06-12T02:31:40.406819Z","driver":"docker","image_id":"sha256:31d1924d3c7e119a450ce9d0d2fbd417a1aba168781625acbc7bf77b398cee33"}
Jun 11 23:31:41 HOSTNAME nomad[3215]: {"@level":"debug","@message":"cleanup removed downloaded image","@module":"client.driver_mgr.docker","@timestamp":"2024-06-12T02:31:41.201500Z","driver":"docker","image_id":"sha256:6331aa0368327f0829316ca10b45844eb7303caf5f63cc979063df0ef7ffca01"}

From what I can tell, all the logs from ~23:28:35 are due to the failed allocation, and there is nothing in the agent's logs about the pending allocation.

Generally speaking, forcibly killing Nomad (we have draining enabled) tends to fix these issues, but these state inconsistency bugs are terrifying.

@jrasell
Member

jrasell commented Jun 12, 2024

Hi @lattwood, and thanks for raising this issue with the detail included. I'll add this to our backlog to investigate further. In the meantime, if you discover any other information or a concrete reproduction, please let us know.

@lattwood
Contributor Author

More log spelunking: after this shows up in the logs on an agent/client, no further alloc updates occur, and the drain issue with the pending allocs starts occurring as well.

Jun 11 14:50:29 HOSTNAME nomad[2835]: {"@level":"debug","@message":"allocation updates applied","@module":"client","@timestamp":"2024-06-11T14:50:29.411617Z","added":0,"errors":0,"ignored":4,"removed":0,"updated":0}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"yamux: keepalive failed: i/o deadline reached","@module":"client","@timestamp":"2024-06-11T14:50:50.303589Z"}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"yamux: Failed to read stream data: read tcp [IPv6_ADDRESS_1]:42188-\u003e[IPv6_ADDRESS_2]:4647: use of closed network connection","@module":"client","@timestamp":"2024-06-11T14:50:50.303834Z"}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"error performing RPC to server","@module":"client.rpc","@timestamp":"2024-06-11T14:50:50.304920Z","error":"rpc error: EOF","rpc":"Node.UpdateAlloc","server":{"IP":"IPv6_ADDRESS_2","Port":4647,"Zone":""}}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"error performing RPC to server which is not safe to automatically retry","@module":"client.rpc","@timestamp":"2024-06-11T14:50:50.305307Z","error":"rpc error: EOF","rpc":"Node.UpdateAlloc","server":{"IP":"IPv6_ADDRESS_2","Port":4647,"Zone":""}}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"error updating allocations","@module":"client","@timestamp":"2024-06-11T14:50:50.305578Z","error":"rpc error: EOF"}
Jun 11 14:50:50 HOSTNAME nomad[2835]: {"@level":"error","@message":"error performing RPC to server","@module":"client.rpc","@timestamp":"2024-06-11T14:50:50.305401Z","error":"rpc error: EOF","rpc":"Alloc.GetAllocs","server":{"IP":"IPv6_ADDRESS_2","Port":4647,"Zone":""}}
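
For anyone following along, here is a minimal sketch (not Nomad's actual wiring; the server address is a placeholder) of how a yamux client session is set up over a TCP connection with keepalives enabled. When a keepalive ping can't complete in time, yamux logs the "keepalive failed: i/o deadline reached" error seen above and closes the session.

```go
// Minimal sketch of a yamux client session with keepalives; addresses and
// error handling are illustrative only.
package main

import (
	"log"
	"net"
	"os"

	"github.com/hashicorp/yamux"
)

func main() {
	conn, err := net.Dial("tcp", "nomad-server.example:4647") // placeholder address
	if err != nil {
		log.Fatal(err)
	}

	cfg := yamux.DefaultConfig()
	cfg.EnableKeepAlive = true // periodic pings detect dead connections
	cfg.LogOutput = os.Stderr  // where the "keepalive failed" message is written

	session, err := yamux.Client(conn, cfg)
	if err != nil {
		log.Fatal(err)
	}

	// CloseChan is closed when the session shuts down, e.g. after a failed keepalive.
	<-session.CloseChan()
	log.Println("session closed:", session.IsClosed())
}
```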

@lattwood
Contributor Author

lattwood commented Jun 12, 2024

Grabbed a goroutine stack dump and found a clue.

The same node is blocked here, and had been for 1489 minutes, which lines up with just after the "error performing RPC to server" messages in the comment above: https://github.com/hashicorp/yamux/blob/6c5a7317d6e3b6e7f85db696d6c83ed353e7cb4c/stream.go#L145-L153

Maybe timeouts aren't getting set on retries? idk.

edit: the deserialization methods for a jobspec are also evident in the stack trace.

Jun 12 15:40:35 HOSTNAME nomad[2835]: goroutine 254 gp=0xc0006888c0 m=nil [select, 1489 minutes]:
Jun 12 15:40:35 HOSTNAME nomad[2835]: runtime.gopark(0xc0024a8d00?, 0x2?, 0x25?, 0xa9?, 0xc0024a8c94?)
Jun 12 15:40:35 HOSTNAME nomad[2835]:         runtime/proc.go:402 +0xce fp=0xc0024a8b38 sp=0xc0024a8b18 pc=0x44630e
Jun 12 15:40:35 HOSTNAME nomad[2835]: runtime.selectgo(0xc0024a8d00, 0xc0024a8c90, 0xc003ed46f0?, 0x0, 0xc0024a8cd8?, 0x1)
Jun 12 15:40:35 HOSTNAME nomad[2835]:         runtime/select.go:327 +0x725 fp=0xc0024a8c58 sp=0xc0024a8b38 pc=0x458385
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/yamux.(*Stream).Read(0xc0032fa000, {0xc002cc1000, 0x1000, 0xc001cc0b00?})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/[email protected]/stream.go:145 +0x40e fp=0xc0024a8d50 sp=0xc0024a8c58 pc=0xdb2ace
Jun 12 15:40:35 HOSTNAME nomad[2835]: bufio.(*Reader).Read(0xc002081e00, {0xc001cc0b00, 0xb, 0xdf9db0?})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         bufio/bufio.go:241 +0x197 fp=0xc0024a8d88 sp=0xc0024a8d50 pc=0x6ca857
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/go-msgpack/codec.decReadFull({0x3970800, 0xc002081e00}, {0xc001cc0b00, 0xb, 0x40})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/[email protected]/codec/decode.go:3086 +0xa3 fp=0xc0024a8dd8 sp=0xc0024a8d88 pc=0xdfd303
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/go-msgpack/codec.(*ioDecReader).readb(0xc0035bde00, {0xc001cc0b00, 0xb, 0x7ed07119d5b8?})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/[email protected]/codec/decode.go:422 +0x4d fp=0xc0024a8e28 sp=0xc0024a8dd8 pc=0xdf1aed
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/go-msgpack/codec.(*decReaderSwitch).readbIO(0xc002081e00?, {0xc001cc0b00?, 0x0?, 0x17?})
...
elided for brevity
...
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).rpc(0xc00091f888, {0x3112c54, 0xf}, {0x2f9c6c0, 0xc000485d90}, {0x2c6b860, 0xc0008f1860})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/rpc.go:104 +0x23e fp=0xc0024a9988 sp=0xc0024a9818 pc=0x1dcdb9e
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).RPC(...)
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/rpc.go:61
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).rpc(0xc00091f888, {0x3112c54, 0xf}, {0x2f9c6c0, 0xc000485d90}, {0x2c6b860, 0xc0008f1860})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/rpc.go:154 +0xb2d fp=0xc0024a9af8 sp=0xc0024a9988 pc=0x1dce48d
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).RPC(...)
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/rpc.go:61
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).watchAllocations(0xc00091f888, 0xc0008f1740)
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/client.go:2395 +0x96e fp=0xc0024a9fc0 sp=0xc0024a9af8 pc=0x1dacf6e
Jun 12 15:40:35 HOSTNAME nomad[2835]: github.com/hashicorp/nomad/client.(*Client).run.gowrap1()
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/client.go:1850 +0x25 fp=0xc0024a9fe0 sp=0xc0024a9fc0 pc=0x1da9d65
Jun 12 15:40:35 HOSTNAME nomad[2835]: runtime.goexit({})
Jun 12 15:40:35 HOSTNAME nomad[2835]:         runtime/asm_amd64.s:1695 +0x1 fp=0xc0024a9fe8 sp=0xc0024a9fe0 pc=0x47ebe1
Jun 12 15:40:35 HOSTNAME nomad[2835]: created by github.com/hashicorp/nomad/client.(*Client).run in goroutine 247
Jun 12 15:40:35 HOSTNAME nomad[2835]:         github.com/hashicorp/nomad/client/client.go:1850 +0x86
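
To illustrate the timeout idea above (this is not the fix that landed in hashicorp/yamux#127, just a sketch, and the helper name is made up): yamux streams implement net.Conn-style deadlines, so a caller can bound a blocking Read instead of waiting forever on a wedged session.

```go
package example

import (
	"fmt"
	"time"

	"github.com/hashicorp/yamux"
)

// readWithDeadline is a hypothetical helper: it bounds a single Read on a
// yamux stream so a wedged session cannot block the caller indefinitely.
func readWithDeadline(stream *yamux.Stream, buf []byte, timeout time.Duration) (int, error) {
	// Arm an absolute deadline for this read.
	if err := stream.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return 0, err
	}
	// Clear the deadline again when we're done (zero time means "no deadline").
	defer stream.SetReadDeadline(time.Time{})

	n, err := stream.Read(buf)
	if err != nil {
		// yamux returns ErrTimeout ("i/o deadline reached") if the deadline fires first.
		return n, fmt.Errorf("read on yamux stream: %w", err)
	}
	return n, nil
}
```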

@lattwood
Contributor Author

@jrasell I found the bug; it was in yamux. PR: hashicorp/yamux#127

@lattwood
Contributor Author

I'm suspecting this yamux bug is what's causing the whole pending-allocations issue.

@jrasell
Member

jrasell commented Jun 14, 2024

Hi @lattwood, amazing work and thanks so much for digging into this and raising a PR. I'll poke around internally and see who I can get as reviewers on the PR.

@lattwood
Contributor Author

@jrasell good stuff. I'm likely going to cut a 1.7.7-dev or 1.7.8 build with the yamux change and roll it out on our side today, once I change the return to a goto.

@lattwood
Contributor Author

Just added some more bug/correctness fixes to the PR.

@lattwood
Contributor Author

https://github.com/lattwood/nomad/releases/tag/v1.7.8-yamux-fork

I threw a build up there with the yamux fix.

@lattwood
Contributor Author

lattwood commented Jun 20, 2024

Confirmed that hashicorp/yamux#127 fixes the issue.

We'll be running a forked Nomad build until there's a release of Nomad with the yamux fix in it.

edit: idk if I'm out of line for saying this, but the yamux fix probably should be backported everywhere it's in use 😅

@schmichael
Member

Did you have TLS configured for Nomad's RPC?

@lattwood
Contributor Author

@schmichael yup. (our nomad agents are distributed around the world)

schmichael added a commit to schmichael/yamux that referenced this issue Aug 14, 2024
Specifically to debug hashicorp/nomad#23305 but tests should probably
run against multiple net.Conn implementations as yamux is sensitive to
net.Conn behaviors such as concurrent Read|Write|Close and what errors
are returned.
@lattwood
Contributor Author

Just saw another issue opened on yamux this week that could be responsible for what we're seeing here as well: hashicorp/yamux#133

schmichael added a commit to schmichael/yamux that referenced this issue Sep 5, 2024
Specifically to debug hashicorp/nomad#23305 but tests should probably
run against multiple net.Conn implementations as yamux is sensitive to
net.Conn behaviors such as concurrent Read|Write|Close and what errors
are returned.
@lattwood
Contributor Author

lattwood commented Sep 5, 2024 via email

@schmichael
Member

> hmm, i don't think so? heh.

🤦 Sorry. Evidently Github treats "Should fix #..." as meaning "Absolutely does fix #..." Reopening.

@schmichael schmichael reopened this Sep 5, 2024
@github-project-automation github-project-automation bot moved this from Done to Needs Triage in Nomad - Community Issues Triage Sep 5, 2024
@Juanadelacuesta Juanadelacuesta moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Sep 6, 2024
schmichael added a commit to hashicorp/yamux that referenced this issue Sep 6, 2024
* test against tls.Conns, not pipes

Specifically to debug hashicorp/nomad#23305 but tests should probably
run against multiple net.Conn implementations as yamux is sensitive to
net.Conn behaviors such as concurrent Read|Write|Close and what errors
are returned.

* locally closed streams can still read

Effectively reverts 912e296

Reasons to revert to locally closed streams being readable:
1. Matches libp2p's yamux fork: https://github.com/libp2p/go-yamux/blob/master/stream.go#L95-L96
2. Both yamux and SPDY make it clear that a locally closed stream cannot send more data.
3. SPDY explicitly supports unidirectional streams where one peer is closed (readonly) from the beginning: https://www.chromium.org/spdy/spdy-protocol/spdy-protocol-draft3-1/#46-unidirectional-streams

* test: fix timing when using tcp

TestSession_PartialReadWindowUpdate asserts that large sends can cause
window updates. When using io.Pipes the server receives the window
update nearly synchronously during the client's Read call. Using tcp
sockets adds just enough latency to reliably lose the race for observing
the window update on the server.

Added a sleep to ensure the server has time to read and apply the window
update from the client.

I also added a number of comments and non-functional cleanups. The
time.Sleep() is the only functional change.

* test: expand connection types tested

Expanded tests to use TCP and TLS. Sorry for the huge diff, but I think
this makes yamux's test suite much more realistic.

There are quite a few new tools:

- testConnPipe is the original io.Pipe based testing tool. Some tests
  still require it due to the ability to easily pause the data flow.
- testConnTCP and testConnTLS create TCP and TLS connections for yamux
  to use. This introduces more realistic timing issues, buffering, and
  subtle differences in error messages.
- testConnTypes is a helper to run subtests against *all* the above
  connection types as well as ensuring reversing the client/server
  sockets doesn't impact yamux (it didn't!). I didn't convert every test
  to it since it adds some time and noise to test runs.

I also tried to formalize (client, server) as a pattern. There was a mix
of orderings. Those roles are rarely meaningful to yamux, but meaningful
names makes debugging easier than numbering variables.
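
For readers who want a concrete picture of the test change described above, here is a rough sketch of what a testConnTCP-style helper could look like. This is not the actual helper from the PR, just an illustration of swapping io.Pipe for a real loopback TCP pair.

```go
package yamuxtest

import (
	"net"
	"testing"
)

// testConnTCP (sketch): hand yamux a real loopback TCP connection pair so
// tests see genuine socket buffering, latency, and error behaviour instead of
// the synchronous semantics of io.Pipe.
func testConnTCP(t *testing.T) (net.Conn, net.Conn) {
	t.Helper()

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	defer ln.Close()

	type acceptResult struct {
		conn net.Conn
		err  error
	}
	acceptCh := make(chan acceptResult, 1)
	go func() {
		c, acceptErr := ln.Accept()
		acceptCh <- acceptResult{conn: c, err: acceptErr}
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		t.Fatal(err)
	}

	res := <-acceptCh
	if res.err != nil {
		t.Fatal(res.err)
	}

	// Close both ends when the test finishes.
	t.Cleanup(func() {
		client.Close()
		res.conn.Close()
	})
	return client, res.conn
}
```
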
@lattwood
Contributor Author

lattwood commented Sep 9, 2024

In the context of this (#23848): would it be worthwhile for us to try dropping in the libp2p yamux fork?

schmichael added a commit that referenced this issue Sep 24, 2024
The main point of this dependency upgrade is to pull in the fixes in
hashicorp/yamux#127 which prevents leaking deadlocked goroutines. It has
been observed to improve the issue in #23305 but does not
seem sufficient to fix it entirely.

Since touching yamux is a rare and scary event, I do **not** intend to
backport this. If we discover the improvements are stable and
significant enough, or if further fixes land in yamux, backporting can
be done at that time.